
Graceful Shutdown — Handling SIGTERM in Python Services

Mental Model

Imagine your Python program as a person in a house. Pressing Ctrl+C (SIGINT, which Python raises as KeyboardInterrupt) is like tapping them on the shoulder and asking them to leave. SIGTERM from an orchestrator is like a formal eviction notice. If the person only responds to shoulder taps, they miss the eviction notice and are forcibly removed, leaving a mess behind.

Rule: When building containerized daemons, always register custom signal handlers to capture SIGTERM and execute graceful cleanups.

The Setup

You write a daemon that processes background jobs. You handle KeyboardInterrupt so it shuts down cleanly on your machine, but in production, every time Kubernetes stops the pod (during a rolling deployment, for example), the process dies abruptly and leaves jobs half-finished.

What Does This Print?

Broken code
Python
import sys
import time

def cleanup():
    print("Releasing locks, flushing buffers, exiting cleanly!")

print("Daemon running...")
try:
    # Simulate a processing loop
    for tick in range(1, 3):
        print(f"Saving step {tick}")
        time.sleep(1)
except KeyboardInterrupt:
    cleanup()
    sys.exit(0)
# Note: If OS sends SIGTERM instead of SIGINT (Ctrl+C), does this block run?
Predict what happens if this script receives SIGTERM from Docker or Kubernetes while it is running.

The Output

What actually happens
Daemon running...
Saving step 1
Saving step 2
(Process terminated abruptly without executing cleanups!)

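When a container orchestrator stops a container, it sends SIGTERM to the container's main process. An application that only catches KeyboardInterrupt never sees this signal, because CPython does not convert SIGTERM into a Python exception by default. The process is terminated on the spot: except clauses, finally blocks, and atexit handlers never run, and any output still sitting in Python's stdout buffer is lost (stdout is block-buffered when it is a pipe, as it usually is in a container). The result is missing log lines and half-completed writes.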
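To see this for yourself, here is a small POSIX-only sketch (the child program and the `terminate_child` helper are illustrative, not part of the original example). It starts a child interpreter that, like the broken daemon, only catches KeyboardInterrupt, then sends it SIGTERM. On Linux or macOS the child should print only its first line: neither the except nor the finally clause runs, and `returncode` is -15, which is how the subprocess module reports "killed by signal 15". The child flushes its first line explicitly; without `flush=True` even that line could be lost in the pipe buffer.

```python
import signal
import subprocess
import sys

# Hypothetical child program: like the broken daemon, it only handles KeyboardInterrupt.
CHILD_SOURCE = """
import time
print("Daemon running...", flush=True)
try:
    time.sleep(30)
except KeyboardInterrupt:
    print("cleanup ran", flush=True)
finally:
    print("finally ran", flush=True)
"""

def terminate_child():
    """Start the child, send it SIGTERM once it is running, return (returncode, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    first_line = proc.stdout.readline()  # block until the child has started running
    proc.send_signal(signal.SIGTERM)     # the signal `docker stop` sends first by default
    rest, _ = proc.communicate(timeout=10)
    return proc.returncode, first_line + rest

if __name__ == "__main__":
    code, output = terminate_child()
    print("child output:", repr(output))
    # On POSIX, a negative returncode means "killed by that signal": -15 is SIGTERM.
    print("returncode:", code)
```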

Why Python Does This

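At the OS level, each signal has a disposition: ignore it, apply the kernel's default action, or call a registered handler. At startup, CPython installs a handler for SIGINT (signal 2, sent by Ctrl+C) that raises KeyboardInterrupt, so ordinary try/except and finally blocks run. SIGTERM (signal 15) is left at its default disposition, so the kernel terminates the process immediately without running any Python code; shells and container runtimes report this as exit code 143 (128 + 15). If the interpreter runs as PID 1 inside a container, the kernel ignores an unhandled SIGTERM instead, and the orchestrator eventually kills the process with SIGKILL, which skips cleanup just the same. To run cleanups or finalize operations, register a SIGTERM handler with the standard library signal module that raises SystemExit (for example, by calling sys.exit()) or otherwise starts a clean exit sequence.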
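These dispositions can be inspected directly. A minimal sketch, assuming a standard CPython interpreter started with no custom handlers installed and with SIGINT not already ignored by its parent; `describe_signal_defaults` is a hypothetical helper name:

```python
import signal

def describe_signal_defaults():
    """Report how this interpreter handles SIGINT and SIGTERM before any custom setup."""
    sigint = signal.getsignal(signal.SIGINT)
    sigterm = signal.getsignal(signal.SIGTERM)
    return {
        # Normally signal.default_int_handler, which raises KeyboardInterrupt.
        # (If SIGINT was already ignored when the interpreter started, it stays SIG_IGN.)
        "SIGINT raises KeyboardInterrupt": sigint is signal.default_int_handler,
        # Normally SIG_DFL: the kernel's default action (terminate) applies,
        # so no Python code runs when SIGTERM arrives.
        "SIGTERM left at OS default": sigterm is signal.SIG_DFL,
        # Shells and container runtimes report death-by-signal as 128 + signal number.
        "exit code reported for SIGTERM": 128 + signal.SIGTERM,
    }

if __name__ == "__main__":
    for key, value in describe_signal_defaults().items():
        print(f"{key}: {value}")
```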

The Fix

Corrected pattern
Python
import signal
import sys
import time

def handle_sigterm(signum, frame):
    # Called in the main thread when SIGTERM arrives
    print("SIGTERM received! Releasing locks, flushing buffers, exiting cleanly!")
    sys.exit(0)  # raises SystemExit, which unwinds the stack so finally blocks still run

# Fix: explicitly register the handler for SIGTERM
signal.signal(signal.SIGTERM, handle_sigterm)

print("Daemon running...")
try:
    for tick in range(1, 3):
        print(f"Saving step {tick}")
        time.sleep(1)
finally:
    # Runs on normal completion, on Ctrl+C (KeyboardInterrupt), and now on SIGTERM (SystemExit)
    print("Finalizing resources.")

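Registering a handler for SIGTERM lets the application intercept the shutdown signal sent by container orchestrators and turn it into an orderly exit: in-progress work can be flushed, committed, or rolled back, and resources released, before the process ends. The cleanup must finish within the orchestrator's grace period (terminationGracePeriodSeconds in Kubernetes, 30 seconds by default; 10 seconds for docker stop). After that the process receives SIGKILL, which cannot be caught.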
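The corrected pattern aborts the current step as soon as SIGTERM arrives. If you would rather let the step in progress finish, a common alternative is to have the handler only set a flag that the loop checks between steps. A sketch under that assumption (POSIX only, since it signals itself with `os.kill`); `run_daemon`, `stop_requested`, and the step timing are hypothetical, and `time.sleep` stands in for real work:

```python
import os
import signal
import threading
import time

# Set by the signal handler, polled by the work loop between steps.
stop_requested = threading.Event()

def request_stop(signum, frame):
    # Only record the request; the loop decides when it is safe to stop,
    # so a step that is already running is allowed to finish.
    stop_requested.set()

def run_daemon(max_steps=100, step_seconds=0.1):
    """Process hypothetical jobs until max_steps is reached or SIGTERM arrives.

    Returns the number of steps that finished completely.
    """
    previous = signal.signal(signal.SIGTERM, request_stop)  # must be called from the main thread
    completed = 0
    try:
        while completed < max_steps and not stop_requested.is_set():
            print(f"Saving step {completed + 1}")
            time.sleep(step_seconds)  # stand-in for one unit of real work
            completed += 1
    finally:
        print("Releasing locks, flushing buffers, exiting cleanly!")
        signal.signal(signal.SIGTERM, previous)  # restore whatever handler was there before
    return completed

if __name__ == "__main__":
    # Simulate the orchestrator: send SIGTERM to this process after 0.25 seconds.
    threading.Timer(0.25, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
    print("steps completed:", run_daemon())
```

The trade-off is shutdown latency: the process waits for the current step, so each step has to stay comfortably inside the grace period.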

How This Fails in Real Systems

A high-volume data pipeline daemon suffered from database index corruption twice a week. Operators discovered that deployment updates on Kubernetes sent SIGTERM signals to pods, which immediately exited mid-transaction. Once a signal handler was registered to intercept SIGTERM, corruption incidents dropped to zero.
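Intercepting SIGTERM only prevents half-finished writes if the interrupted work can be undone. A minimal sketch of that combination, using sqlite3 as a stand-in for whatever database the pipeline actually used (the account does not say); the table, rows, and function names are hypothetical, and `signal.raise_signal` requires Python 3.8+. Because the handler raises SystemExit, the `with conn:` block sees an exception and rolls the partial batch back instead of committing it:

```python
import signal
import sqlite3
import sys

def exit_on_sigterm(signum, frame):
    # Translate SIGTERM into SystemExit so `with` blocks and finally clauses unwind.
    sys.exit(0)

def write_batch(conn, rows):
    """Insert all rows atomically: either the whole batch is committed or none of it."""
    # sqlite3: commit on normal exit, roll back if any exception (SystemExit included) escapes.
    with conn:
        for row in rows:
            conn.execute("INSERT INTO events (value) VALUES (?)", (row,))

def batch_interrupted_by_sigterm():
    # Hypothetical job source: SIGTERM arrives after two rows have been inserted.
    yield "a"
    yield "b"
    signal.raise_signal(signal.SIGTERM)  # stands in for the orchestrator's SIGTERM
    yield "c"

def main():
    signal.signal(signal.SIGTERM, exit_on_sigterm)  # must be called from the main thread
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (value TEXT)")
    try:
        write_batch(conn, batch_interrupted_by_sigterm())
    except SystemExit:
        # A real daemon would let this propagate; the demo catches it to inspect the table.
        pass
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    print("rows committed:", count)
    return count

if __name__ == "__main__":
    main()
```

Running it prints `rows committed: 0`: the two rows inserted before the signal are discarded together with the rest of the batch, so the table never holds a partial write.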

Key Takeaway

When building containerized daemons, always register custom signal handlers to capture SIGTERM and execute graceful cleanups.
Common mistake: Developers assume that catching KeyboardInterrupt is enough for graceful shutdown in containerized applications, not realizing that orchestrators send SIGTERM, which Python does not turn into an exception by default. The process is then terminated abruptly and in-flight data is lost.