Everything Is an Object — id(), Reference Counting, GC
Think of Python objects as living independently in memory, not strictly tied to the variables that point to them. Variables are just labels; an object persists as long as any label (reference) points to it, even if those labels form a cycle.
The Setup
You are optimizing a long-running batch processing engine that loads configuration elements and constructs large graph dependencies. Memory usage keeps climbing, and you notice objects are persisting even after the parent engine scope is closed.
What Does This Print?
import sys
import ctypes
class Node:
def __init__(self, value):
self.value = value
self.edges = []
def build_cycle():
n1 = Node(1)
n2 = Node(2)
n1.edges.append(n2)
n2.edges.append(n1)
return id(n1), id(n2)
id1, id2 = build_cycle()
# We check if memory addresses are still holding live data
print(f"Address 1 holding data: {ctypes.cast(id1, ctypes.py_object).value.value}")
The Output
It prints the node values successfully:
Even though the function scope has exited and the variables n1 and n2 are no longer accessible directly, the raw memory addresses still contain the valid Node objects. This is because reference counting alone failed to clean them up due to the circular reference. The garbage collector has not run yet, meaning the memory is leaked until the next collection cycle.
Why Python Does This
Under CPython, every object is represented by a PyObject structure containing ob_refcnt and ob_type. When variables go out of scope, Python decrements ob_refcnt. If it hits zero, Python frees the memory. However, circular references prevent the reference count from hitting zero (each node has an edge pointing to the other). Python relies on a generational cyclic garbage collector to detect and break these reference cycles. The GC runs periodically, not instantly, leaving these objects alive in memory until a collection occurs.
The Fix
import weakref
class Node:
def __init__(self, value):
self.value = value
self.edges = []
def build_weak_cycle():
n1 = Node(1)
n2 = Node(2)
n1.edges.append(weakref.ref(n2)) # weak ref — does not increment refcount
n2.edges.append(weakref.ref(n1))
# Return weak refs to observe collection from outside the function
return weakref.ref(n1), weakref.ref(n2)
ref1, ref2 = build_weak_cycle()
# n1 and n2 go out of scope; only weak refs remain, so refcounts hit 0 immediately
print(f"Node 1 still alive: {ref1() is not None}") # False — collected at once
print(f"Node 2 still alive: {ref2() is not None}") # False — collected at once
Weak references do not increment ob_refcnt. When n1 and n2 go out of scope, nothing increments their reference counts, so both hit zero immediately and are freed by normal reference counting — no cyclic GC pass required. The weakref.ref objects themselves return None once their target is collected, signalling that the memory has been reclaimed.
How This Fails in Real Systems
A distributed microservice using an in-memory graph to cache route topologies ran out of memory every 4 hours. Developers used hard-coded back-links in Node structures. The GC couldn't keep up with high-throughput updates, leading to a slow memory leak that triggered container OOM kills during peak hours.