Memory Usage of Python Data Structures
Imagine sys.getsizeof() as weighing a box. It tells you the weight of the box itself, plus the weight of the direct pointers to its contents. It does not open the box, nor any nested boxes, to weigh their actual contents.
Do not rely on sys.getsizeof output on composite data structures; always use profiling tools like memory-profiler or custom deep-sizing functions.
The Setup
You are profiling a microservice that caches database keys. You need to verify that your memory budget remains under 512MB, so you use sys.getsizeof to monitor cache growth during operation.
What Does This Print?
import sys

class CachedItem:
    def __init__(self, item_id, payload):
        self.item_id = item_id
        self.payload = payload

# Generate cache list
cache = [CachedItem(i, "X" * 100000) for i in range(100)]

# Check memory consumption
measured_size = sys.getsizeof(cache)
print(f"Reported list container memory size: {measured_size} bytes")
The tempting assumption is that sys.getsizeof accurately captures the megabytes of string data packed inside the cached list items.
The Output
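The report claims the list is only about 900 bytes (904 on some CPython builds; the exact figure varies with version and platform), whereas the raw payload alone spans about 10 megabytes: 100 strings of 100,000 characters each. sys.getsizeof does not count the memory footprint of the nested objects, which makes it dangerous for resource budgeting.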
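The gap can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a 64-bit CPython build; it uses a plain list of strings instead of CachedItem objects, and the names payloads, shallow, and payload_bytes are illustrative only.

```python
import sys

# 100 separate strings of 100,000 characters each, as in the cache example.
payloads = ["X" * 100_000 for _ in range(100)]

# Shallow size: the list header plus its array of pointers.
shallow = sys.getsizeof(payloads)

# Sum of the sizes of the strings the list points to.
payload_bytes = sum(sys.getsizeof(p) for p in payloads)

print(f"Shallow list size:   {shallow} bytes")
print(f"Referenced payloads: {payload_bytes} bytes")
print(f"Ratio:               {payload_bytes // shallow}x")
```

The exact numbers differ between CPython versions, but the payload total should exceed the shallow figure by several orders of magnitude.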
Why Python Does This
In CPython, lists and other built-in containers store pointers to objects rather than storing the values inline. sys.getsizeof measures only the space of the container itself: the list's object header (a PyVarObject) plus its array of pointers, including any spare slots reserved for growth. It does not follow those pointers to add up the memory consumed by the referenced child objects.
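One way to see the pointer array at work is to compare lists that differ only in what their elements point to. This is a small sketch, assuming CPython; the variable names are hypothetical.

```python
import struct
import sys

# Size of one C pointer on this platform (8 bytes on typical 64-bit builds).
pointer_size = struct.calcsize("P")

# In CPython, [None] * n allocates exactly n slots,
# so one extra element costs exactly one pointer.
per_slot = sys.getsizeof([None] * 11) - sys.getsizeof([None] * 10)

# Two lists built the same way, with tiny versus huge elements.
tiny = [str(i) for i in range(3)]
huge = [str(i) * 1_000_000 for i in range(3)]

print(f"Pointer size: {pointer_size} bytes, cost per list slot: {per_slot} bytes")
print(f"tiny list: {sys.getsizeof(tiny)} bytes, huge list: {sys.getsizeof(huge)} bytes")
```

Both lists report the same size because each holds three pointers; the megabytes of string data live elsewhere on the heap.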
The Fix
import sys

class CachedItem:
    def __init__(self, item_id, payload):
        self.item_id = item_id
        self.payload = payload

# Recursively count memory of all referenced objects
def total_sizeof(obj, seen=None):
    # Track ids of visited objects so shared or cyclic references count once
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_sizeof(k, seen) + total_sizeof(v, seen) for k, v in obj.items())
    elif hasattr(obj, '__dict__'):
        size += total_sizeof(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(total_sizeof(item, seen) for item in obj)
    return size

cache = [CachedItem(i, "X" * 100000) for i in range(100)]
print(f"True cumulative memory size: {total_sizeof(cache)} bytes")
A "deep" memory measurement recursively traverses the object and its contents, summing the sys.getsizeof() of each individual object it references. Deep-sizing utilities automate this traversal, while memory profilers measure what the process actually allocates; either approach gives a truer accounting of the memory consumed by an object graph than a single sys.getsizeof() call.
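As a cross-check that needs no traversal at all, the standard library's tracemalloc module can report how many bytes were allocated while a structure was being built. The sketch below is one possible approach, not the only one; traced_allocation is a hypothetical helper name, and a plain list of strings stands in for the cache.

```python
import sys
import tracemalloc

def traced_allocation(build):
    """Call build() and return (result, bytes still allocated by the call)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    result = build()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before

# Build the payloads under tracing and compare against the shallow size.
cache, allocated = traced_allocation(
    lambda: ["X" * 100_000 for _ in range(100)]
)
shallow = sys.getsizeof(cache)

print(f"sys.getsizeof(cache):  {shallow} bytes")
print(f"tracemalloc allocated: {allocated} bytes")
```

This measures allocations rather than walking the object graph, so it also captures allocator overhead, but it only sees memory allocated while tracing is active.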
How This Fails in Real Systems
A DevOps monitoring system estimated service memory footprints using sys.getsizeof(service_cache). Because it reported low usage, the microservice was given too little memory in its container cluster, which eventually triggered hard OS-level OOM kills: its true memory footprint was 20x higher than estimated.
Key Takeaway
Do not rely on sys.getsizeof output on composite data structures; always use profiling tools like memory-profiler or custom deep-sizing functions. The mistake to avoid is using sys.getsizeof() to determine the total memory footprint of complex Python objects or containers, misinterpreting its output as the deep size.