Memory Usage of Python Data Structures

Mental Model

Think of sys.getsizeof() as weighing a box. It reports the weight of the box itself, including the slots that hold pointers to its contents. It does not open the box, or any boxes nested inside it, to weigh what they contain.
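
The analogy can be checked directly. The following is a minimal sketch (the names small_box and big_box are illustrative, and the behavior shown assumes CPython): two one-element lists report the same shallow size even though one of them holds a string about a million times larger.

```python
import sys

small_box = ["x"]                # list holding a 1-character string
big_box = ["x" * 1_000_000]      # list holding a string of about 1 MB

# Both lists are one-slot containers, so their shallow sizes match
print(sys.getsizeof(small_box) == sys.getsizeof(big_box))

# The difference only shows up when the contents are weighed directly
print(sys.getsizeof(small_box[0]), sys.getsizeof(big_box[0]))
```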

Rule: Do not treat the shallow sys.getsizeof() result as the memory footprint of a composite data structure; use a profiling tool such as memory-profiler or a custom deep-sizing function.

The Setup

You are profiling a microservice that caches database keys. You need to verify that its memory use stays within a 512 MB budget, so you call sys.getsizeof() on the cache to monitor its growth during operation.

What Does This Print?

Broken code
Python
import sys

class CachedItem:
    def __init__(self, item_id, payload):
        self.item_id = item_id
        self.payload = payload

# Generate cache list
cache = [CachedItem(i, "X" * 100000) for i in range(100)]

# Check memory consumption
measured_size = sys.getsizeof(cache)
print(f"Reported list container memory size: {measured_size} bytes")

Predict whether sys.getsizeof() captures the megabytes of string data held by the items in the cache list.

The Output

What actually happens
Reported list container memory size: 904 bytes

The report puts the list at only 904 bytes (the exact figure depends on the CPython version and platform), whereas the raw payload alone is roughly 10 megabytes: 100 strings of 100,000 characters each. sys.getsizeof() did not count the nested objects, which makes it dangerous for resource budgeting.
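
The size of the gap can be checked with a little arithmetic. The sketch below (illustrative variable names, CPython assumed) rebuilds only the payload strings and compares the list's shallow size with the summed size of the strings it points to.

```python
import sys

# 100 strings of 100,000 characters each, matching the cache payloads
payloads = ["X" * 100000 for _ in range(100)]

shallow_bytes = sys.getsizeof(payloads)                   # the container only
payload_bytes = sum(sys.getsizeof(p) for p in payloads)   # the strings themselves

print(f"Shallow list size:    {shallow_bytes} bytes")
print(f"Summed payload size:  {payload_bytes} bytes")
print(f"Underestimate factor: {payload_bytes // shallow_bytes}x")
```

The summed figure still leaves out the CachedItem objects and their attribute dictionaries, so the real footprint of the cache is somewhat larger again.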

Why Python Does This

In CPython, lists and other built-in containers store pointers to objects rather than storing the objects' values inline. sys.getsizeof() calls the object's __sizeof__ method and adds the garbage-collector overhead, so for a list it reports only the container itself: the list object's header (a PyVarObject header plus bookkeeping fields) and its array of pointers. It does not follow those pointers to add up the memory of the referenced child objects.
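
One way to observe this layout is sketched below. It relies on an implementation detail of current CPython, not a language guarantee: a list built with [None] * n allocates exactly n pointer slots, so its size beyond that of an empty list should equal n times the platform pointer size. The name POINTER_SIZE is illustrative.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # 8 on typical 64-bit builds

empty_size = sys.getsizeof([])        # header only, no pointer slots

for n in (1, 10, 100):
    slots = [None] * n                # exact-size allocation in CPython
    extra = sys.getsizeof(slots) - empty_size
    # extra bytes are pure pointer storage: n slots of POINTER_SIZE each
    print(n, extra, extra == n * POINTER_SIZE)
```

Lists grown by append() or by a list comprehension usually report a little more than this, because CPython over-allocates spare slots to keep appends cheap.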

The Fix

Corrected pattern
Python
import sys

class CachedItem:
    def __init__(self, item_id, payload):
        self.item_id = item_id
        self.payload = payload

def total_sizeof(obj, seen=None):
    """Recursively sum sys.getsizeof() over obj and everything it references."""
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        # Already counted: a shared reference or a reference cycle
        return 0
    seen.add(obj_id)
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        # Count both keys and values
        size += sum(total_sizeof(k, seen) + total_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        # Walk concrete containers only; iterating a generator would consume it
        size += sum(total_sizeof(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):
        # Instance attributes live in a separate dict
        size += total_sizeof(obj.__dict__, seen)
    return size

cache = [CachedItem(i, "X" * 100000) for i in range(100)]
print(f"True cumulative memory size: {total_sizeof(cache)} bytes")

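A "deep" memory measurement recursively traverses the object and everything it references, summing sys.getsizeof() for each distinct object exactly once. The function above is a sketch rather than a complete solution: for example, it does not account for attributes stored in __slots__. Libraries such as Pympler (asizeof) automate this kind of traversal, while profilers such as tracemalloc and memory-profiler take a different route and measure what the process actually allocates.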
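
As one example of a tool-based measurement, the standard-library tracemalloc module tracks the allocations the interpreter makes instead of walking an object graph. The sketch below is one assumed way to apply it here, with a plain list of strings standing in for the cached items; it is not the only approach.

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()   # (current, peak) in bytes

# Stand-in for the cache: 100 strings of 100,000 characters each
payloads = ["X" * 100000 for _ in range(100)]

after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

allocated = after - before
print(f"Allocated while building the cache: {allocated} bytes")
print(f"Peak traced memory: {peak} bytes")
```

Because this counts real allocations, the result does not depend on teaching a traversal function about every container type, though it only covers memory allocated while tracing is active.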

How This Fails in Real Systems

A DevOps monitoring system estimated service memory footprints with sys.getsizeof(service_cache). Because the reported usage was low, the microservice was given too little memory headroom in its container cluster. Its true footprint was 20x higher than estimated, which eventually triggered hard OS-level OOM kills.

Key Takeaway

Never treat the shallow sys.getsizeof() result as the memory footprint of a composite data structure; use a profiling tool such as memory-profiler or a custom deep-sizing function instead.
Common mistake: relying solely on sys.getsizeof() to determine the total memory footprint of complex Python objects or containers, and reading its output as the deep size.