Compute-Bound vs Memory-Bound Workloads
Source: mortalapps.com
- Compute-Bound: Saturated Tensor Cores (e.g., LLM Prompt Prefill, Training).
- Memory-Bound: Saturated HBM bus (e.g., LLM Token Decoding, Element-wise ops).
- LLM inference alternates between the two regimes: prefill is compute-bound, while decode is memory-bound.
- Blackwell integrates a hardware Decompression Engine (DE) that raises effective memory bandwidth by moving data in compressed form.
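One common way to make these two bounds concrete is the roofline model, which the source does not name but which matches its definitions: attainable throughput is the lower of peak compute and arithmetic intensity (FLOPs per byte moved) times memory bandwidth. The sketch below assumes a hypothetical accelerator with 2000 TFLOP/s peak and 8 TB/s HBM; the function names are illustrative, not from any library.

```python
# Roofline sketch: throughput is capped by peak compute or by how fast
# memory can feed the math units, whichever is lower.

def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tb_s: float) -> float:
    """min(peak compute, intensity * bandwidth).

    intensity is in FLOPs per byte; (FLOP/byte) * (TB/s) = TFLOP/s.
    """
    return min(peak_tflops, intensity * bandwidth_tb_s)


def classify(intensity: float, peak_tflops: float, bandwidth_tb_s: float) -> str:
    # Ridge point: the intensity at which both limits are equal.
    ridge = peak_tflops / bandwidth_tb_s
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 2000 TFLOP/s peak, 8 TB/s HBM (illustrative, not a spec sheet).
    PEAK, BW = 2000.0, 8.0
    for ai in (1.0, 250.0, 2000.0):
        print(f"{ai:>7.1f} FLOP/B -> {attainable_tflops(ai, PEAK, BW):7.1f} TFLOP/s, {classify(ai, PEAK, BW)}")
```

With these assumed numbers the ridge point is 250 FLOPs per byte: anything below it is limited by HBM, anything above it by the Tensor Cores.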
Why This Matters
You cannot design a serving cluster without knowing which bound applies. If a workload is memory-bound, a GPU with faster Tensor Cores yields little or no speedup, because the math units already sit idle waiting for data. Infrastructure engineers therefore increase batch sizes, so that each weight fetched from memory serves more requests, pushing decode from memory-bound toward compute-bound and putting the expensive compute to work.
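Both claims can be checked with a rough step-time estimate: a step takes as long as the slower of its compute time and its memory time. This is a minimal sketch assuming a hypothetical 8B-parameter FP16 model, a hypothetical 2 PFLOP/s peak, and 8 TB/s HBM; it counts only weight traffic and ignores KV-cache reads, which grow with batch size and are not amortized by batching.

```python
# Roofline-style step-time estimate for one decode step over a batch of sequences.
# Simplification: only weight traffic is counted; KV-cache reads and
# activations are ignored.

def step_time_s(flops: float, bytes_moved: float, peak_flops: float, bandwidth_bytes_s: float) -> float:
    """A step takes as long as its slower limit: compute time or memory time."""
    return max(flops / peak_flops, bytes_moved / bandwidth_bytes_s)


def decode_step(batch: int, n_params: float, peak_flops: float, bandwidth_bytes_s: float,
                bytes_per_param: int = 2) -> tuple[float, float]:
    """Return (seconds per step, generated tokens per second) for `batch` sequences."""
    flops = 2 * batch * n_params                # ~2 FLOPs (multiply + add) per parameter per token
    bytes_moved = n_params * bytes_per_param    # weights streamed once per step, shared by the batch
    t = step_time_s(flops, bytes_moved, peak_flops, bandwidth_bytes_s)
    return t, batch / t


if __name__ == "__main__":
    N = 8e9                   # hypothetical 8B-parameter model, FP16 weights
    PEAK, BW = 2e15, 8e12     # hypothetical 2 PFLOP/s peak, 8 TB/s HBM
    for peak in (PEAK, 2 * PEAK):            # doubling compute leaves batch-1 decode unchanged
        print(peak, decode_step(1, N, peak, BW))
    for batch in (1, 64, 256):               # larger batches amortize each weight fetch
        print(batch, decode_step(batch, N, PEAK, BW))
```

Under these assumptions, doubling peak compute leaves the batch-1 step at 2 ms, while raising the batch to 256 multiplies tokens per second by roughly 250 before compute time starts to dominate.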
Core Intuition
Compute-Bound is when the chefs in the kitchen are cooking as fast as they can and orders still pile up (the limit is Math). Memory-Bound is when the chefs stand idle because the delivery truck hasn't brought the ingredients yet (the limit is Bandwidth).
Technical Deep Dive
The dichotomy is most visible in Large Language Model (LLM) serving:
Prefill Phase: The model ingests a prompt of, say, 2000 tokens and processes them together as large Matrix-Matrix Multiplications (GEMMs). Each weight matrix is loaded once from HBM and reused against all 2000 tokens, giving high arithmetic intensity (FLOPs per byte moved). (Compute-Bound).
Decode Phase: The model generates one token at a time using Matrix-Vector Multiplications (GEMVs). All of the weights, about 140 GB for a 70B-parameter model in FP16, must be streamed from HBM into the SMs just to multiply them against a single token vector. With a single sequence, each weight is used once per step, so there is essentially no data reuse. (Memory-Bound).
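The reuse argument can be quantified with the arithmetic intensity of a single GEMM. This is an idealized sketch that assumes each operand crosses HBM exactly once and uses a hypothetical 8192 x 8192 FP16 weight matrix; real kernels move somewhat more data.

```python
def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity (FLOPs per byte) of C[m, n] = A[m, k] @ B[k, n].

    Idealized: each operand crosses HBM exactly once (perfect on-chip reuse).
    """
    flops = 2 * m * n * k                                   # one multiply + one add per (i, j, l)
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem  # read A and B, write C
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical 8192 x 8192 FP16 weight matrix.
    print("prefill, 2000 tokens:", round(gemm_intensity(2000, 8192, 8192), 1))
    print("decode, 1 token:     ", round(gemm_intensity(1, 8192, 8192), 3))
```

With 2000 tokens the intensity is roughly 1,340 FLOPs per byte; the same layer applied to one token drops to just under 1, which is why the two phases land on opposite sides of the ridge point.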
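Because every weight must be read once per generated token, memory bandwidth alone sets a floor on decode latency. This sketch uses the 70B-parameter, FP16, 8 TB/s figures from this section and ignores KV-cache traffic and kernel overheads, so real latency would be higher.

```python
def decode_floor_ms(n_params: float, bytes_per_param: float, bandwidth_tb_s: float) -> float:
    """Lower bound on per-token decode latency for a single sequence.

    Every weight is read from HBM once per generated token, so a step cannot
    finish faster than weight bytes / memory bandwidth. KV-cache traffic and
    kernel overheads would only make it slower.
    """
    weight_bytes = n_params * bytes_per_param
    return weight_bytes / (bandwidth_tb_s * 1e12) * 1e3


if __name__ == "__main__":
    ms = decode_floor_ms(70e9, 2, 8.0)      # 70B parameters, FP16, 8 TB/s
    print(f"{ms:.1f} ms/token, at most {1e3 / ms:.0f} tokens/s")
```

That works out to about 17.5 ms per token, or at most about 57 tokens/s for one sequence, no matter how fast the Tensor Cores are.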
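To combat memory bottlenecks, the Blackwell B200 adds a dedicated hardware Decompression Engine (DE). Data can be stored compressed in memory: the memory system then fetches fewer bytes, and the DE expands them in dedicated hardware rather than on the SMs. The gain is bounded by the compression ratio of the data, so effective bandwidth can exceed the physical 8 TB/s of HBM3e only by that factor, and only if the DE keeps pace. NVIDIA positions the DE primarily for data-analytics workloads, such as database queries over LZ4-, Snappy-, or Deflate-compressed data.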
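The bound on this gain can be written as a one-line model. This is a simplified sketch, not a description of the DE's actual dataflow: it assumes decompressed data goes straight to its consumer with no second HBM round trip, and the compression ratios and decompressor throughputs below are hypothetical.

```python
def effective_bandwidth_tb_s(physical_tb_s: float, compression_ratio: float,
                             decompress_tb_s: float) -> float:
    """Uncompressed bytes per second reaching the consumer when data sits compressed in HBM.

    HBM moves compressed bytes, so it can supply physical * ratio uncompressed
    bytes per second; the decompressor's output rate caps that.
    Assumes decompressed data is consumed directly, with no second HBM round trip.
    """
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    return min(physical_tb_s * compression_ratio, decompress_tb_s)


if __name__ == "__main__":
    # 8 TB/s physical HBM; ratios and decompressor rates are hypothetical.
    print(effective_bandwidth_tb_s(8.0, 1.5, 100.0))   # HBM-limited
    print(effective_bandwidth_tb_s(8.0, 2.0, 10.0))    # decompressor-limited
```

The first call is limited by HBM (8 TB/s x 1.5 = 12 TB/s of useful data); in the second, a slow hypothetical decompressor caps the result below what a 2:1 ratio would otherwise allow.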