CDN Architecture

CDNs reduce latency by caching static and dynamic content at globally distributed edge servers close to users.

TL;DR
  • Cache invalidation is a major operational challenge; use surrogate keys (Cache-Tags) for precise, real-time purging.
  • An Origin Shield acts as an intermediate caching layer that protects the origin (and the databases behind it) from "thundering herd" traffic spikes.
  • Monitor Cache-Hit Ratio (CHR) as a primary metric; a low CHR indicates misconfigured TTLs or cache-busting queries.

The Problem

When a global user base accesses an application hosted in a single cloud region (e.g., US-East), distant users experience high latency. Signals in fiber-optic cable travel at roughly two-thirds the speed of light, so every cross-continental round trip adds tens to hundreds of milliseconds. Static assets (images, JS, CSS) that need several round trips can take hundreds of milliseconds to load, degrading the user experience. Furthermore, during high-traffic events (like a product launch or breaking news), millions of concurrent requests for the same assets hit the origin servers simultaneously. This "thundering herd" problem quickly exhausts backend CPU, memory, and database connection pools, and can cascade into a full outage.

Core System Idea

A Content Delivery Network (CDN) architecture solves this by placing a globally distributed network of proxy servers (Edge POPs) between the users and the origin server.

When a user requests an asset, the request is routed to a nearby edge server via anycast routing or latency-based DNS. If the edge server has the asset cached (a cache hit), it returns it immediately without contacting the origin. On a cache miss, the edge server fetches the asset from the origin, caches it locally for future requests, and returns it to the user.
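The hit/miss path described above can be sketched as a small read-through cache. This is an illustrative model of one edge server, not any CDN's actual implementation; the `EdgeCache` class, the `origin` function, and the 60-second TTL are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy read-through cache standing in for a single edge server."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (body, expiry timestamp)
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return (body, "HIT") from cache or (body, "MISS") after an origin fetch."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        # Miss or expired entry: fetch from origin and cache for future requests.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


origin_calls = []


def origin(path: str) -> bytes:
    """Hypothetical origin handler that records every request it receives."""
    origin_calls.append(path)
    return f"content of {path}".encode()


edge = EdgeCache(origin, ttl_seconds=60)
print(edge.get("/logo.png")[1])  # MISS
print(edge.get("/logo.png")[1])  # HIT
```

The second request never reaches `origin`, which is the whole point of the edge layer: the origin sees one request per asset per TTL window per edge server.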

To protect the origin from concurrent cache misses on different edge servers, an Origin Shield (an intermediate caching layer) is placed between the edge servers and the origin, consolidating duplicate requests into a single upstream call.
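Consolidating duplicate requests is often called request coalescing (or request collapsing). The sketch below shows one way to do it with threads, assuming a hypothetical `CoalescingShield` class and a deliberately slow `slow_origin` function. It omits TTLs and eviction to keep the coalescing logic visible.

```python
import threading
import time
from typing import Callable, Dict


class CoalescingShield:
    """Collapses concurrent misses for the same key into one origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._in_flight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> bytes:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._in_flight.get(key)
                leader = event is None
                if leader:
                    # First caller for this key performs the origin fetch.
                    event = threading.Event()
                    self._in_flight[key] = event
            if leader:
                try:
                    body = self._fetch(key)
                    with self._lock:
                        self._cache[key] = body
                    return body
                finally:
                    with self._lock:
                        del self._in_flight[key]
                    event.set()  # wake the waiting followers
            else:
                # Followers wait, then loop to re-check the cache. If the
                # leader failed, one follower becomes the new leader.
                event.wait()


calls = []


def slow_origin(key: str) -> bytes:
    """Hypothetical origin that takes 50 ms per request."""
    calls.append(key)
    time.sleep(0.05)
    return key.encode()


shield = CoalescingShield(slow_origin)
threads = [threading.Thread(target=shield.get, args=("/video.mp4",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1: twenty concurrent misses produced one origin call
```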

System Flow

flowchart TD
    Client[Global Client] -- "1. Request" --> Edge[CDN Edge Server]
    Edge -- "2a. Cache Hit: Return" --> Client
    Edge -- "2b. Cache Miss: Forward" --> Shield[Origin Shield]
    Shield -- "3a. Shield Hit: Return" --> Edge
    Shield -- "3b. Shield Miss: Forward" --> Origin[Origin Server]
    Origin -- "4. Return and Cache" --> Shield

The CDN routes client requests to the nearest Edge Server, utilizing an Origin Shield to consolidate cache misses and protect the Origin Server from traffic spikes.

Real-World Examples (Indicative)

Cloudflare Cache-Tag Purging

Cloudflare lets an origin label each response with Cache-Tag values in an HTTP header, and its purge API accepts a batch of tags per call (historically documented as up to 30). When a product record updates, a single POST /zones/{zone_id}/purge_cache API call with {"tags": ["product-123"]} propagates the purge to all 300+ PoPs globally, typically within about 150ms. A documentation site can use the same mechanism: when a PR merges, a webhook triggers a tag purge that invalidates every cached page tagged for the affected repository path, so readers do not keep seeing stale rendered content.
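As a rough sketch, the purge call above could be built with Python's standard library as follows. The endpoint path and body shape come from the text; the `build_tag_purge_request` helper and the placeholder zone ID and token are illustrative assumptions, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request
from typing import List

CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"


def build_tag_purge_request(
    zone_id: str, api_token: str, tags: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a purge-by-tag request for one zone."""
    if not tags:
        raise ValueError("at least one tag is required")
    body = json.dumps({"tags": tags}).encode("utf-8")
    return urllib.request.Request(
        url=f"{CLOUDFLARE_API}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; substitute real values before sending.
request = build_tag_purge_request(
    "ZONE_ID_PLACEHOLDER", "API_TOKEN_PLACEHOLDER", ["product-123"]
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=10)
```

Keeping request construction separate from sending makes the purge step easy to unit test inside a deployment pipeline without touching the live zone.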

Fastly at GitHub Pages

GitHub Pages serves static sites through Fastly. The GitHub Pages origin sets Surrogate-Key: repo-{id} user-{id} headers on every response. When a git push event occurs, GitHub's internal webhook service calls Fastly's instant purge API with the repo surrogate key, invalidating only that repository's pages globally in under 150ms. VCL (Varnish Configuration Language) enforces cache segmentation: requests carrying session cookies are passed to the origin uncached (Vary: Cookie on its own would only fragment the cache per cookie value, not bypass it), while anonymous requests are cached with stale-while-revalidate=86400.
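To illustrate why surrogate keys make purges precise, here is a minimal in-memory model of a cache that indexes objects by the space-separated keys in a Surrogate-Key header. The `SurrogateKeyCache` class, the URLs, and the key values are hypothetical; a real CDN maintains this index across many servers.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class SurrogateKeyCache:
    """Toy cache that can purge every object sharing a surrogate key."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._urls_by_key: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: bytes, surrogate_key_header: str) -> None:
        """Cache a response and index it under each space-separated key."""
        self._objects[url] = body
        for key in surrogate_key_header.split():
            self._urls_by_key[key].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._objects.get(url)

    def purge_key(self, key: str) -> int:
        """Remove every cached object tagged with `key`; return how many were removed."""
        purged = 0
        for url in self._urls_by_key.pop(key, set()):
            # The URL may already be gone if another key purged it earlier.
            if self._objects.pop(url, None) is not None:
                purged += 1
        return purged


cache = SurrogateKeyCache()
cache.store("/alice/blog/index.html", b"home", "repo-42 user-7")
cache.store("/alice/blog/about.html", b"about", "repo-42 user-7")
cache.store("/alice/notes/index.html", b"notes", "repo-43 user-7")

print(cache.purge_key("repo-42"))  # 2
print(cache.get("/alice/notes/index.html"))  # b'notes' (other repo untouched)
```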

Netflix Open Connect Origin Shield

Netflix delivers 15+ petabytes of content per day through its Open Connect CDN. Video files are stored on Amazon S3 in us-east-1. Instead of each of 1,000+ global PoPs hitting S3 directly on a cache miss, Netflix routes all PoP misses through a single Origin Shield cluster in us-east-1. This collapses N concurrent S3 requests into one per unique asset, reducing S3 GET request volume by ~70% and saving millions of dollars in egress costs annually. For major title releases, Netflix pre-positions content by proactively pushing files to PoPs before user traffic arrives, which avoids a cold-start thundering herd.

Anti-Patterns

Caching Personalized Data

Misconfiguring cache headers such that private user data (e.g., /dashboard or user-specific JSON responses) is cached at the edge and served to other users.
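One defensive pattern is to make the origin choose Cache-Control explicitly so that personalized responses can never be stored by a shared cache. The sketch below assumes a hypothetical `cache_control_for` helper, an illustrative list of private path prefixes, and a 300-second public TTL.

```python
# Hypothetical path prefixes that always carry per-user data.
PRIVATE_PREFIXES = ("/dashboard", "/account", "/api/me")


def cache_control_for(path: str, has_session_cookie: bool) -> str:
    """Pick a Cache-Control value; anything personalized must not be shared-cached."""
    if has_session_cookie or path.startswith(PRIVATE_PREFIXES):
        # "private" forbids shared caches (the CDN); "no-store" forbids storing at all.
        return "private, no-store"
    return "public, max-age=300"


print(cache_control_for("/dashboard", has_session_cookie=True))    # private, no-store
print(cache_control_for("/static/app.css", has_session_cookie=False))  # public, max-age=300
```

Defaulting to the private branch whenever a session cookie is present errs on the side of a lower hit ratio rather than a data leak.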

Infinite TTLs without Invalidation

Setting extremely long TTL values on assets without implementing a programmatic cache invalidation strategy, leaving users stuck with stale content after deployments.

Query-String Cache Busting

Allowing arbitrary query parameters (like ?timestamp=12345) to become part of the cache key, so every request looks unique and is treated as a cache miss, destroying the Cache-Hit Ratio.
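A common mitigation is to normalize the cache key so that only query parameters that actually change the response are kept. The following sketch assumes a hypothetical allowlist (`page`, `lang`) and a single hostname, so the key is built from the path and the filtered, sorted query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allowlist: the only parameters that change the response body.
ALLOWED_PARAMS = frozenset({"page", "lang"})


def cache_key(url: str) -> str:
    """Build a cache key that ignores unknown parameters and parameter order."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in ALLOWED_PARAMS
    )
    query = urlencode(kept)
    return f"{parts.path}?{query}" if query else parts.path


print(cache_key("/products?timestamp=12345&page=2"))  # /products?page=2
print(cache_key("/products?page=2&timestamp=99999"))  # /products?page=2
print(cache_key("/app.js?timestamp=1"))               # /app.js
```

Both product URLs now map to the same cached object, so the timestamp parameter no longer forces a miss.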

No Origin Shield during Spikes

Failing to configure an Origin Shield during high-traffic events, allowing hundreds of edge servers to query the origin simultaneously for the same expired asset.

Design Tradeoffs

| Dimension | Edge CDN Caching | Direct Origin Delivery |
| --- | --- | --- |
| Latency | Typically single-digit milliseconds from the nearest PoP for cached assets, instead of a 150ms+ cross-continental round trip | Full RTT to the origin datacenter on every request; 150-300ms for users in distant regions |
| Freshness control | Stale content risk if TTLs are misconfigured or a purge fails; requires Cache-Tag pipelines and surrogate key discipline | Always fresh; the origin response is authoritative with no intermediate caching layer between user and data |
| Origin cost | 70-95% reduction in origin request volume (Netflix ~70% via Origin Shield); major egress bandwidth savings | Full origin bandwidth and compute cost on every request; no caching benefit to amortize across users |
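The origin-cost row follows directly from the Cache-Hit Ratio: whatever fraction of requests the CDN answers never reaches the origin. A short worked example, using hypothetical helper names and made-up traffic numbers:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """CHR = hits / (hits + misses); defined as 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that still reach the origin at a given hit ratio."""
    return round(total_requests * (1.0 - hit_ratio))


ratio = cache_hit_ratio(hits=9_000, misses=1_000)
print(f"{ratio:.0%}")                     # 90%
print(origin_requests(1_000_000, ratio))  # 100000
```

At a 90% hit ratio the origin serves one request in ten, and moving from 90% to 95% halves origin load again, which is why small CHR changes matter at scale.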

Best Practices

Use Surrogate Keys (Cache-Tags)

Group related cached assets using custom HTTP headers (e.g., Cache-Tag: product-123). This allows you to purge thousands of related pages with a single API call when that entity updates.

Monitor Cache-Hit Ratio (CHR)

Treat CHR as a core reliability metric. Aim for 80-95% for static assets; a sudden drop indicates misconfigured caching rules or cache-busting behavior from query strings.

Implement Stale-While-Revalidate

Use Cache-Control: max-age=600, stale-while-revalidate=30 to allow the CDN to serve stale content immediately while asynchronously fetching a fresh copy from the origin in the background.

Leverage Edge Compute for AI Streaming

Use edge workers to handle Server-Sent Events (SSE) and stream AI model responses directly to users, reducing latency and offloading connection management from your backend.

Configure Failover for Origin Errors

Set up your CDN to automatically serve cached stale content (for example via the stale-if-error Cache-Control directive) or a custom static fallback page if your origin server returns a 5xx error.
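The stale-while-revalidate practice above reduces to a three-way decision based on the age of the cached object. The sketch below is a simplified model, with a hypothetical `decide` function and a minimal header parser that handles only `name=value` directives; real caches implement the full HTTP caching rules.

```python
from enum import Enum
from typing import Dict


class Decision(Enum):
    FRESH = "serve cached copy"
    STALE_OK = "serve stale copy, refresh in background"
    EXPIRED = "fetch from origin before responding"


def parse_cache_control(header: str) -> Dict[str, str]:
    """Split a Cache-Control header into lowercase directive -> value."""
    directives: Dict[str, str] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    return directives


def decide(age_seconds: float, cache_control: str) -> Decision:
    """Classify a cached object by its age against max-age and the SWR window."""
    directives = parse_cache_control(cache_control)
    max_age = int(directives.get("max-age") or 0)
    swr = int(directives.get("stale-while-revalidate") or 0)
    if age_seconds < max_age:
        return Decision.FRESH
    if age_seconds < max_age + swr:
        return Decision.STALE_OK
    return Decision.EXPIRED


HEADER = "max-age=600, stale-while-revalidate=30"
for age in (120, 610, 700):
    print(age, decide(age, HEADER).name)
# 120 FRESH
# 610 STALE_OK
# 700 EXPIRED
```

With these values the stale window is only 30 seconds wide, so a longer stale-while-revalidate value may be appropriate for assets where brief staleness is acceptable.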

When to Use / Avoid

| Use When | Avoid When |
| --- | --- |
| You have a globally distributed user base and serve static assets, media, or semi-static API responses. | Your application is used strictly within a single, localized corporate network or intranet. |
| You experience highly unpredictable traffic spikes (e.g., media sites, e-commerce, public APIs). | Your data is highly dynamic, personalized per user, and changes on every single request (e.g., real-time stock trading dashboards). |
| You want to reduce cloud egress costs by offloading bandwidth-heavy asset delivery to a CDN. | You do not have the operational capacity to manage cache invalidation pipelines and debug caching issues. |