GraphQL vs REST at Scale
- REST APIs offer predictable performance and native HTTP caching but suffer from overfetching and underfetching.
- GraphQL eliminates overfetching by letting clients request exact fields, but naively written resolvers introduce the N+1 database query problem.
- GraphQL queries are not cacheable at the HTTP edge layer by default because they are usually sent as POST requests with dynamic payloads.
- Scale GraphQL safely by implementing strict query depth limiting, cost analysis, and DataLoader batching.
The Problem
As client applications grow more complex, REST APIs force developers into a difficult trade-off. They must either build a highly specific endpoint for each view to avoid overfetching (returning unused fields wastes bandwidth), or build generic endpoints that force clients into multiple sequential roundtrips (underfetching, which slows page loads). Teams that migrate to GraphQL to escape this trade-off often overload their databases instead. A naive GraphQL resolver runs a separate database query for every item in a list whenever it resolves a nested field (the N+1 problem), so a single malicious or poorly written client query can bring down the entire database.
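The N+1 pattern described above can be made concrete with a small counting experiment. This is a minimal sketch, assuming an in-memory stand-in for the database; `CountingDB`, `resolve_users_naive`, and the sample data are hypothetical names chosen for illustration.

```python
class CountingDB:
    """In-memory stand-in for a database that counts every query it serves."""

    def __init__(self):
        self.queries = 0
        self.users = {1: "ada", 2: "grace", 3: "linus"}
        self.posts_by_user = {1: ["p1", "p2"], 2: ["p3"], 3: []}

    def list_user_ids(self):
        self.queries += 1
        return sorted(self.users)

    def posts_for_user(self, user_id):
        self.queries += 1
        return self.posts_by_user[user_id]


def resolve_users_naive(db):
    """Resolve `users { posts }` with one posts query per user (N+1)."""
    result = []
    for user_id in db.list_user_ids():       # 1 query for the list
        posts = db.posts_for_user(user_id)   # +1 query for each user
        result.append({"id": user_id, "posts": posts})
    return result


db = CountingDB()
rows = resolve_users_naive(db)
print(db.queries)  # 1 list query + 3 per-user queries = 4
```

With three users the cost is four queries; with a list of 500 users the same resolver would issue 501, all from one client request.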
Core System Idea
The choice between GraphQL and REST at scale is a trade-off between client flexibility and server-side predictability. REST exposes structured, resource-oriented endpoints (e.g., /users/:id) that map cleanly to database entities and leverage standard HTTP caching (via ETag or Cache-Control headers) at the CDN level.
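The ETag-based caching that REST relies on can be sketched as a conditional GET handler. This is an illustrative sketch only: `make_etag`, `get_user`, the in-memory `store`, and the `max-age=60` value are assumptions, and a real service would sit behind an HTTP framework and a CDN.

```python
import hashlib
import json


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def get_user(user_id, store, if_none_match=None):
    """Handle GET /users/:id and return (status, headers, body)."""
    body = json.dumps(store[user_id], sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=60"}
    if if_none_match == etag:
        # The client's or CDN's cached copy is still valid: send no body.
        return 304, headers, b""
    return 200, headers, body


store = {"42": {"id": "42", "name": "Ada"}}
status, headers, body = get_user("42", store)
status2, _, body2 = get_user("42", store, if_none_match=headers["ETag"])
print(status, status2)  # 200 304
```

Because the URL identifies the resource and the ETag identifies its version, any intermediary can revalidate the response without understanding the payload.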
GraphQL exposes a single endpoint (usually /graphql) and uses a schema definition language to let clients query arbitrary graphs of data. To make GraphQL safe at scale, the server must implement a batching and caching layer (like the DataLoader pattern) to coalesce individual database lookups into single batched queries. Additionally, the gateway must parse and analyze incoming query documents, rejecting queries that exceed safe depth or complexity thresholds before they are executed.
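The batching idea behind the DataLoader pattern can be sketched with asyncio: lookups requested in the same event-loop turn are collected and sent to the backend as one call. This is a simplified sketch, not the DataLoader library itself; `BatchLoader`, `fetch_titles`, and the key values are hypothetical, and per-request caching is reduced to de-duplicating keys within a batch.

```python
import asyncio


class BatchLoader:
    """Coalesce load(key) calls made in the same loop turn into one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> values, same order
        self._pending = {}         # key -> Future for the batch being built
        self._task = None          # keeps a reference to the dispatch task

    def load(self, key):
        loop = asyncio.get_running_loop()
        if not self._pending:
            # First key of a new batch: dispatch after current callbacks run.
            loop.call_soon(self._start_dispatch)
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        return self._pending[key]

    def _start_dispatch(self):
        self._task = asyncio.get_running_loop().create_task(self._dispatch())

    async def _dispatch(self):
        batch, self._pending = self._pending, {}
        keys = list(batch)
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            for future in batch.values():
                future.set_exception(exc)
            return
        for key, value in zip(keys, values):
            batch[key].set_result(value)


calls = []


async def fetch_titles(ids):
    """Stand-in for a single batched query such as WHERE id IN (...)."""
    calls.append(list(ids))
    return [f"title-{i}" for i in ids]


async def main():
    loader = BatchLoader(fetch_titles)
    return await asyncio.gather(*(loader.load(i) for i in [1, 2, 3, 2]))


titles = asyncio.run(main())
print(calls)  # [[1, 2, 3]]
```

Four resolver-level lookups become one backend call, and the repeated key is fetched only once. A loader like this is normally created per request so that cached values do not leak between users.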
System Flow
The GraphQL Gateway analyzes query complexity and depth before execution, using a DataLoader layer to batch nested database requests and prevent N+1 query issues.
Real-World Examples (Indicative)
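GitHub's GraphQL API combines a node limit with a point-based rate limit. Every connection must supply a first or last argument between 1 and 100, and a single call may not request more than 500,000 total nodes; both rules are checked against the query document before execution. A query like repositories(first:100) { issues(first:100) { nodes { id } } } can return up to 10,100 nodes (100 repositories plus 100 x 100 issues). Separately, each query is assigned a point cost that counts against a budget of 5,000 points per hour, and clients can read the cost and remaining fields of the rateLimit object to tune queries before hitting the limit.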
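The 10,100 figure in the GitHub example follows from multiplying page sizes down the connection tree. The sketch below reproduces that arithmetic under simplifying assumptions: the query is already parsed into nested dictionaries, and `estimate_nodes`, `exceeds_budget`, the dictionary shape, and the 10,000 threshold are hypothetical, not GitHub's implementation.

```python
def estimate_nodes(connections, parents=1):
    """Worst-case node count: each connection returns `first` nodes per parent."""
    total = 0
    for conn in connections:
        count = parents * conn["first"]
        total += count + estimate_nodes(conn.get("children", []), count)
    return total


def exceeds_budget(connections, max_nodes):
    """Decide before execution whether a query is too expensive to run."""
    return estimate_nodes(connections) > max_nodes


# repositories(first:100) { issues(first:100) { nodes { id } } }
query = [
    {
        "name": "repositories",
        "first": 100,
        "children": [{"name": "issues", "first": 100, "children": []}],
    }
]

print(estimate_nodes(query))          # 100 + 100 * 100 = 10100
print(exceeds_budget(query, 10_000))  # True
```

The estimate needs only the query document and its pagination arguments, which is why a gateway can reject an expensive query without touching the database.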
All production Shopify Storefront API calls use persisted queries—the client sends a SHA-256 hash of a query document registered at build time via POST /graphql/persist. At runtime, Shopify's Fastly CDN caches GET requests by hash as the cache key, achieving ~95% cache hit ratio on standard product page queries. An unregistered query hash returns {"errors": [{"message": "PersistedQueryNotFound"}]}, blocking ad-hoc queries from reaching origin.
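The persisted-query flow described above can be sketched as a hash-keyed registry plus a deterministic GET URL. This is a generic illustration, not Shopify's API: `PersistedQueryStore`, `cache_url`, and the `id` and `variables` parameter names are assumptions.

```python
import hashlib
import json
from urllib.parse import urlencode


class PersistedQueryStore:
    """Maps SHA-256 hashes to query documents registered at build time."""

    def __init__(self):
        self._by_hash = {}

    def register(self, query: str) -> str:
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._by_hash[digest] = query
        return digest

    def resolve(self, digest: str) -> dict:
        query = self._by_hash.get(digest)
        if query is None:
            # Unregistered hashes never reach the executor.
            return {"errors": [{"message": "PersistedQueryNotFound"}]}
        return {"query": query}


def cache_url(digest: str, variables: dict) -> str:
    """Build a GET URL whose text is stable for identical hash and variables."""
    encoded = json.dumps(variables, sort_keys=True, separators=(",", ":"))
    return "/graphql?" + urlencode({"id": digest, "variables": encoded})


store = PersistedQueryStore()
digest = store.register("query Product($handle: String!) { product(handle: $handle) { title } }")
known = store.resolve(digest)
unknown = store.resolve("0" * 64)
url_a = cache_url(digest, {"handle": "mug", "locale": "en"})
url_b = cache_url(digest, {"locale": "en", "handle": "mug"})
```

Sorting the variable keys keeps the URL identical for equivalent requests, which matters because a CDN typically uses the full URL as the cache key.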
Netflix's API gateway runs Apollo Router federating 80+ domain microservices (titles, recommendations, viewing history, billing) into a single unified graph. Each domain service owns its schema fragment and marks its entity types with the @key directive so the router can stitch those types across service boundaries. DataLoader batches all lookupTitle(ids: [...]) calls for a single request into one upstream call per service, reducing N+1 resolution overhead from hundreds of DB queries to one batched query per domain.
Anti-Patterns
Mapping GraphQL types directly to database tables, which leaks internal implementation details and prevents database refactoring.
Allowing clients to execute arbitrarily deep queries (e.g., user { friends { friends { friends } } }), where each extra level multiplies the work and can exhaust server memory and crash the process.
Writing nested GraphQL resolvers that perform individual database queries per item in a list, resulting in hundreds of database roundtrips for a single HTTP request.
Sending large file uploads or binary data through GraphQL mutations, which inflates payloads due to Base64 encoding (use direct S3 pre-signed URLs instead).
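The deep-nesting anti-pattern above is usually countered with a depth check that runs before execution. The sketch below is deliberately simplified: it counts brace nesting in the raw query text, so it ignores fragments and would miscount braces inside string literals or input-object arguments. `query_depth`, `enforce_depth`, and the limit of 4 are hypothetical; a production gateway would walk the parsed query AST instead.

```python
def query_depth(query: str) -> int:
    """Return the deepest selection-set nesting, measured by curly braces."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces in query")
    if depth != 0:
        raise ValueError("unbalanced braces in query")
    return max_depth


def enforce_depth(query: str, max_depth: int) -> None:
    """Reject the query before execution if it nests too deeply."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


nested = "{ user { friends { friends { friends { id } } } } }"
print(query_depth(nested))  # 5

try:
    enforce_depth(nested, max_depth=4)
    rejected = False
except ValueError:
    rejected = True
```

Depth limits are cheap to evaluate but coarse, which is why they are usually paired with a cost estimate that accounts for list sizes.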
Design Tradeoffs
| Dimension | GraphQL | REST |
|---|---|---|
| Fetching efficiency | Clients specify exact fields; eliminates over-fetching and multi-roundtrip under-fetching in a single request | Fixed payloads force over-fetching or multiple sequential roundtrips to assemble composite views |
| HTTP caching | POST-based queries bypass CDN cache by default; persisted queries with GET enable edge caching but require build-time registration | Native HTTP caching via Cache-Control and ETag; CDN caches full responses without any extra infrastructure |
| Server complexity | Query parsing, cost analysis, DataLoader batching, and schema federation required at the gateway layer | Predictable execution per endpoint; standard routing with no query planning or batching overhead |
Best Practices
Avoid versioning a GraphQL API by URL the way REST APIs often are (e.g., /v1 vs /v2). Instead, evolve GraphQL schemas by deprecating fields gradually and adding new fields incrementally.
When to Use / Avoid
| Use When | Avoid When |
|---|---|
| You have highly diverse clients (web, iOS, Android, IoT) requiring different data shapes from the same backend. | You are building simple CRUD APIs with predictable, uniform data access patterns. |
| You are building a developer platform or public API where clients need to self-serve complex data relationships. | Your system has ultra-low latency requirements and cannot tolerate the overhead of query parsing and validation. |
| You have a federated microservices architecture where a single gateway needs to stitch multiple domain graphs together. | Your team lacks the operational capacity to monitor resolver performance, implement batching, and secure GraphQL endpoints. |