
Blob and Object Storage Architecture

TL;DR
  • Decouple application servers from heavy file transfers by utilizing presigned URLs for direct client uploads.
  • Achieve high-throughput parallel transfers for large files using multipart upload protocols.
  • Optimize storage costs automatically by configuring lifecycle rules to transition data to colder tiers.
  • Mitigate read-after-write consistency issues by designing applications to handle eventual consistency in secondary indexes.

The Problem

Storing large binary files (images, videos, backups) directly inside relational databases degrades transaction performance, bloats backups, and exhausts expensive SSD storage.

Conversely, storing files on local application server disks prevents horizontal scaling, as files are trapped on a single machine.

Furthermore, proxying large file uploads through application servers chokes network bandwidth, increases memory usage, and blocks execution threads, leading to slow response times and frequent timeouts.

Core System Idea

Object storage architectures decouple metadata (stored in high-performance databases) from the physical binary data (stored in a distributed flat namespace). Files are treated as immutable "objects" identified by unique keys within a "bucket."

To scale uploads and downloads without overloading application servers, the system uses "Presigned URLs." The application server authenticates the user, validates the request, and generates a time-limited, cryptographically signed URL. The client then uploads or downloads the binary file directly to/from the object storage service using this URL.
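The signing step can be illustrated with a minimal sketch. It assumes a simplified HMAC scheme with a hypothetical shared key, host (`storage.example.com`), and query parameter names; real providers use their own, more involved signing protocols (for example AWS Signature Version 4), so this only shows the idea of a time-limited, tamper-evident URL.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key known to the app server and the storage service.
SECRET_KEY = b"demo-secret"


def _signature(method: str, path: str, expires: int) -> str:
    # Sign the HTTP method, object path, and expiry so none can be altered.
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, bucket: str, key: str, ttl_seconds: int,
            now: Optional[float] = None) -> str:
    """App-server side: build a URL that is valid for ttl_seconds."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    path = f"/{bucket}/{key}"
    query = urlencode({"expires": expires,
                       "signature": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the request only if unexpired and untampered."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(method, parsed.path, expires)
    return hmac.compare_digest(expected, signature)
```

Because the method and path are part of the signed message, a URL issued for `PUT` on one key cannot be reused for `GET` or for a different key.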

For large files, the architecture utilizes "Multipart Uploads." The file is split into independent chunks, uploaded in parallel, and reassembled by the object storage engine.
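The split, parallel upload, and reassembly steps can be sketched as follows. `InMemoryMultipartStore` is a hypothetical stand-in for the storage service, not a real client library, and the part size is kept tiny so the example runs instantly.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryMultipartStore:
    """Hypothetical stand-in for an object store's multipart API."""

    def __init__(self):
        self.parts = {}    # (upload_id, part_number) -> bytes
        self.objects = {}  # key -> assembled bytes

    def upload_part(self, upload_id, part_number, data):
        self.parts[(upload_id, part_number)] = data
        return part_number

    def complete(self, upload_id, key, part_numbers):
        # Reassemble in part-number order, regardless of arrival order.
        self.objects[key] = b"".join(
            self.parts.pop((upload_id, n)) for n in sorted(part_numbers)
        )


def split_into_parts(data, part_size):
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def multipart_upload(store, key, data, part_size, max_workers=4):
    """Upload all parts concurrently, then ask the store to assemble them."""
    upload_id = f"upload-{key}"  # a real service would issue this ID
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(store.upload_part, upload_id, number, chunk)
            for number, chunk in enumerate(parts, start=1)
        ]
        uploaded = [future.result() for future in futures]
    store.complete(upload_id, key, uploaded)
    return len(parts)
```

Since each part carries its own number, a failed part can be retried on its own without restarting the whole transfer.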

To manage costs, the system relies on "Storage Class Tiering," automatically moving objects from high-performance SSDs (Hot) to cheaper HDDs (Warm/Cool) and tape-like archival storage (Cold/Glacier) based on access frequency and age.
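As a rough illustration of age-based tiering, the sketch below picks a storage class from an object's age. The thresholds (30 and 180 days) and the tier names are assumptions chosen for the example, and access frequency is left out for brevity; managed services evaluate such rules for you once they are configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionRule:
    min_age_days: int
    storage_class: str


# Hypothetical thresholds; tune these to your own access patterns.
RULES = [
    TransitionRule(0, "HOT"),
    TransitionRule(30, "WARM"),
    TransitionRule(180, "COLD"),
]


def storage_class_for(age_days, rules=RULES):
    """Return the coldest tier whose age threshold the object has reached."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    eligible = [rule for rule in rules if age_days >= rule.min_age_days]
    return max(eligible, key=lambda rule: rule.min_age_days).storage_class
```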

System Flow

flowchart TD
    A["Client"] -->|"1. Request Upload Permission"| B["Application Server"]
    B -->|"2. Generate and Sign URL"| C{"Object Storage API"}
    B -->|"3. Return Presigned URL"| A
    A -->|"4. PUT Binary File Direct"| C
    C -->|"5. Trigger Event Notification"| D["Message Queue"]
    D -->|"6. Process Metadata"| B

Direct-to-client upload flow using presigned URLs to bypass application server network bottlenecks.
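Steps 5 and 6 of the flow can be sketched as a queue consumer. The example assumes the message body follows the S3-style event notification layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, with URL-encoded keys); the `db` dictionary is a hypothetical stand-in for the metadata database.

```python
import json
from urllib.parse import unquote_plus


def extract_uploads(message_body):
    """Parse an S3-style event message into a list of uploaded objects."""
    event = json.loads(message_body)
    uploads = []
    for record in event.get("Records", []):
        # Only object-creation events describe a finished upload.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        uploads.append({
            "bucket": s3["bucket"]["name"],
            # Keys arrive URL-encoded in the notification payload.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return uploads


def record_metadata(db, uploads):
    """Upsert keyed by (bucket, key) so redelivered messages are harmless."""
    for upload in uploads:
        db[(upload["bucket"], upload["key"])] = {
            "size": upload["size"],
            "status": "uploaded",
        }
```

Keying the write on bucket and object key keeps the handler idempotent, which matters because queues commonly deliver a message more than once.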

Real-World Examples (Indicative)

Dropbox Magic Pocket — 30% deduplication on 500B+ files

Dropbox built Magic Pocket to replace most of its S3 usage, completing the migration around 2016 and reportedly cutting infrastructure costs by roughly $75M over the following two years. Magic Pocket splits files into 4MB content-addressed blocks (SHA-256 hash as key). Identical content blocks, which are common across Dropbox's 500B+ file corpus (duplicate documents, shared spreadsheets, identical profile photos), are deduplicated at the block level: the same physical block is stored once regardless of how many users have that file. The deduplication rate across blocks is ~30%, meaning only ~70% of blocks require unique physical storage. Client uploads use presigned internal tokens: the API server validates metadata and returns an upload token; the client streams blocks directly to Magic Pocket storage nodes without touching application servers.
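Block-level, content-addressed deduplication can be illustrated with a small sketch. `BlockStore` is a hypothetical in-memory class written for this example, not Dropbox's implementation; it only shows how hashing each block makes identical blocks collapse into one stored copy.

```python
import hashlib
from typing import Dict, List


class BlockStore:
    """Hypothetical in-memory store keyed by the SHA-256 of each block."""

    def __init__(self, block_size: int = 4 * 1024 * 1024):
        self.block_size = block_size
        self.blocks: Dict[str, bytes] = {}

    def put_file(self, data: bytes) -> List[str]:
        """Store a file and return its manifest (ordered block hashes)."""
        manifest = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # An already-known block is not stored a second time.
            self.blocks.setdefault(digest, block)
            manifest.append(digest)
        return manifest

    def get_file(self, manifest: List[str]) -> bytes:
        return b"".join(self.blocks[digest] for digest in manifest)
```

For example, with a 4-byte block size, storing `b"aaaabbbbcccc"` and `b"aaaabbbbdddd"` yields six logical blocks but only four physical ones, because the first two blocks are shared.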

Cloudflare R2 — presigned URLs with configurable TTLs for zero-egress storage

Cloudflare R2 stores 10B+ objects for customers migrating from S3 to avoid $0.09/GB egress charges. R2 supports multipart uploads with a 5MB minimum part size (the final part may be smaller) and up to 10,000 parts per object. R2's event notifications are delivered to Cloudflare Queues within 250ms of object creation, where consumer Workers pick them up and trigger transcoding pipelines. Presigned URL TTL is configurable from 1 second to 7 days: Figma uses 60-second TTLs for sensitive exported design assets, while public CDN-served assets use 24-hour TTLs to maximize edge cache hit rates.

Figma S3 multipart — 2GB exports from 4 min to 50 sec

Figma stores design files as delta-compressed S3 objects. Large enterprise exports (up to 2GB) use S3 multipart upload: the backend issues CreateMultipartUpload, splits the file into 50MB chunks (about 40 parts for a 2GB file), and uploads the chunks in parallel over 5 concurrent HTTP connections. This reduces a 2GB sequential upload from ~4 minutes to ~50 seconds, close to the 5x improvement expected from 5 connections. A bucket lifecycle rule automatically aborts incomplete multipart uploads after 7 days, preventing orphaned fragment storage charges from clients that disconnect mid-upload.
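The figures above can be sanity-checked with a little arithmetic. The sketch below assumes binary units (1 GB = 1024 MB), the 5 MiB minimum part size and 10,000-part limit of S3-compatible multipart uploads, and that each connection sustains the same throughput as the single sequential connection, so the result is an idealized lower bound rather than a prediction.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # minimum for every part except the last
MAX_PARTS = 10_000


def plan_multipart(file_size, part_size, concurrency, sequential_seconds):
    """Return the part count and an idealized parallel upload time."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_count = math.ceil(file_size / part_size)
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    # Ideal case: throughput scales linearly with the connection count.
    ideal_seconds = sequential_seconds / concurrency
    return {"part_count": part_count, "ideal_seconds": ideal_seconds}


plan = plan_multipart(
    file_size=2 * 1024 * MIB,  # 2GB export
    part_size=50 * MIB,        # 50MB chunks
    concurrency=5,
    sequential_seconds=240,    # ~4 minutes sequential
)
print(plan)
```

Under these assumptions the plan comes out at 41 parts and an ideal time of 48 seconds, which is in line with the ~50 seconds quoted once per-request overhead is taken into account.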

Anti-Patterns

Proxying File Uploads Through App Servers

Reading file streams into application memory before writing them to object storage wastes CPU, RAM, and network bandwidth.

Using Object Storage as a Transactional File System

Attempting to perform frequent append operations or file locks on object storage is highly inefficient, as objects are immutable and must be completely rewritten on every update.

Exposing Public Buckets

Failing to restrict bucket permissions or relying on security-by-obscurity for object URLs leads to catastrophic data leaks.

Ignoring Failed Multipart Uploads

Failing to configure lifecycle rules to clean up incomplete multipart uploads results in hidden storage charges for orphaned file fragments.
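A cleanup rule of this kind might look like the sketch below, which builds a lifecycle configuration in the shape used by the S3 API (`AbortIncompleteMultipartUpload` with `DaysAfterInitiation`). The rule ID and helper function name are hypothetical, and the example only constructs the dictionary; it does not call any cloud service.

```python
def abort_incomplete_uploads_rule(days, prefix=""):
    """Build an S3-style lifecycle configuration that aborts stale uploads."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"abort-incomplete-multipart-after-{days}d",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


lifecycle = abort_incomplete_uploads_rule(days=7)
# With boto3, a dictionary like this is typically passed as the
# LifecycleConfiguration argument of put_bucket_lifecycle_configuration.
```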

Design Tradeoffs

| Dimension | Direct Upload (Presigned URLs) | Server-Proxied Upload |
| --- | --- | --- |
| Application server load | Zero load; clients stream binary data directly to object storage, bypassing application server memory and bandwidth entirely | High load; servers must buffer the full file stream in memory and forward it, consuming RAM and network I/O per upload |
| Pre-storage validation | Hard; virus scanning, image resizing, or format validation must happen asynchronously after upload via event triggers | Easy; application servers can inspect, transform, or reject files synchronously before writing them to storage |
| Client implementation complexity | Requires CORS configuration, presigned URL generation logic, and client-side multipart retry handling | Simple; standard multipart form submission with no client-side URL management or CORS concerns |

Best Practices

Enforce Direct Uploads via Presigned URLs

Never let binary data touch your application servers; delegate all upload and download traffic to the object storage provider.

Configure Incomplete Multipart Upload Cleanup

Implement a bucket lifecycle rule to automatically delete incomplete multipart uploads after 7 days to prevent runaway storage costs.

Leverage CDN Integration

Place a Content Delivery Network (CDN) in front of your object storage bucket for read-heavy, globally distributed assets to reduce latency and egress costs.

Use Event-Driven Processing

Use bucket event notifications (e.g., S3 Event Notifications) to trigger asynchronous background jobs (like image resizing or transcoding) via message queues.

Implement Object Versioning Carefully

Enable versioning to protect against accidental deletions, but pair it with strict lifecycle rules to purge old versions and control costs.

When to Use / Avoid

| Use When | Avoid When |
| --- | --- |
| Storing immutable files larger than a few kilobytes (e.g., media assets, documents, database backups). | Storing highly dynamic data that requires frequent in-place updates or appends. |
| Building globally accessible, highly durable data lakes for analytical processing. | Low-latency file system operations (e.g., running database storage engines or code execution). |
| Distributing static web assets (HTML, JS, CSS, images) at scale via CDNs. | Storing highly sensitive, transient data that must never leave local memory. |