InfiniBand NDR Networks
Source: mortalapps.com
- InfiniBand NDR (Next Data Rate) powers 400 Gb/s per-port networks, doubling the bandwidth of the previous HDR generation and serving as the backbone of modern supercomputing.
- The NVIDIA Quantum-2 switch system provides 64 non-blocking ports of 400 Gb/s, for 51.2 Tb/s of aggregate bidirectional throughput per 1U chassis.
- NDR integrates In-Network Computing, using the SHARPv3 protocol to offload collective operations such as reductions into the switch silicon.
- The succeeding XDR (Quantum-3) generation raises per-port bandwidth to 800 Gb/s and integrates co-packaged silicon photonics to reduce latency.
Why This Matters
As AI models demand petabytes of data exchange during the training phase, the network fabric cannot simply act as a passive pipe; it must actively accelerate the workload. InfiniBand provides deterministic, low-latency, lossless transmission together with hardware-accelerated adaptive routing. NDR 400G and XDR 800G networks form the backbone of many of the world's most powerful AI supercomputers, enabling near-linear scaling of training throughput across tens of thousands of GPUs.
Core Intuition
Standard Ethernet was originally designed for best-effort traffic, where a congested switch may drop packets and leave recovery to higher layers. InfiniBand, by contrast, was engineered from the ground up for tightly coupled supercomputing. It uses link-level, credit-based flow control to keep the fabric lossless: a sender transmits a packet only when the downstream receiver has advertised credits showing that buffer space is available, so packets are never dropped for lack of buffers. In addition, In-Network Computing (SHARP) turns the network switch from a simple traffic intersection into a co-processor that reduces data as it routes it.
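The credit mechanism can be illustrated with a toy model. The sketch below is not InfiniBand's actual link protocol: it assumes a single link, counts credits in whole packets, and uses hypothetical names (`Receiver`, `Sender`, `simulate`). It only shows why a sender that never exceeds its credits cannot overflow the receiver's buffer.

```python
from collections import deque


class Receiver:
    """Toy receiver with a fixed number of buffer slots (hypothetical model)."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots
        self.buffer = deque()

    def accept(self, packet):
        # With correct credit accounting this check can never fail.
        assert len(self.buffer) < self.capacity, "buffer overflow"
        self.buffer.append(packet)

    def drain(self, max_packets):
        """Consume up to max_packets and return how many slots were freed."""
        freed = 0
        while self.buffer and freed < max_packets:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    """Toy sender that spends one credit per packet."""

    def __init__(self, packets):
        self.pending = deque(packets)
        self.credits = 0

    def grant(self, credits):
        self.credits += credits

    def send_to(self, receiver):
        # Transmit only while credits remain; otherwise wait (no drops).
        while self.pending and self.credits > 0:
            receiver.accept(self.pending.popleft())
            self.credits -= 1


def simulate(num_packets, buffer_slots, drain_per_tick):
    """Return (delivered, ticks, max_buffer_occupancy) for one link."""
    rx = Receiver(buffer_slots)
    tx = Sender(range(num_packets))
    tx.grant(buffer_slots)  # initial credits equal the empty buffer size
    delivered = ticks = max_occupancy = 0
    while delivered < num_packets:
        tx.send_to(rx)
        max_occupancy = max(max_occupancy, len(rx.buffer))
        freed = rx.drain(drain_per_tick)
        delivered += freed
        tx.grant(freed)  # freed slots flow back to the sender as credits
        ticks += 1
    return delivered, ticks, max_occupancy


if __name__ == "__main__":
    print(simulate(num_packets=10, buffer_slots=4, drain_per_tick=2))
```

When the receiver drains more slowly than the sender can transmit, the sender simply stalls until credits return. Backpressure takes the place of packet loss.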
Technical Deep Dive
The Quantum-2 NDR architecture (QM9700/QM9790 switch systems) uses 32 OSFP (Octal Small Form-factor Pluggable) cages. Because NDR transceivers are twin-port devices, these 32 physical cages provide 64 distinct 400 Gb/s ports, for 51.2 Tb/s of aggregate bidirectional, non-blocking switching capacity. A single Quantum-2 switch is rated at more than 66.5 billion packets per second.
The subsequent Quantum-3 XDR architecture (Quantum-X800) scales the fabric to 144 ports of 800 Gb/s per switch and integrates co-packaged silicon photonics, which reduces both latency and power consumption by shortening the distance electrical signals must travel before being converted to light.
| InfiniBand Generation | Per-Port Bandwidth | Switch Throughput | Key In-Network Feature |
|---|---|---|---|
| HDR (Quantum) | 200 Gb/s | 16 Tb/s | SHARPv2 |
| NDR (Quantum-2) | 400 Gb/s | 51.2 Tb/s | SHARPv3 |
| XDR (Quantum-X800) | 800 Gb/s | 115.2 Tb/s | SHARPv4 |
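The throughput figures follow from simple port arithmetic, which is worth checking because vendors sometimes count both directions of each port and sometimes only one. The helper below is a minimal sketch; `aggregate_tbps` is a hypothetical name, and the port counts are the ones quoted in this article (64 NDR ports from 32 twin-port cages, 144 XDR ports).

```python
def aggregate_tbps(ports, gbps_per_port, bidirectional):
    """Aggregate switch throughput in Tb/s from port count and port speed."""
    directions = 2 if bidirectional else 1
    return ports * gbps_per_port * directions / 1000


# NDR: 32 OSFP cages, two 400 Gb/s ports per cage.
ndr_ports = 32 * 2
ndr_one_way = aggregate_tbps(ndr_ports, 400, bidirectional=False)
ndr_two_way = aggregate_tbps(ndr_ports, 400, bidirectional=True)

# XDR: 144 ports of 800 Gb/s.
xdr_one_way = aggregate_tbps(144, 800, bidirectional=False)

if __name__ == "__main__":
    print(f"NDR one direction:   {ndr_one_way} Tb/s")
    print(f"NDR both directions: {ndr_two_way} Tb/s")
    print(f"XDR one direction:   {xdr_one_way} Tb/s")
```

Under these assumptions the 51.2 Tb/s NDR figure matches 64 ports counted in both directions, while the 115.2 Tb/s XDR figure matches 144 ports counted in one direction. The rows of the table may therefore not all use the same counting convention.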
Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) allows the switch silicon to perform reductions (e.g., summing gradients) on data in flight. Multiple switches coordinate to aggregate data as it moves up the network tree, so each switch forwards one reduced payload upstream instead of every contribution it received, and the final result is sent back down the tree. This greatly reduces the incast congestion typical of software AllReduce implementations, in which many hosts send their full payloads toward the same endpoints.
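The dataflow of such a tree reduction can be sketched in a few lines. This is not the SHARP API or wire protocol; `tree_allreduce` and its `radix` parameter are hypothetical and model only the idea that each aggregation point sums its children's vectors and passes a single result upward.

```python
def tree_allreduce(host_vectors, radix):
    """Sum equal-length vectors level by level, as a tree of aggregating
    switches might. Returns (per_host_results, max_fan_in), where max_fan_in
    is the largest number of payloads any single aggregation point received.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = [list(v) for v in host_vectors]
    max_fan_in = 0
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), radix):
            group = level[start:start + radix]
            max_fan_in = max(max_fan_in, len(group))
            # Element-wise sum: the reduction performed at this tree node.
            next_level.append([sum(values) for values in zip(*group)])
        level = next_level
    total = level[0]
    # The root's result is distributed back down to every host.
    return [list(total) for _ in host_vectors], max_fan_in


if __name__ == "__main__":
    gradients = [[1, 2], [3, 4], [5, 6], [7, 8]]
    results, fan_in = tree_allreduce(gradients, radix=2)
    print(results[0], fan_in)
```

With four hosts and a radix of 2, no aggregation point receives more than two payloads, whereas a single software reducer would receive one payload from every other host. In this model the fan-in at any node is bounded by the radix rather than by the number of hosts.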