When we design large AI clusters at WhiteFiber Cloud, we start with a simple question:
Why are so many GPUs idle?
Over the past year, we’ve worked with teams running state-of-the-art NVIDIA H200 clusters that were delivering 40–50% utilization. The hardware was not the problem. The models were not the problem. The network was.
As models grow and parallelism scales, the network becomes the system. All-reduce, all-gather, checkpointing, storage reads, telemetry—every phase of training stresses east-west traffic. When congestion appears inside the fabric, GPUs stall. Tokens per second drop. Training time expands. Cost per model increases.
We wanted to understand how close we could get to theoretical bandwidth while maintaining Ethernet’s flexibility and scale.
The Technical Challenge
AI clusters are different from traditional data center workloads.
They generate:
- Synchronized, bursty east-west traffic from collectives such as all-reduce and all-gather
- Periodic checkpoint writes and heavy storage reads
- Sustained load with little tolerance for latency variance
In standard Ethernet fabrics, even well-tuned ones, congestion control mechanisms are reactive. Techniques like PFC (Priority Flow Control) and ECN (Explicit Congestion Notification) help, but they introduce operational complexity and can struggle under synchronized load. Head-of-line blocking becomes visible at scale. Performance becomes variable.
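The lag in reactive congestion control can be seen in a toy model (purely illustrative, not a model of any particular switch or DCQCN implementation): ECN marks are felt by senders only after a round trip, so a synchronized burst overfills the queue before anyone slows down.

```python
# Toy model of reactive ECN under a synchronized burst. Senders halve
# their rates only when a mark's feedback arrives, one RTT after the
# queue crossed the threshold -- so the queue badly overshoots first.
# All constants are arbitrary illustration values.

def simulate(senders=8, rate=10, drain=40, threshold=50, rtt=3, steps=12):
    queue, rates, history = 0, [rate] * senders, []
    marks = []  # steps at which pending ECN feedback reaches the senders
    for step in range(steps):
        if marks and marks[0] <= step:   # feedback arrives: halve rates
            marks.pop(0)
            rates = [r / 2 for r in rates]
        queue = max(0, queue + sum(rates) - drain)
        if queue > threshold:            # mark now, felt one RTT later
            marks.append(step + rtt)
        history.append(queue)
    return history

peak = max(simulate())
print(f"peak queue depth: {peak} (marking threshold was 50)")
```

With these numbers the queue peaks at more than three times the marking threshold before the first rate cut lands, which is exactly the variance that shows up as stalled GPUs.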
InfiniBand solves many of these issues through deterministic behavior and tight control of traffic flow. We deploy InfiniBand in environments where it is the right answer. But some customers with more diverse and complex environments require Ethernet interoperability, multi-tenancy flexibility, or broader vendor compatibility.
We wanted an Ethernet architecture that behaved deterministically under AI load.
Why Scheduled Fabric
We partnered with DriveNets to implement a scheduled fabric Ethernet design. The core idea is simple but powerful:
- Instead of reacting to congestion, schedule traffic to avoid it.
Packets are broken into fixed-size cells. These cells are scheduled across the fabric in a connection-oriented manner. The result is a lossless, deterministic system where the fabric behaves as a coordinated whole rather than as independent switches.
From an architectural perspective, this shifts the network from probabilistic fairness to controlled allocation.
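A rough sketch of the idea (our simplification, not DriveNets' actual implementation): packets are sliced into fixed-size cells, and a scheduler grants each flow egress slots round-robin, so no link is ever offered more cells than it can drain.

```python
from itertools import cycle

CELL = 256  # bytes per cell; the real cell size here is an assumption

def to_cells(packet_len):
    """Slice a packet into fixed-size cells (last cell padded)."""
    return packet_len // CELL + (1 if packet_len % CELL else 0)

def schedule(flows, slots_per_tick):
    """Round-robin cell grants: each tick carries at most slots_per_tick
    cells, so the egress link is never oversubscribed."""
    pending = {f: to_cells(n) for f, n in flows.items()}
    ring = cycle(sorted(pending))
    ticks = []
    while any(pending.values()):
        tick, granted = [], 0
        while granted < slots_per_tick and any(pending.values()):
            f = next(ring)
            if pending[f]:
                pending[f] -= 1
                tick.append(f)
                granted += 1
        ticks.append(tick)
    return ticks

# Three flows sharing one egress that drains 4 cells per tick.
ticks = schedule({"A": 2048, "B": 1024, "C": 512}, slots_per_tick=4)
print(len(ticks), "ticks, max", max(len(t) for t in ticks), "cells/tick")
```

The point of the sketch is the invariant: the fabric admits exactly what the egress can carry per tick, so congestion is prevented by allocation rather than detected after the fact.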
The benefits are measurable:
- Congestion is controlled inside the fabric
- Links operate near peak efficiency
- No specialized NICs are required
- Standard Ethernet overlays remain supported
For large GPU clusters, this matters.
What We Built
On top of this architecture, we deployed a 512-GPU NVIDIA H200 cluster with HPE, backed by a high-performance storage tier and a scheduled fabric Ethernet core with a theoretical bus bandwidth of 400 GB/s.
We intentionally kept tuning minimal. We wanted to measure real-world performance, not lab-only conditions.
Using NCCL benchmarks, we measured an average of 390 GB/s—97.5% of peak theoretical bandwidth.
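For readers reproducing this, the efficiency figure is plain arithmetic over the "busbw" column that nccl-tests reports, which for all-reduce converts algorithm bandwidth with a 2*(n-1)/n factor:

```python
def allreduce_busbw(size_bytes, time_s, n_ranks):
    """Bus bandwidth as nccl-tests defines it for all-reduce:
    busbw = algbw * 2 * (n - 1) / n."""
    algbw = size_bytes / time_s
    return algbw * 2 * (n_ranks - 1) / n_ranks

# The headline efficiency from the numbers in the text:
measured, theoretical = 390.0, 400.0  # GB/s
print(f"{measured / theoretical:.1%} of theoretical peak")
```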
We then ran end-to-end TorchTitan benchmarks using a Llama 3.1 8B model across 32 and 64 GPU configurations.
Two metrics mattered:
- Tokens per second (TPS)
- Model FLOPs utilization (MFU)
TPS remained stable as we scaled. MFU reached 63%, exceeding prior published Hopper-class results.
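MFU is achieved training FLOPs divided by peak hardware FLOPs. Using the common ~6 * params * tokens approximation for dense transformer training, and an assumed peak of 989 TFLOPS BF16 per H200, the calculation looks like this (the TPS value below is hypothetical, chosen only to illustrate the formula, not our measured number):

```python
def mfu(tokens_per_sec, n_params, n_gpus, peak_flops_per_gpu):
    """Model FLOPs utilization via the ~6 * params * tokens/sec
    approximation for a dense transformer training step."""
    achieved = 6 * n_params * tokens_per_sec
    return achieved / (n_gpus * peak_flops_per_gpu)

# Illustrative only: an 8B-parameter model on 32 GPUs, assuming
# ~989 TFLOPS peak BF16 per H200 (dense, no sparsity).
example = mfu(
    tokens_per_sec=415_000,   # hypothetical cluster-wide TPS
    n_params=8e9,
    n_gpus=32,
    peak_flops_per_gpu=989e12,
)
print(f"MFU = {example:.0%}")
```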
The result was not just high bandwidth. It was sustained, predictable performance under real training workloads.
What We Learned
Three lessons stand out:
- At scale, the network is the system: GPU utilization follows fabric behavior.
- Scheduled fabric Ethernet can deliver deterministic, near-line-rate performance without specialized NICs.
- Interconnect choice is a workload decision, not an ideology.
We do not view this as Ethernet versus InfiniBand. Both have strengths. Our responsibility is to understand the tradeoffs and design for the workload.
Why This Matters
AI infrastructure investment is accelerating. As clusters scale from tens to hundreds to thousands of GPUs, inefficiencies compound quickly. A 10% drop in utilization at 512 GPUs is manageable. At 10,000 GPUs, it becomes material.
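The compounding is easy to quantify. At an assumed $2 per GPU-hour (an illustrative rate, not our pricing), the same 10% utilization shortfall scales linearly with cluster size:

```python
def idle_cost_per_day(n_gpus, util_loss=0.10, dollars_per_gpu_hour=2.0):
    # Dollars wasted per day at the given utilization shortfall.
    # The $/GPU-hour rate is an illustrative assumption.
    return n_gpus * 24 * util_loss * dollars_per_gpu_hour

for n in (512, 10_000):
    print(f"{n:>6} GPUs: ${idle_cost_per_day(n):,.0f}/day")
```

Roughly $2,500 a day at 512 GPUs versus $48,000 a day at 10,000 GPUs, before counting the slower time-to-model.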
Deterministic networking reduces variance. Reduced variance improves planning. Improved planning reduces cost.
For enterprises building long-lived AI infrastructure, predictability is not optional.
Where We Go Next
We will continue benchmarking across both scheduled fabric Ethernet and InfiniBand deployments. We will continue publishing data. And we will continue designing heterogeneous environments where customers can select the right interconnect for the job.
Our goal is simple:
- Make GPUs do work.
When the network stops being the bottleneck, clusters behave the way they were meant to.
That is the work.

