Enterprise data centers face mounting pressure from workloads that demand both massive throughput and microsecond-level latency. AI training, real-time analytics, and transaction processing systems generate I/O patterns that traditional storage protocols simply can't accommodate. Legacy protocols designed for spinning disks create artificial bottlenecks that prevent modern flash storage from delivering its full potential.
This is where NVMe over Fabrics (NVMe-oF) comes in. NVMe-oF extends the high-performance NVMe protocol beyond individual server chassis, enabling networked storage that performs like direct-attached drives. The protocol delivers dramatically lower latency, higher throughput, and more efficient CPU utilization than iSCSI, SAS, or traditional Fibre Channel implementations.
NVMe over Fabrics is a network protocol extension that enables NVMe commands to transfer data between hosts and solid-state storage across Ethernet, Fibre Channel, or InfiniBand networks. By maintaining the massive parallelism and efficiency of local NVMe while adding fabric connectivity, NVMe-oF eliminates the performance penalties that have historically separated networked storage from direct-attached flash.
For organisations running AI workloads, latency-sensitive databases, or high-performance computing applications, NVMe-oF represents a fundamental shift. This article examines how the protocol works, compares three transport options, and provides guidance for selecting the right implementation for enterprise environments.
When NVM Express, Inc. released the original NVMe specification in March 2011, it focused exclusively on local PCIe-attached storage. The protocol's efficiency, with support for up to 65,535 I/O queues and up to 65,535 commands per queue, made flash storage dramatically faster than SATA or SAS interfaces could support.
But this performance remained trapped inside individual servers.
The NVMe-oF 1.0 specification, released in June 2016, extended NVMe's efficiency to networked storage with RDMA transport support. In 2019, version 1.1 added TCP transport, enabling deployment over standard Ethernet infrastructure without specialized hardware. NVMe 2.0 in 2021 introduced Zoned Namespaces and enhanced management features, positioning NVMe-oF as a complete enterprise fabric solution.
Today, the protocol has moved from emerging technology to production standard, particularly in environments where storage performance directly impacts business outcomes.
Before NVMe-oF, enterprise storage networks relied on three primary connection types, each designed for an earlier generation of storage media.
iSCSI encapsulates SCSI commands within TCP/IP packets, allowing storage to flow over standard Ethernet networks. While iSCSI democratized networked storage by eliminating the need for specialized Fibre Channel infrastructure, it carries significant overhead. The protocol stack involves multiple layers of processing—SCSI translation, TCP segmentation, IP routing—that introduce latency and consume CPU cycles.
SAS uses point-to-point serial connections to transfer SCSI commands over dedicated cables, with speeds up to 22.5Gbps. But even at peak speeds, it can't fully utilize the random I/O capabilities of modern NVMe SSDs. The protocol's command queuing, a single queue per device with a depth far shallower than NVMe's, creates artificial limits on parallel operations.
Fibre Channel transports SCSI commands over fiber optic or copper cables, operating at speeds from 8Gbps to 64Gbps in modern implementations. The protocol has dominated enterprise storage area networks for decades due to its reliability and performance.
But the Fibre Channel Protocol (FCP), which carries SCSI commands over FC, still incurs SCSI overhead. Each I/O operation is expressed as a SCSI command, transmitted over FC, and then translated to NVMe at a flash-based storage array. This translation layer adds latency and complexity that newer protocols eliminate.
NVMe-oF maintains strong protocol consistency with local NVMe while adding the network extensions necessary for fabric operation.
Unlike local NVMe's memory-mapped model, NVMe-oF uses a message-based approach for remote communication. Commands and responses are packaged into capsules—self-contained units that combine submission queue entries or completion queue entries with data, metadata, or scatter-gather lists. This structure improves efficiency by allowing multiple small messages to be transmitted as a single unit.
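To make the capsule idea concrete, here is a minimal Python sketch that packs a simplified command capsule: a 64-byte submission queue entry followed by optional in-capsule data. The function name and the zero-filled fields are assumptions for illustration; a real capsule also carries scatter-gather descriptors and transport-specific framing that this sketch omits.

```python
import struct

SQE_SIZE = 64  # an NVMe submission queue entry is 64 bytes


def build_command_capsule(opcode: int, command_id: int, nsid: int,
                          data: bytes = b"") -> bytes:
    """Pack a simplified SQE and append optional in-capsule data.

    Hypothetical helper: only the first 8 bytes of the SQE are filled in.
    """
    # Little-endian: opcode (1 byte), flags (1 byte),
    # command identifier (2 bytes), namespace ID (4 bytes).
    header = struct.pack("<BBHI", opcode, 0, command_id, nsid)
    # All remaining SQE fields are left as zeros in this sketch.
    sqe = header.ljust(SQE_SIZE, b"\x00")
    return sqe + data


# 0x01 is the Write opcode in the NVM command set.
capsule = build_command_capsule(opcode=0x01, command_id=7, nsid=1,
                                data=b"\xAA" * 512)
print(len(capsule))  # 576: the 64-byte SQE plus 512 bytes of data
```

The point of the sketch is only that command and data travel together in one self-contained message, rather than the host writing to memory-mapped doorbell registers as it does with local NVMe.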
NVMe-oF preserves the massive parallelism that makes local NVMe efficient.
The protocol supports up to 65,535 I/O queue pairs per controller, with each queue capable of holding up to 65,535 commands. This depth allows thousands of I/O operations to proceed simultaneously without blocking.
For comparison, SCSI-based protocols typically support 256 or fewer outstanding commands per device. When an application issues 10,000 parallel read requests, SCSI creates a serialization bottleneck, while NVMe-oF processes them concurrently.
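The arithmetic behind this comparison can be sketched in a few lines of Python. The limits are the figures quoted above; the `rounds_needed` helper is a hypothetical simplification that treats the outstanding-command limit as the only constraint and ignores device and network behaviour.

```python
import math

NVME_MAX_QUEUES = 65_535       # I/O queue pairs per controller (from the text)
NVME_MAX_QUEUE_DEPTH = 65_535  # commands per queue (from the text)
SCSI_TYPICAL_DEPTH = 256       # outstanding commands per device (from the text)


def rounds_needed(requests: int, outstanding_limit: int) -> int:
    """How many batches are needed if only `outstanding_limit`
    commands can be in flight at once (simplified model)."""
    return math.ceil(requests / outstanding_limit)


nvme_limit = NVME_MAX_QUEUES * NVME_MAX_QUEUE_DEPTH
print(nvme_limit)                                 # 4294836225
print(rounds_needed(10_000, SCSI_TYPICAL_DEPTH))  # 40
print(rounds_needed(10_000, nvme_limit))          # 1
```

In practice no deployment configures the theoretical maximum, but the gap between 256 and several billion outstanding commands shows why the queue model rarely becomes the limiting factor with NVMe-oF.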
NVMe-oF uses NVMe Qualified Names (NQNs) to identify storage targets, with standardized discovery mechanisms that allow hosts to locate available storage subsystems without manual configuration. The protocol also supports multipath I/O, enabling multiple simultaneous paths between hosts and storage for both performance aggregation and high availability.
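As an illustration of what an NQN looks like, the following Python sketch performs a loose plausibility check on the `nqn.yyyy-mm.reverse-domain:user-string` form and the 223-byte length limit. It is an assumption-laden approximation, not a full validator, and the `com.example` names in the usage lines are hypothetical.

```python
import re

NQN_MAX_LEN = 223  # maximum NQN length in bytes

# nqn.<yyyy>-<mm>.<reverse domain>[:<user-defined string>]
NQN_PATTERN = re.compile(r"^nqn\.(\d{4})-(\d{2})\.([a-z0-9.-]+)(?::(.+))?$")


def is_plausible_nqn(name: str) -> bool:
    """Loose structural check only; does not verify domain ownership
    or every rule in the specification."""
    if len(name.encode("utf-8")) > NQN_MAX_LEN:
        return False
    match = NQN_PATTERN.match(name)
    if match is None:
        return False
    month = int(match.group(2))
    return 1 <= month <= 12


print(is_plausible_nqn("nqn.2014-08.org.nvmexpress.discovery"))  # True
print(is_plausible_nqn("nqn.2016-06.com.example:array1"))        # True
print(is_plausible_nqn("iqn.2016-06.com.example:array1"))        # False
```

The first name is the well-known discovery service NQN that hosts query to learn which subsystems are available; the last line shows that an iSCSI-style IQN does not pass the check.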
NVMe-oF operates over three primary transport types, each with distinct characteristics that suit different deployment scenarios.
NVMe over Fibre Channel maps NVMe commands directly onto Fibre Channel frames, eliminating the SCSI translation layer that FCP requires. Organisations with existing FC infrastructure can adopt NVMe/FC through firmware upgrades to Gen 5 or later switches and host bus adapters—no physical infrastructure changes required.
NVMe/FC operates at standard FC speeds: 16Gbps, 32Gbps, and 64Gbps with Gen 7 hardware. Because FC switches are protocol-agnostic, they can simultaneously support legacy FCP and modern NVMe/FC traffic, making migration straightforward.
Fibre Channel transport suits enterprise environments with established FC SANs and workloads requiring the highest reliability. Databases, ERP systems, and virtualised infrastructure particularly benefit from FC's proven stability and low latency with minimal CPU overhead.
NVMe over TCP encapsulates NVMe commands within TCP/IP packets, enabling deployment over standard Ethernet infrastructure. The protocol became part of the NVMe-oF specification in 2019 and has rapidly gained adoption due to its deployment simplicity and broad hardware compatibility.
NVMe/TCP works with any modern network interface card (NIC) capable of 10Gbps or higher throughput—no specialized hardware required. The tradeoff comes in CPU utilization, as software initiators handle command processing. However, modern multi-core processors often have sufficient capacity to absorb this overhead without impacting application performance.
Cloud environments, where Ethernet dominates, particularly benefit from TCP transport. Development and test environments frequently adopt NVMe/TCP for its simplicity—teams can quickly provision NVMe-oF storage using existing network infrastructure.
NVMe over RDMA uses Remote Direct Memory Access (RDMA) technology to transfer data directly between host and storage memory without involving the operating system or CPU. RDMA over Converged Ethernet (RoCE) has emerged as the dominant RDMA protocol for storage networks, requiring RDMA-capable NICs that handle data transfer in hardware.
NVMe/RDMA delivers the lowest latency of any NVMe-oF transport option—often matching the performance of direct-attached NVMe drives. Latency typically measures in single-digit microseconds, making the protocol suitable for the most latency-sensitive applications.
High-performance computing, AI model training, and trading applications, where microseconds can affect outcomes, typically choose NVMe/RDMA. Organisations should evaluate whether their applications can actually benefit from single-digit microsecond latency—many enterprise workloads perform identically on NVMe/TCP or NVMe/FC, making RDMA's additional cost and complexity unnecessary.
Note: Performance figures represent typical deployments under optimal conditions. Actual results vary based on workload characteristics and network configuration.
Choose NVMe/FC when existing Fibre Channel infrastructure is in place, applications require proven enterprise reliability, and consistent sub-50 microsecond latency is required.
Choose NVMe/TCP when Ethernet is the primary network infrastructure, deployment simplicity and cost control are priorities, and cloud or hybrid cloud architecture is planned.
Choose NVMe/RDMA when applications genuinely benefit from single-digit microsecond latency, budget supports RDMA-capable infrastructure, and workloads include HPC, AI training, or real-time analytics.
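The selection guidance above can be condensed into a small Python heuristic. This is a rough sketch of the criteria as stated, not a sizing tool; the `Environment` fields and the order of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical summary of a deployment's constraints."""
    has_fc_san: bool                     # existing Fibre Channel fabric
    needs_single_digit_us_latency: bool  # applications genuinely benefit
    has_rdma_budget: bool                # can fund RDMA-capable NICs/switches


def suggest_transport(env: Environment) -> str:
    """Return a starting-point recommendation, to be validated by profiling."""
    if env.needs_single_digit_us_latency and env.has_rdma_budget:
        return "NVMe/RDMA"
    if env.has_fc_san:
        return "NVMe/FC"
    # Ethernet-first, cost-sensitive, or cloud-oriented environments.
    return "NVMe/TCP"


print(suggest_transport(Environment(True, False, False)))   # NVMe/FC
print(suggest_transport(Environment(False, False, False)))  # NVMe/TCP
print(suggest_transport(Environment(False, True, True)))    # NVMe/RDMA
```

Real decisions involve more inputs, such as staff skills and array support, so treat the output as a first filter rather than a verdict.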
Applications that historically required direct-attached storage can operate on shared networked arrays with little or no meaningful performance compromise. This consolidation reduces hardware sprawl, simplifies management, and improves resource utilization. A single high-capacity NVMe-oF array can serve workloads from dozens or hundreds of servers, eliminating stranded capacity.
The protocol's efficiency reduces CPU overhead compared to legacy storage protocols, freeing processor resources for application workloads. In virtualised environments, this efficiency translates directly to higher VM density per physical host.
NVMe-oF enables organisations to unify their storage networks rather than maintaining separate infrastructure silos for different protocol types. A single fabric can serve both legacy SCSI workloads and native NVMe traffic, simplifying operations and reducing management overhead.
Training large language models (LLMs) generates massive I/O demands during data loading and checkpoint writing. NVMe-oF provides the sustained throughput needed to keep GPUs fed with training data without creating storage bottlenecks. Research organisations, for example, increasingly deploy NVMe/RDMA to minimize training time, where every hour saved can represent significant cost reduction.
Financial services firms and e-commerce platforms operate analytics pipelines that process streaming data with strict latency requirements. NVMe-oF allows these platforms to query massive data sets with consistent microsecond response times, enabling real-time decision-making that directly affects business outcomes.
Trading systems where microseconds determine profitability require storage that adds as little latency as possible. NVMe/RDMA provides the lowest latency of the NVMe-oF transports, allowing trading algorithms to access market data with minimal delay.
VMware, KVM, and Hyper-V environments benefit from NVMe-oF's ability to deliver consistent storage performance to hundreds or thousands of virtual machines. The protocol's massive queue depth helps prevent noisy neighbor problems—a single VM's I/O burst doesn't degrade performance for others sharing the same storage.
Before deploying NVMe-oF, evaluate your current network architecture and identify required upgrades. For NVMe/FC, verify FC switches support firmware upgrades to enable FC-NVMe. For NVMe/TCP, ensure network switches and NICs support 10GbE or higher. For NVMe/RDMA, verify RDMA-capable NICs and switches that support RoCE v2 or InfiniBand.
Match transport choice to application requirements rather than choosing based on theoretical performance maximums. For instance, an application with 500-microsecond processing time is unlikely to see meaningful benefit from 10-microsecond storage latency alone, so the added cost and complexity of RDMA infrastructure may not deliver sufficient return.
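A quick calculation shows why. The Python sketch below compares end-to-end time per request for the 500-microsecond application above under two storage latencies. The 50-microsecond alternative is an assumed figure chosen for illustration, and the model simply adds processing and storage time.

```python
def end_to_end_improvement(processing_us: float,
                           old_storage_us: float,
                           new_storage_us: float) -> float:
    """Fractional reduction in total per-request time when only
    storage latency changes (simple additive model)."""
    old_total = processing_us + old_storage_us
    new_total = processing_us + new_storage_us
    return (old_total - new_total) / old_total


# 500 us processing (from the text); 50 us vs 10 us storage latency (assumed).
gain = end_to_end_improvement(500, 50, 10)
print(f"{gain:.1%}")  # 7.3%
```

Cutting storage latency by 80 percent improves the request time by only about 7 percent in this example, because processing dominates the total.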
Profile actual workload characteristics including IOPS patterns, queue depths, and latency sensitivity before specifying transport requirements.
Organisations migrating from iSCSI or FCP should pilot NVMe-oF with non-critical workloads first. For FC environments, the ability to run FCP and NVMe/FC simultaneously on the same fabric simplifies gradual migration. Storage arrays supporting multiple transports provide flexibility to serve different application tiers appropriately.
Profile current storage performance before selecting transport types. Many organisations assume they need RDMA's single-digit microsecond latency when their applications would perform identically with NVMe/TCP's sub-millisecond response times. Measure actual IOPS patterns, queue depths, and latency sensitivity under production load conditions.
Configure at least two independent paths between each host and storage target during initial deployment. This redundancy provides both performance aggregation and high availability. For mission-critical workloads, implement four paths across separate physical switches to eliminate single points of failure.
Establish baseline performance metrics immediately after deployment. Track latency percentiles (50th, 95th, 99th) rather than just averages—tail latency often reveals issues that average metrics mask. Monitor CPU utilization on application hosts, particularly for NVMe/TCP deployments where software initiators consume processor cycles.
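The difference between averages and percentiles is easy to demonstrate with Python's standard `statistics` module. The sample values below are invented for illustration: 98 requests at 100 microseconds and two slow outliers at 5,000 microseconds.

```python
import statistics


def latency_summary(samples_us):
    """Return the mean plus 50th, 95th, and 99th percentile latencies."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_us, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_us),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Invented data: mostly fast requests with a small slow tail.
samples = [100.0] * 98 + [5000.0] * 2
summary = latency_summary(samples)
for name, value in summary.items():
    print(f"{name}: {value:.0f} us")
```

Here the mean (198 microseconds) sits well above the median and still says nothing about the 5,000-microsecond outliers, which only the 99th percentile exposes.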
The NVMe-oF ecosystem continues rapid development. NVMe 2.0 introduced frameworks for computational storage devices—drives that can execute processing tasks locally rather than transferring all data to host CPUs. As this capability matures, NVMe-oF will enable distributed computing architectures where processing occurs near data.
Zoned Namespaces (ZNS) provides applications direct control over data placement on flash media, improving write efficiency and endurance. As more vendors and applications adopt ZNS over NVMe-oF, organisations are likely to see reduced write amplification and improved SSD lifespan.
Industry adoption continues accelerating as organisations replace legacy SCSI-based systems with NVMe-oF infrastructure, driven by the protocol's performance advantages and the increasing availability of compatible hardware across major storage vendors.
NVMe over Fabrics eliminates the performance penalties that have historically separated networked storage from direct-attached drives. The protocol's three transport options—Fibre Channel, TCP, and RDMA—provide deployment flexibility while maintaining the massive parallelism and low latency that define NVMe's advantages. Organisations gain the ability to consolidate infrastructure, reduce CPU overhead, and simplify data centre operations without sacrificing the speed that modern applications require.
Transport selection should align with infrastructure realities, workload requirements, and cost considerations rather than pursuing theoretical maximum performance. NVMe/TCP's simplicity and broad compatibility make it ideal for cloud environments and organisations with established Ethernet infrastructure. NVMe/FC suits enterprises with FC investments and mission-critical workloads. NVMe/RDMA serves applications where microsecond-level latency directly impacts business outcomes.
For organisations evaluating storage modernisation, Everpure™ FlashArray™ delivers native NVMe performance across all transport options with consistent microsecond-level latency. DirectFlash® technology helps eliminate controller bottlenecks by enabling direct communication between flash media and the storage fabric, delivering latency as low as 150 microseconds and throughput up to 45GB/s with FlashArray//XL™ under optimal configurations. Evergreen® architecture enables non-disruptive upgrades without forklift replacements—organisations deploying NVMe/FC today can easily add NVMe/TCP or NVMe/RDMA transport options as requirements evolve. For hybrid cloud deployments, Purity CloudSnap™ provides data mobility between on-premises FlashArray and cloud object storage, combining local NVMe-oF performance with cost-efficient cloud-based data protection and backup capabilities.