Enterprise storage teams and DBAs usually have each other on speed dial. DBAs call to ask for more capacity and more performance; storage engineers call DBAs to ask where their capacity went and why the databases are consuming so much performance. Anything that makes this symbiotic relationship better is a good thing. For IP-based storage networks, a new protocol may help as its support from host operating systems and storage vendors improves. Read on as we compare NVMe over TCP's Oracle RAC performance against the traditional SCSI-based protocols, Fibre Channel and iSCSI.

The SCSI command set has been around since the 1980s. As discussed here, SCSI is very flexible and can talk to tape, CD-ROM, scanners, and of course, disk drives. It is used as the storage protocol within SAS, iSCSI, and typically, Fibre Channel, each taking advantage of physical-layer speed increases over time. In the never-ending game of 'move the bottleneck,' the shift to all-flash arrays highlighted the need for a lighter-weight storage protocol to allow storage performance to keep climbing. This is where NVMe comes in. With a streamlined command set, a very wide and deep queueing structure (up to 65,535 I/O queues, each up to 65,536 commands deep), and roughly 66% fewer CPU cycles needed to perform an I/O, it was expressly designed to use solid-state storage to its full potential. We've seen the storage OEMs roll out back-end NVMe and, more recently, front-end NVMe in a few flavors: NVMe over Fibre Channel, TCP, iWARP, or RoCE. Today, we'll focus on NVMe over TCP.

As noted in the article linked above, NVMe over TCP is very simple to deploy. We believe it will be, more or less, a drop-in replacement for iSCSI. Its most significant downside today is a lack of support across all operating systems: Dell's support matrix for NVMe over TCP currently lists ESXi 7.0u3 and SLES 15 SP4. Since RHEL is the other major enterprise Linux distribution, we expect it to arrive on the support matrix in the near term.
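To give a sense of that simplicity, here is a minimal sketch of what connecting a Linux guest to an NVMe over TCP target typically looks like using the standard nvme-cli tooling. The IP address, port, and subsystem NQN below are placeholders rather than values from our lab, and your distribution may package things slightly differently.

    # Load the NVMe/TCP transport module (placeholder addresses and NQN throughout)
    modprobe nvme-tcp

    # Ask the array's discovery controller which subsystems it exposes
    nvme discover -t tcp -a 192.168.10.50 -s 4420

    # Connect to a discovered subsystem by its NQN
    nvme connect -t tcp -n nqn.2014-08.org.example:subsys1 -a 192.168.10.50 -s 4420

    # Confirm the namespaces appeared as /dev/nvmeXnY block devices
    nvme list

That discover/connect pair is essentially the whole workflow, which is a big part of why we see NVMe over TCP as a near drop-in replacement for iSCSI.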

Test Setup

To take another step forward from the previous article, we set about putting real-world application numbers on the same NVMe over TCP setup.

  • Two Cisco UCS B200-M5 blades
  • Dual 10Gb/s ports per blade
  • Dual 16Gb/s FC ports per blade
  • ESXi 7.0u3
  • PowerStore 9000T
    • PowerStore OS 2.1.0.1
    • 10Gb/s Ethernet ports
    • 32Gb/s Fibre Channel ports
  • RHEL 8.5

This time, instead of the synthetic vdbench workload generator, we used four Oracle RAC 21.6 VMs and Kevin Closson's Silly Little Oracle Benchmark (SLOB). For those not familiar, SLOB is an Oracle I/O workload generation tool. It runs against a database installation, exercising the entire stack and generating storage traffic in the process. Databases are very common workloads in our customers' environments and tend to be large consumers of storage resources, both performance and capacity, so we wanted to put more realistic numbers behind NVMe over TCP and compare it with long-standing, common storage protocols.
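For readers who haven't used SLOB, the sketch below shows the general shape of a run. The slob.conf values are illustrative placeholders, not our exact test settings, and the setup.sh/runit.sh arguments vary a bit between SLOB releases.

    # slob.conf (excerpt) - illustrative values only
    UPDATE_PCT=25      # percentage of operations that are UPDATEs rather than SELECTs
    RUN_TIME=300       # length of each test run, in seconds
    SCALE=80G          # active data set size per SLOB schema
    WORK_UNIT=64       # blocks touched per operation

    # Load 16 SLOB schemas into the IOPS tablespace, then drive the workload against them
    ./setup.sh IOPS 16
    ./runit.sh 16

Because SLOB does almost no application-level computation, the bottleneck lands squarely on the I/O path, which is exactly what we want when comparing storage protocols.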

We tested the following storage protocol configurations:

  • 16Gb/s Fibre Channel
  • iSCSI
    • VMware software initiator
  • NVMe over TCP
    • VMware software initiator
    • In-guest software initiator

Both NVMe over TCP initiators tested here are driven by the host's CPU: in one case the hypervisor runs the protocol, and in the other the guest's CPU does. Given NVMe's simplified structure and lower CPU requirements, we expect it to perform better than iSCSI. FC is offloaded to hardware cards (HBAs), so its CPU utilization will be minimal to nonexistent, and we expect it to outperform the CPU-driven protocols.

Fibre Channel speeds grow by powers of two (2Gb/s, 4Gb/s, 8Gb/s, 16Gb/s, and so on); Ethernet speeds have historically grown by powers of ten (10Mb/s, 100Mb/s, 1000Mb/s, 10000Mb/s). Unless we take FC back to 1998, when it ran at 1Gb/s, we don't have a good way to test with like-for-like speeds. (Yes, 10Gb FC was a thing, but it never achieved critical mass.) Per host, that works out to 2 × 16Gb/s = 32Gb/s of aggregate Fibre Channel capacity versus 2 × 10Gb/s = 20Gb/s for the Ethernet-based protocols. Without spoiling it, you'll see something surprising below from the underdog in this race. Please keep that FC bandwidth advantage in mind as you read on.

The Test

Firing up SLOB, we get the following results. Each protocol was run five times, and the results were averaged to produce the charts below.  

 

First, let's look at the in-guest initiator versus Fibre Channel. The in-guest initiator turned in only 27% less performance despite having 37.5% less available bandwidth!

Unexpectedly, VMware's NVMe over TCP software initiator underperformed the iSCSI adapter. While its read performance was about the same, it turned in half the write performance. This could be due to the first-generation release of this software adapter, or because ESXi was translating guest SCSI from the VMware paravirtual adapter into NVMe for the front end. In any case, let's see if we can fix it.

The previous configuration had all NVMe traffic on a single VLAN. We reconfigured the ESXi port groups and VLAN affinity to send the A-side VLAN down one physical NIC and the B-side down the other, then re-ran the VMware initiator tests; the results are below. In addition to 39% lower read latency, the split-traffic configuration delivered 42K more IOPS than the first run.
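For reference, the uplink pinning itself is a small change per port group. The sketch below assumes standard vSwitch port groups with placeholder names; adjust it for your own vSwitch or vDS layout.

    # Pin the A-side NVMe/TCP port group to the first uplink, keeping the second as standby
    esxcli network vswitch standard portgroup policy failover set -p "NVMe-TCP-A" -a vmnic0 -s vmnic1

    # Pin the B-side port group the opposite way
    esxcli network vswitch standard portgroup policy failover set -p "NVMe-TCP-B" -a vmnic1 -s vmnic0

With each VLAN riding its own physical NIC, both 10Gb/s links carry traffic at the same time instead of everything funneling through a single path.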

 

 

 

Conclusion

The NVMe over TCP configurations we've reviewed need to be considered separately, each in the context of the problem being solved. By that, we mean choosing between VMware's software initiator and having your guests control their own storage. Here, we will unabashedly sit on the fence about which approach is better. You know your environment's manageability and performance requirements, and as with most things in IT, the correct answer is probably a hybrid of both.

For overall management simplicity, VMware's software NVMe over TCP initiator is the way to go; the central management points are the array(s) and the ESXi servers. If you go this route, the advice, as shown above, is to ensure you have an A/B multipathing setup on your ESXi host(s). In our experience, that setup will perform better than VMware's software iSCSI initiator. Presently, VMDKs are the only supported way to consume NVMe over TCP storage in this configuration; RDMs of NVMe over TCP volumes are not possible as of this writing.

If performance is a critical concern, it seems logical to put the guests in charge of their storage. The trade-off is manageability; the in-guest method is decentralized, and all troubleshooting happens at a guest level. Additionally, you now have many more touchpoints to update when changing the upstream configuration. However, for the near-FC performance gained by the applications that need it, this is very much a worthwhile upgrade.

The other thing to consider is whether your required operating system is on the support matrix. As noted in the previous article, there may be significant performance differences between versions. People in charge of business-critical data, both storage folks and DBAs, tend to prefer safety and consistency above all else. Staying within the guardrails of your OEM's support matrix is always recommended; if you need to stray, the OEMs all have qualification processes for one-off configurations. RHEL and OEL are common Linux distributions for Oracle databases, and since storage arrays tend to sit behind those databases, it's encouraging that Dell says their addition to the support matrix is in process. As a new protocol, NVMe over TCP shows a lot of promise for what it already delivers, and it should only get better from here.

The sheer amount of performance available in NVMe-based arrays, the amount of capacity packed into a rack unit, and new protocols that streamline the transactions make this an exciting time in the storage world. NVMe over TCP is another piece of the modern data center puzzle. As discussed above, like most new technologies, it doesn't wholesale replace the older ones; the arrival of the tank didn't outmode the infantry. NVMe over TCP's maturity and support ecosystem aren't yet near those of FC and iSCSI, but for the workloads that can use it, it works well. It doesn't replace everything FC offers in terms of a purpose-built storage network, but for bulk-consumption use cases, it's a winner.

To learn more about our testing or our experience with data center storage fabrics, or for help choosing which fabric to use, connect with us or any of our other experts.
