Hands-On with PowerStore 2.1, NVMe over TCP, and SmartFabric Storage Software
As the leading Dell primary storage channel partner, we engage in beta programs that give us early access to hardware and software. We can test new features in our environment, provide feedback on usability and functionality before these products hit the streets, and help shape the final product to better suit our customers. In addition to testing pre-launch PowerStore 1.0, we were able to test PowerStore 2.1 in 2021. This article covers the technologies in play and our hands-on experience with them.
Dell Technologies released PowerStore in May 2020. Taking a crawl-walk-run approach to its development and release, it launched to the market with a feature set that was not fully fleshed out yet. Now in 2022, we see a terrific follow-up to last year's 2.0 launch.
Taking a step back, PowerStore is Dell EMC's latest entry into the midrange storage space. With Dell and EMC coming together in 2016, the midrange storage portfolio consisted of:
- Unity
- SC Series (Compellent)
- PS Series (EqualLogic)
- XtremIO (this is arguably a tier 1 or tier 1.5 array, but since it's sometimes included in midrange price bands, I have included it here)
Each array had things it did well, but the number of platforms to choose from was confusing, like trying to choose a peanut butter at the grocery store. Aiming to simplify the portfolio, Dell took features from each of the above systems, planned for future technologies, and combined them to create PowerStore.
The first major release after launch was PowerStoreOS 2.0 in May 2021. It brought with it:
- A new lower-cost model, the PowerStore 500T, which supports the same functionality as its larger siblings, including NVMe over FC, NAS, and inline data reduction.
- Performance improvements of up to 65% better write performance and 25% more IOPS across the board.
- NVMe over Fibre Channel support, providing end-to-end NVMe capabilities with no translation layer from FC/SCSI to the drives' NVMe connectivity.
Fast forward to January 2022, and we have the latest iteration of the software powering the system, PowerStoreOS 2.1. Several of its new features are:
- NVMe over TCP, providing the lightweight NVMe storage protocol over standard TCP/IP
- SmartFabric Storage Software (SFSS)
- Thin Upgrades
- NEBS-certified PowerStore 500T DC-powered model
The bulk of the new additions are covered here. The two features that will have the most impact on customers are NVMe over TCP and SFSS.
SCSI is a very flexible protocol and can communicate with various devices like hard drives, Zip drives, CD-ROMs, scanners, and magnetic tape. That adaptability comes with inefficiency that became apparent as storage drives got faster. In 2007, a consortium of companies began work on the NVMe protocol to interface with up-and-coming flash drives in a manner designed to take advantage of flash's unique nature. NVMe drops all of SCSI's baggage, moving from over 144 protocol commands to just 13, and focuses on SSDs. By taking 66% fewer clock cycles to accomplish an I/O and providing improved, inherently parallel queueing mechanisms, NVMe makes far better use of the performance capabilities of solid-state media.
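That parallel queueing is the heart of the efficiency story: SCSI-era stacks funnel I/O through a shared queue, while NVMe gives each CPU core its own submission/completion queue pair. As a toy illustration only (my own sketch, not real driver code):

```python
from collections import deque

# Toy models of the queueing difference (illustrative only, not driver code).

class ScsiLikeDevice:
    """All cores share one request queue, so every submission contends on it."""
    def __init__(self):
        self.queue = deque()

    def submit(self, core, io):
        self.queue.append((core, io))  # every core funnels into one queue

class NvmeLikeDevice:
    """Each core gets its own submission queue; no cross-core contention."""
    def __init__(self, cores):
        self.queues = {c: deque() for c in range(cores)}

    def submit(self, core, io):
        self.queues[core].append(io)   # cores submit in parallel

# Two cores submitting: the SCSI-like model serializes on one queue,
# while the NVMe-like model keeps I/O on per-core queues.
scsi, nvme = ScsiLikeDevice(), NvmeLikeDevice(cores=2)
for core in (0, 1):
    scsi.submit(core, "io")
    nvme.submit(core, "io")
```

In real hardware, the per-core queues map onto NVMe's deep hardware queue pairs, which is where the reduced lock contention and lower per-I/O cost come from.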
Initially designed for direct-attached storage (i.e., a drive in your laptop), the NVMe protocol needed some work to run over network fabrics, the way iSCSI runs over Ethernet or SCSI runs over Fibre Channel. Enter NVMe over Fabrics (NVMe-oF). The first major NVMe-oF transport to hit the market with any level of adoption was NVMe over Fibre Channel, using Fibre Channel as the transport medium with NVMe data packets inside.
There are a few contenders in the NVMe Ethernet transport space: RDMA over Converged Ethernet (RoCE), iWARP, and NVMe over TCP. While RoCE and iWARP generally offer higher performance because of RDMA support, they may require particular network or hardware configurations. Dell has chosen NVMe over TCP for frontend host connectivity because it runs over standard data center network configurations, which simplifies adoption. For the RoCE fans out there, RoCE v2 addresses the significant issues with RoCE v1 (v2 is routable), but industry analysts still predict NVMe over TCP will be the predominant protocol.
As noted above, PowerStoreOS 2.1 is required for running NVMe over TCP on the array. After upgrading the PowerStore array or cluster to 2.1, enabling NVMe over TCP is simple. Within PowerStore Manager, navigate to the Network IPs pane, select the Storage Network sub-pane, choose your desired storage network, and click "Reconfigure." In that dialog, enabling the NVMe/TCP checkbox is all that is required.
Confirm successful activation of NVMe over TCP by looking at the purposes column within the storage network IPs pane.
Fibre Channel, the network fabric purpose-built for storage, utilizes the concept of zoning to define which ports are allowed to chat. Typically this is accomplished with a single initiator (host) and a single target (array port) in a zone. Fibre Channel networks also include discovery, registration, query, and notification services. This functionality does not exist in Ethernet networks per se, but as more Ethernet fabrics carry storage traffic, a simplified way of connecting hosts to arrays is needed. To solve this, two NVMe/TCP standards were created: centralized discovery services (TP-8010) and automated discovery of NVMe controllers (TP-8009). Together, these two standards provide fabric services similar to Fibre Channel's. To implement them, Dell has released a centralized discovery controller called SmartFabric Storage Software. It is a containerized application package that runs as a VM on ESXi. As a connectivity director, SFSS does not sit within the data path; it directs hosts to their storage and gets out of the way. SFSS can also work with non-Dell servers, storage, and networking, as long as they conform to the appropriate standards.
For those familiar with Fibre Channel networking, SFSS's functionality will feel very similar. At a high level, there is a zone group that contains zones. Zones contain endpoints that are allowed to communicate with each other. The endpoints within a zone are a host NVMe Qualified Name (NQN) and a subsystem (array) NQN. These zones are added to a zone group for the centralized discovery controller (CDC) instance and activated.
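To make the hierarchy concrete, here is a minimal Python sketch of the zone group → zone → endpoint model (my own illustration with made-up NQNs; SFSS itself is configured through its UI, REST API, or Ansible, not code like this):

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    """Pairs one host NQN with one subsystem (array) NQN."""
    name: str
    host_nqn: str
    subsystem_nqn: str

@dataclass
class ZoneGroup:
    """Holds zones for a CDC instance; nothing applies until activated."""
    name: str
    zones: list = field(default_factory=list)
    active: bool = False

    def activate(self):
        self.active = True

    def may_discover(self, host_nqn, subsystem_nqn):
        # A host is directed to a subsystem only if an active zone pairs them.
        return self.active and any(
            z.host_nqn == host_nqn and z.subsystem_nqn == subsystem_nqn
            for z in self.zones
        )

zg = ZoneGroup("zg-prod")
zg.zones.append(Zone("z1",
                     "nqn.2014-08.org.nvmexpress:uuid:host-1",  # made-up host NQN
                     "nqn.example.com:powerstore-subsystem"))   # made-up array NQN
# Just as in Fibre Channel, nothing is visible until the zone group is activated.
assert not zg.may_discover("nqn.2014-08.org.nvmexpress:uuid:host-1",
                           "nqn.example.com:powerstore-subsystem")
zg.activate()
```

After activation, a host's discovery request to the CDC returns only the subsystems it is zoned to, mirroring how Fibre Channel zoning scopes what an initiator can see.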
SFSS is not a requirement for NVMe over TCP storage. You can individually configure hosts to connect to the storage arrays, much as you would with iSCSI. While this works, it can become cumbersome at scale. Additionally, like the rest of Dell's primary storage portfolio, SFSS supports automation through Ansible and a REST API.
Now that we've laid down the technologies in play, let's put them into action. The initial test setup consisted of:
- 2x Cisco UCS M5 blades running ESXi 7.0u3
- 2x Ubuntu 20.04 VMs with 32GB of memory each, one on each UCS blade
- PowerStore 5000T running PowerStoreOS 2.1
To each VM, ten 250GB volumes were allocated. One VM got its volumes using the Linux software iSCSI initiator and the other using the Linux NVMe over TCP initiator. Each initiator saw the same four frontend paths, which all utilized the same frontend ethernet ports on the array. So the hosts, VMs, and array are the same; the only difference is the storage protocol. I'll focus my screenshots on the setup of NVMe because it's new and likely to be unfamiliar.
First, we'll walk through zoning with SFSS. Here is my zone containing the host and subsystem NQNs.
Before zone group activation, a host discovery of the SFSS CDC IP will show only SFSS itself. I'm utilizing nvme-cli in my Ubuntu 20.04 VM to test connectivity. After ensuring the nvme-tcp kernel module is loaded, discovery is accomplished with
nvme discover -t tcp -a <CDC_address>
After zone group activation, host discovery will be able to find the array's frontend ports. In this screenshot from the same host, you can see four FE ports from a PowerStore array.
You can have the host log in to the array with
nvme connect-all
and see all active connections to the subsystem with
nvme list-subsys
From here, you can perform typical host and storage provisioning within the array: creating volumes and host groups, and mapping the two together. The only difference in this case is that you will choose the NVMe protocol instead of iSCSI or Fibre Channel. The output of
nvme list displays all of the provisioned namespaces (volumes) my host has available.
Now, vdbench is started on both VMs to gauge performance differences between the two protocols. The parameters used for the workload definition are here:
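For readers unfamiliar with vdbench, a parameter file for this style of test looks roughly like the following; the device paths, the second transfer size, and the run duration are my assumptions for illustration, not the exact file used:

```
* Storage definitions: one sd per test volume (paths are illustrative)
sd=sd1,lun=/dev/nvme0n1,openflags=o_direct
sd=sd2,lun=/dev/nvme0n2,openflags=o_direct

* Workload definition: 50/50 read/write mix, half of the I/O at 64KB
wd=wd1,sd=sd*,rdpct=50,seekpct=random,xfersize=(8k,50,64k,50)

* Run definition: uncapped I/O rate, long run, report every 5 seconds
rd=rd1,wd=wd1,iorate=max,elapsed=7200,interval=5
```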
We're running a 50/50 read/write split from the workload definition string above, and 50% of the I/O is 64KB in size. Here are the results.
NVMe over TCP performance:
The workloads were allowed to run for several hours, and there was no performance change from the above screenshots. While this test is not scientific, it provides a directionally correct view of the performance differences between the two protocols.
| Stat (Avg) | iSCSI | NVMe/TCP | NVMe/TCP Advantage |
| --- | --- | --- | --- |
| Avg RT (ms) | 0.96 | 0.81 | 16% |
| Avg write RT (ms) | 1.15 | 0.80 | 30% |
I don't think anybody will complain about a ~20% performance bump, but the biggest winners here are the 44% reduction in CPU utilization and the 30% faster average write performance. A host can be busy doing other things, like database transactions or running other VMs, rather than waiting for I/O.
Sharp-eyed viewers will notice in the vdbench screenshots above that, while overall response times were good with the NVMe over TCP host, its max column is all over the place. Not shown in the screenshot is that the NVMe max response time jumps to 500 milliseconds every twenty seconds. I started exploring whether this was noise in the lab environment (seems odd, but possible; this network isn't optimized for storage traffic just yet) or configuration-related. Knowing that SLES and RHEL are the two enterprise Linux distros with well-established NVMe support from storage OEMs, I created two RHEL 8.5 VMs. I then ran the same two-VM iSCSI vs. NVMe/TCP comparison, with the same workload, this time against our PowerStore 9000; I've included deeper results this time.
| Stat (Avg) | iSCSI | NVMe/TCP | NVMe/TCP Advantage |
| --- | --- | --- | --- |
| Read RT (ms) | 1.03 | 0.72 | 143% |
| Write RT (ms) | 1.02 | 0.72 | 142% |
| Max Read RT (ms) | 38.90 | 25.91 | 150% |
| Max Write RT (ms) | 17.99 | 21.69 | 83% |
| Read RT Std_Dev | 0.77 | 0.56 | 138% |
| Write RT Std_Dev | 0.62 | 0.48 | 128% |
| % CPU Util | 9.86 | 12.04 | 82% |
So this is interesting. For the most part, NVMe over TCP has a ~40% advantage over iSCSI. Where it didn't do as well is max write response time; however, the average write response time is better, so I am calling this a win for NVMe over TCP. The other stat that's different this time around is CPU utilization. NVMe over TCP is clearly using more overall CPU, but in this case ~20% more CPU for ~40% better I/O performance is still a win for NVMe/TCP. Additionally, the standard deviations are lower, meaning more consistent response times overall. The RHEL NVMe initiator is producing better overall results.
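A note on reading the table: the "Advantage" column appears to be the iSCSI value divided by the NVMe/TCP value, expressed as a percentage, so anything over 100% favors NVMe/TCP on these lower-is-better stats. A quick sanity check (the `advantage` helper is my own, not from the test tooling):

```python
def advantage(iscsi_value, nvme_value):
    """Ratio of the iSCSI stat to the NVMe/TCP stat, as a rounded percentage.
    Over 100% favors NVMe/TCP for lower-is-better metrics (latency, CPU)."""
    return round(iscsi_value / nvme_value * 100)

print(advantage(1.03, 0.72))    # Read RT (ms)      -> 143
print(advantage(1.02, 0.72))    # Write RT (ms)     -> 142
print(advantage(17.99, 21.69))  # Max Write RT (ms) -> 83
print(advantage(9.86, 12.04))   # % CPU Util        -> 82
```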
All of this seems terrific, so what's the catch? The biggest issue is platform support. NVMe over TCP is well supported in Linux, and there's more to come in the nvme-stas package, which adds support for centralized discovery controllers and auto-discovery. As shown above, though, not all Linux distros are created equal: Ubuntu shows performance challenges, specifically hitching every twenty seconds, while RHEL works great. Dell's initially supported Linux distros this year will be RHEL and SLES. To use NVMe over TCP with ESXi, you have to be running 7.0u3, as that's the first version that includes the ability to add a software initiator. Windows Server operating systems do not have a native initiator yet either, and I'm hearing maybe 2023 for support. The other side of the coin is array support. In Dell's portfolio, PowerStore is the only array capable of NVMe over TCP today, but Dell has stated it is "all-in" on NVMe over TCP, so I expect additional platforms to support the protocol as well. In fact, Scott Delandy says PowerFlex and PowerMax support is due out in 2022.
Dell did an excellent job making this simple to set up and easy to administer. The centralized discovery controller is a breeze to install and configure. For existing PowerStore arrays, this functionality is just a code update away, and that update brings other improvements as well. As mentioned, the lower per-I/O CPU utilization afforded by NVMe over TCP is significant for workloads that do a lot of I/O. That's a big help, especially when those clock cycles have a cost associated with them, as they do in the public cloud. Additionally, when using a public cloud-agnostic storage service like Faction, any improvement in response time is a win.