Summary

The original scope of the POC was to test the performance of an all-NVMe vSAN. Sometimes the original scope grows, and we try to accommodate our customer if time and resources allow. In the end, we tested three different vSAN solutions that our customer was interested in. The first vSAN solution was VxRail (really, VxRail is just vSAN with automation on top). Once the VxRail performance testing was completed, we moved back to our original all-NVMe vSAN cluster performance testing using HPE and Dell platforms.

The performance testing that we performed was VERY insightful, in that it revealed something about storage solutions and, specifically, why a customer should pay close attention to the FULL solution. You may think you are designing and architecting a better-performing solution with the promise of newer technologies and protocols like NVMe, but are you just moving the bottleneck?

ATC Insight

1st solution tested (VxRail)

VxRail Hardware Specifics

  • 8 x VxRail P570F
  • 384 GB of RAM per server
  • Intel(R) Xeon(R) Platinum 8160M CPU @ 2.10GHz x 2
  • 2 x 745 GB NVMe drives for cache and 6 x 1.75 TB SAS SSDs for capacity per server
  • 4 ports of 25 Gb (2 dedicated for vSAN only; 2 dedicated for vMotion, VM, and ESXi Mgmt traffic)
VxRail Details

2nd solution tested (Dell vSAN)

Dell Hardware 

  • 8 x Dell R740xd (14th Gen)
  • 1.5 TB of RAM per server
  • Intel(R) Xeon(R) Gold 6226 CPU x 2 per server
  • 16 x 3.5 TB NVMe drives per server: 14 for capacity and 2 for cache
  • 4 ports of 25 Gb (2 dedicated for vSAN only; 2 dedicated for vMotion, VM, and ESXi Mgmt traffic)
Dell vSAN Details

3rd solution tested (HPE vSAN)

HPE Hardware

  • 8 x HPE DL380 Gen10
  • 1.5 TB of RAM per server
  • Intel(R) Xeon(R) Gold 6226 CPU x 2 per server
  • 2 x 1.46 TB NVMe drives for cache per server
  • 14 x 3.5 TB NVMe drives for capacity per server
  • 4 ports of 25 Gb (2 dedicated for vSAN only; 2 dedicated for vMotion, VM, and ESXi Mgmt traffic)
HPE vSAN Details

Supporting TOR Switch Hardware

  • HPE hardware was tested using a Cisco Nexus 93180 switch
  • Dell hardware was tested using an Arista DCS 7280 switch
  • VxRail hardware was tested using an Arista DCS 7280 switch
  • The ToR switches used 25 Gb TwinAx cables for connectivity

Similarities Between Solutions To Call Out

  • All three solutions were tested with an 8-node setup.
  • All three solutions were running VMware ESXi 6.7.0, build 14320388.
  • All three solutions were tested with 25 Gb switching and the same 25 Gb TwinAx cables.

Differences Between Solutions To Call Out

  • HPE was tested on a Cisco switch, while Dell and VxRail were tested on Arista switches.
  • The vSAN Default Storage Policy was set to RAID-6 (Erasure Coding).
  • All three solutions had 2 disk groups per node (VxRail had only 8 disks per node, with 1 NVMe cache drive and 3 SSDs per disk group, while the HPE and Dell vSAN ready nodes had 16 disks per node, with 1 NVMe cache drive and 7 NVMe capacity drives per disk group).
  • VxRail had 2 x Intel(R) Xeon(R) Platinum 8160M CPU @ 2.10GHz per server, while the Dell and HPE vSAN ready nodes had 2 x Intel(R) Xeon(R) Gold 6226 CPU per server.

Performance Testing (via Vdbench Tool)

Vdbench was deployed the same way across the VxRail, HPE, and Dell solutions, from the fill, to the age, to the actual performance test. Thirty-two workers were deployed, for a total of 4 workers per node. After the workers were deployed, we ran a Vdbench workload to completely fill the 4 VMDK test drives per VM, which resulted in a 40% fill of the total vSAN datastore capacity.

Now if you are ok with math, and you read all the hardware specifications I laid out in the solutions above, you may be asking yourself, "Is this truly a fair test for all solutions?"  To make it fair, we used a disk range option in the Vdbench configuration file to make sure the test was "like for like" on all three solutions. 

Below is an excerpt from the fill configuration that was run on all three solutions.

Fill Workload
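For readers who have not worked with Vdbench parameter files, a fill configuration of this kind generally follows the pattern sketched below. This is illustrative only; the device paths, transfer size, and run length are placeholders rather than the exact values from our file.

  * Hypothetical fill parameter file (sketch only; device paths and sizes are placeholders)
  * One storage definition (sd) per test VMDK presented to each worker VM
  sd=sd1,lun=/dev/sdb,openflags=o_direct
  sd=sd2,lun=/dev/sdc,openflags=o_direct
  sd=sd3,lun=/dev/sdd,openflags=o_direct
  sd=sd4,lun=/dev/sde,openflags=o_direct

  * Fill workload: 100% sequential writes that stop at the end of each device (seekpct=eof)
  wd=wd_fill,sd=sd*,xfersize=256k,rdpct=0,seekpct=eof

  * Run definition: write at the maximum rate, reporting every 30 seconds
  rd=rd_fill,wd=wd_fill,iorate=max,elapsed=86400,interval=30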

The three tests that were used for the benchmarking are what some would call "HERO" number tests. "HERO" numbers are merely the baseline benchmark tests that were used for all three solutions.

The "HERO" workloads used were as follows:

100% 4K reads, 100% 128K reads, and 70% 4K reads / 30% 4K writes

Below is an excerpt from the workload definition file. The range argument that you see below is for the first VxRail test, where the total capacity was much less than the all-NVMe vSAN configurations; to keep the disk size consistent across solutions, that VxRail range argument was reduced to (0,5).

Excerpt from workload definition file
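Here again, a hypothetical sketch may help illustrate how the three "HERO" workloads and the range argument are typically expressed in a Vdbench parameter file. The workload names, device paths, and thread counts below are placeholders, and random access (seekpct=100) is assumed; only the block sizes, read percentages, 5-hour run length, and the (0,5) VxRail range come from the description in this article.

  * Hypothetical "HERO" workload definitions (sketch only; names, paths, and threads are placeholders)
  * range=(0,n) restricts I/O to the first n% of each device; per the note above,
  * the VxRail range argument was reduced to (0,5)
  sd=sd1,lun=/dev/sdb,openflags=o_direct,range=(0,5)

  wd=wd_4k_read,sd=sd*,xfersize=4k,rdpct=100,seekpct=100
  wd=wd_128k_read,sd=sd*,xfersize=128k,rdpct=100,seekpct=100
  wd=wd_4k_70r30w,sd=sd*,xfersize=4k,rdpct=70,seekpct=100

  * Each benchmark ran for 5 hours (18000 seconds); the interval mirrors the 30-minute result snapshots
  rd=rd_4k_read,wd=wd_4k_read,iorate=max,elapsed=18000,interval=1800,threads=8
  rd=rd_128k_read,wd=wd_128k_read,iorate=max,elapsed=18000,interval=1800,threads=8
  rd=rd_4k_70r30w,wd=wd_4k_70r30w,iorate=max,elapsed=18000,interval=1800,threads=8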

Test Results

Here are the results of running the benchmark tests for all three solutions. The blue columns are snapshots taken at 30-minute intervals; they represent the number of IOPS at the 4K and 128K block sizes. The orange line depicts average read/write latency in milliseconds. The tests below ran for 5 hours each.

Side by Side "Hero 4K 100% Read" Dell, HPE, VxRail Comparison
Side by Side "Hero 128K 100% Read" view of Dell, HPE, VxRail Benchmarking
Side by Side "Hero 4K 70% Read 30% Write" view of Dell, HPE, VxRail Benchmarking

You may be asking yourself after reviewing the results: Why NVMe? I thought I was supposed to get a crazy number of IOPS (Input/Output Operations Per Second). Well, that was exactly what I was thinking too! As it turns out, CPU matters! In both the steady-state tests and the benchmark tests above, we saw much higher CPU usage on the all-NVMe vSAN solutions than we did on the VxRails.

Some of this can be credited to the difference between the Platinum CPUs used in the VxRail and the Gold CPUs used in the all-NVMe vSAN solutions. Even though the VxRail had older Skylake CPUs, it had double the cores and threads of the Cascade Lake CPUs, which shifted the bottleneck from the disks to the CPU and NVMe lanes.

As far as CPU was concerned, at steady state we saw CPU utilization of 15-20% on the VxRail, compared with 50-60% on the other two all-NVMe vSAN solutions (Dell and HPE). To further add on here, in the benchmark testing the VxRail was around 30-40% CPU utilization, compared to 70-80% CPU utilization on the all-NVMe vSAN solutions.

Here are some quick specs I pulled from Intel's website to compare:

CPU Comparison On Intel Website
CPU Comparison On Intel Website

So what is the moral of the story of this ATC Insight after testing?

In the infrastructure world, this equates to: "Don't design your solution for only storage if your goal is to get the most IOPS and performance driven out of the solution." You must consider the other places that can affect the performance of the solution (like the CPUs chosen in the solutions in our testing). Otherwise, you may end up just moving the bottleneck.

Test Plan/Test Case

Procedure at a High Level

vSAN Setup

  1. Creation of vSAN Cluster
  2. Creation of vmkernel ports with vSAN enabled
  3. Creation of Disk Groups to be used for vSAN

Vdbench Setup

  1. Creation of Vdbench workers on vSAN cluster
  2. Creation of Fill, Age, and Workload definitions to be used with Vdbench workers

vSAN Performance Testing

  1. Use Vdbench workers to drive curve tests for vSAN with dedupe and compression turned on
  2. Use Vdbench workers to drive curve tests for vSAN with dedupe and compression disabled (see the curve run sketch after this list)
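Regarding the curve tests above: in Vdbench, a "curve" run first drives a workload at its maximum uncontrolled I/O rate and then repeats it at a series of percentages of that maximum, which produces the IOPS-versus-latency curve. A minimal, hypothetical run definition for such a test might look like the line below (the workload name and times are placeholders, not our exact values).

  * Hypothetical curve run (sketch only): find the max I/O rate for the workload,
  * then rerun it at percentages of that max to plot IOPS versus latency
  rd=rd_curve,wd=wd_4k_70r30w,iorate=curve,elapsed=1800,interval=30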

Solutions at High Level Tested for Customer

  • VxRail Solution Testing
  • Dell vSAN Solution Testing
  • HPE vSAN Solution Testing

Test Tools

Vdbench

Vdbench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network connected storage. The software is known to run on several operating platforms.  It is an open-source tool from Oracle. 

To learn more about Vdbench, visit the Vdbench wiki.
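As a point of reference, Vdbench is driven from the command line with a parameter file like the sketches shown earlier. A typical invocation looks like the following, where -f points Vdbench at the parameter file and -o sets the report output directory (the file and directory names here are placeholders):

  ./vdbench -f hero_workloads.txt -o /results/hero_run1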

Graphite/Grafana

A Graphical User Interface (or GUI) that we use in the Advanced Technology Center (or ATC) to visually depict the results data that we derive in our compute and storage lab efforts.

Learn more about this product on its website.
