vSAN, Benchmarking, 10Gb, 25Gb, RoCEv2, and One More Thing…

"Just one more thing" was a phrase heard often in the old Columbo television crime drama. Just when the suspect thought they had gotten away with their crime, Columbo would come back and ask one more question that would usually land them in the slammer. While not a crime drama, we recently ran a proof of concept (POC) for a large financial customer who liked the results of the testing performed by the ATC so much that they kept coming back asking for "just one more thing."

The workloads

The Compute team was asked by this customer to run two separate workloads: one that targeted their general-purpose workload and another that they felt would mimic their SQL workload. We performed the workload generation testing against a rack of gear in the ATC that matched what the customer is currently deploying in their production data centers. Although there were only two workloads, each had several subtests: the customer wanted to test the effect of deduplication and compression, the effect of RoCEv2, and the difference between 10Gb and 25Gb network uplink speeds.

The technology of vSAN

vSAN is VMware's hyperconverged, software-defined storage platform built into vSphere. vSAN takes the local disks from multiple servers and combines them to form one large shared datastore.

Hardware

The rack consisted of 24 Dell servers (Dell PowerEdge R640), two Arista (DCS-7280SR-48C6-R) switches, and one out-of-band management switch. The switches were cabled with both 10Gb and 25Gb connections and were peered together so we could use MLAG (Multi-Chassis Link Aggregation) for teaming the NICs. vCenter and the hosts were deployed with vSphere 7.0U2 and upgraded to 7.0U3 during testing.

24x Dell PowerEdge R640 14G
2x Intel Xeon Gold 6240 @ 2.6GHz
24x 64GB @ 2933MHz Dual Rank DDR4
- Total 1536GB Per Host
2x 375GB NVMe Cache Disks Per Host
8x 4TB NVMe Capacity Disks Per Host
2x Dual port Mellanox MT27710 connected at 2x10Gb and 2x25Gb

Testing tools

Testing was done with VMware's HCIBench Fling, with VDBench as the testing engine. The default testing engine for HCIBench is FIO, but to remain completely independent for testing we used VDBench. We immediately ran into an issue: the customer wanted to test with LSI Logic controllers, but HCIBench, by default, deploys VMs with the Paravirtual controller, and this could affect the testing results. We worked with VMware to change the configuration of HCIBench so that it deployed VMs with the requested controller. For the SQL tests, we deployed a second HCIBench VM, which deployed VMs with the default Paravirtual controller.

Testing

The customer wanted to fill the vSAN cluster to 55-60%. After doing the math, we deployed 96 VMs (four per host) and first did a fill with large 256k blocks to make sure that the first run would not be artificially faster than the rest. The first copy to flash will always be faster because there is nothing yet to move or change. After filling with 256k blocks, we then did a WWT generic fill, which is a blend of block sizes that we felt mimicked a realistic environment.
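The "math" behind the fill can be sketched roughly as follows. This is a back-of-the-envelope estimate, not the sizing the team actually used: it assumes the raw capacity is simply 24 hosts x 8 capacity disks x 4TB, that all fill data is stored with a RAID-6 policy (about 1.5x raw space per logical byte), and it ignores vSAN metadata, slack space, and any deduplication or compression savings. The function names are hypothetical.

```python
def raw_capacity_tb(hosts: int, disks_per_host: int, disk_tb: float) -> float:
    """Raw capacity-tier size, ignoring cache disks and formatting overhead."""
    return hosts * disks_per_host * disk_tb


def per_vm_data_tb(raw_tb: float, fill_fraction: float,
                   overhead: float, vm_count: int) -> float:
    """Logical data each VM must write so consumed raw space hits the target fill."""
    consumed_raw_tb = raw_tb * fill_fraction   # raw space we want in use
    logical_tb = consumed_raw_tb / overhead    # data before protection overhead
    return logical_tb / vm_count


RAW_TB = raw_capacity_tb(hosts=24, disks_per_host=8, disk_tb=4)
RAID6_OVERHEAD = 1.5   # assumption: 4 data + 2 parity components
VM_COUNT = 96          # four VMs per host, as in the test

for fill in (0.55, 0.60):
    size = per_vm_data_tb(RAW_TB, fill, RAID6_OVERHEAD, VM_COUNT)
    print(f"target fill {fill:.0%}: about {size:.2f} TB of data per VM")
```

Under these assumptions, each of the 96 VMs would need to write roughly 3TB of data to land in the 55-60% range. A different storage policy changes the overhead factor and therefore the per-VM size.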

Once the fill and age were complete, we tested the environment with the customer-provided General Purpose, 4k block, 32k block, 64k block, and SQL workloads. General Purpose, 4k, 32k, and 64k workloads were run with Raid6-FTT2, while SQL testing was done with Raid1-FTT2. Each test was run twice and the results were averaged to account for variations between runs. All testing started on 2x10Gb combined in an MLAG, and we tested with no compression vs. compression, and with RoCEv2 (NIC offload). Once the initial testing was done, we flipped to 2x25Gb in an MLAG and reran all the tests. The customer really wanted to see the benefit of using 25Gb over 10Gb for their workloads. However, the throughput at the start of each test was higher than at the end, indicating that something other than the network was the bottleneck; moving to 25Gb would therefore not give them more performance in the current configuration.
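As an illustration of how two runs per test can be averaged and then compared across configurations, here is a minimal sketch. The IOPS figures are made-up placeholders, not results from this POC, and the helper names are hypothetical.

```python
from statistics import mean


def average_runs(runs):
    """Average the result of repeated runs of the same test."""
    return mean(runs)


def percent_change(baseline, candidate):
    """Relative change of candidate vs. baseline, in percent (negative = slower)."""
    return (candidate - baseline) / baseline * 100


# Placeholder values only; these are NOT measurements from the POC.
no_compression_runs = [100_000, 104_000]
compression_runs = [90_000, 88_000]

baseline = average_runs(no_compression_runs)
candidate = average_runs(compression_runs)
print(f"no compression: {baseline:.0f} IOPS (avg of {len(no_compression_runs)} runs)")
print(f"compression:    {candidate:.0f} IOPS (avg of {len(compression_runs)} runs)")
print(f"change:         {percent_change(baseline, candidate):+.1f}%")
```

With only two runs per test, the average smooths out run-to-run noise but cannot say much about variance, so large differences between the two runs of a single test are worth rerunning.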

Initial results

We were able to show the customer that enabling compression had only a small impact on their general-purpose workload, but their SQL workload was impacted more significantly (10-20%). RoCEv2 testing, however, did not look good with the general-purpose workload provided. When we broke testing down by specific block size, RoCEv2 was great for 4k blocks (25-35% better). Knowing this, the customer might look to leverage RoCEv2 when they have applications with small block sizes. Testing 10Gb vs. 25Gb, we were able to show that neither workload would benefit from going to 25Gb: the bottleneck was not networking (traffic topped out at just over 8Gb); instead, the tests were CPU bound.
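The reasoning behind the 10Gb vs. 25Gb conclusion can be illustrated with a quick utilization check. This sketch assumes the "just over 8Gb" peak is per-host traffic measured across the teamed pair of uplinks, which the text does not state explicitly; the function name is hypothetical.

```python
def link_utilization(observed_gbps: float, links: int, link_speed_gbps: float) -> float:
    """Fraction of the aggregate uplink capacity consumed at the observed peak."""
    return observed_gbps / (links * link_speed_gbps)


OBSERVED_PEAK_GBPS = 8.0  # "just over 8Gb" from the results; rounded down here

for speed in (10, 25):
    used = link_utilization(OBSERVED_PEAK_GBPS, links=2, link_speed_gbps=speed)
    print(f"2x{speed}Gb MLAG: {used:.0%} of aggregate capacity in use")
```

Under that assumption, the 2x10Gb pair still has spare headroom at peak, so adding bandwidth does not remove the limiting factor, which matches the finding that the hosts were CPU bound.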

Just one more thing

After we provided the initial results to the customer, they wanted more. We agreed to additional testing that included Raid1-FTT1 for SQL, smaller numbers of VMs, and different thread counts. That testing ran for another month. The cluster remains deployed within our ATC, and I am just waiting for the call asking for just one more thing.