HCI for Edge Computing

"We can do that" is a phrase that you will hear often at WWT. We recently had an energy company approach WWT with a large ask. They wanted to test 2-node ROBO solutions from four separate vendors, with replication, slow the connectivity, monitor the bandwidth for vCenter and Witness traffic — oh, and they wanted it all, ready to test, in under a month. Our answer was, "We can do that."

This customer has hundreds of small power plants where they currently run 2-node SimpliVity clusters. However, they are looking to improve bandwidth management at scale and wanted to see if other solutions might be a better fit. The customer requested that we test the following 2-node solutions: VxRail, Nutanix, HyperFlex and SimpliVity. Each solution had a vCenter server deployed in our Flash Lab environment, which acted as our central data center. Let's have a look at how each one was configured.

VxRail

For the VxRail deployment we used 2 P470Fs at the 7.0241 code level. The VxRail nodes had 2x10Gb cross-connected between the hosts, and each host had 2x1Gb connected to the top of rack (ToR) switch. This configuration allows for fast connections (10Gb) for vSAN traffic between the hosts and slower, cheaper 1Gb connections to the top of rack switching. We also configured the cluster with 1Gb connections from each host to the ToR switching to support remote administration through the iDRAC. In addition to the vCenter server, we deployed the Witness Host in our Flash Lab environment. The VMware witness host can support up to 64 two-node clusters or a maximum of 64,000 components.

Deployment for this environment was very easy and took only a few hours with the VxRail workflow. On top of this deployment, we added RecoverPoint for Virtual Machines (RP4VM) for replication and orchestrated failover. We used an additional VxRail cluster for the replication target. Once deployed, we were able to monitor vCenter, Witness and Replication traffic for the customer.

HyperFlex

We initially tried to deploy using HyperFlex systems that we had available in our lab (HXAF220C-M5SN), but after some discussion with Cisco it was determined that these hosts were not on the hardware compatibility list to support a 2-node design. Working with Cisco, we were able to quickly have two hosts that were supported for a 2-node design (HXAF220C-M5SX) shipped and racked in the ATC. The partnership with Cisco was critical in getting the correct gear in house to support this effort. Hosts were direct-connected with 2x10Gb between the nodes for storage data and vMotion traffic and 3x1Gb to the ToR. Management, replication and VM traffic used 2x1Gb and remote management (CIMC) used 1x1Gb.

Once the hosts were racked and stacked, deployment only took about 3 hours using the Private Intersight Appliance, which was previously deployed with the HyperFlex 4.5 bundle. The Private appliance also acts as the Witness node for quorum for the 2-node cluster. Replication was accomplished using HyperFlex native replication to another HyperFlex cluster within the ATC.

Nutanix

For Nutanix we also tried to use gear that we had in house. First, we tried to deploy on HPE DX nodes and then we tried on SuperMicro NX nodes. Both attempts were unsuccessful because the nodes were not on the hardware compatibility list for two-node deployments. The Nutanix two-node design differs slightly from the other solutions: Nutanix does not support direct connecting the hosts together with 10Gb. Instead, there are 3x1Gb connections from each host going to ToR networking. Management and storage traffic used 2x1Gb and out-of-band management (iLO) used 1x1Gb.

Again, our partnership with Nutanix was critical. We were able to get the required equipment shipped into the ATC to support this POC in a timely manner. Nutanix shipped us two NX SuperMicro nodes (NX-1175S-G6-4108) to help make this effort a success. Once the new hosts were racked and cabled, a witness node was deployed in our Flash Lab environment for quorum. The Nutanix witness can support up to 50 instances of any combination of two-node and Metro Availability clusters.
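Because this customer runs hundreds of sites, the per-witness limits quoted in this article (64 two-node clusters for the VMware witness host, 50 instances for the Nutanix witness) translate into a minimum witness count. The snippet below is a rough sizing sketch, not vendor guidance. The site count of 300 is a hypothetical figure chosen for illustration, the function name is our own, and the calculation ignores the 64,000-component limit and any Metro Availability clusters sharing a witness.

```python
import math

# Per-witness limits as quoted in this article.
WITNESS_LIMITS = {
    "VMware witness host": 64,  # two-node clusters per witness
    "Nutanix witness": 50,      # two-node / Metro Availability instances per witness
}


def witnesses_needed(sites: int, clusters_per_witness: int) -> int:
    """Return the minimum number of witnesses for `sites` two-node clusters."""
    if sites < 0 or clusters_per_witness <= 0:
        raise ValueError("sites must be >= 0 and clusters_per_witness must be > 0")
    return math.ceil(sites / clusters_per_witness)


if __name__ == "__main__":
    hypothetical_sites = 300  # assumption for illustration, not a customer figure
    for name, limit in WITNESS_LIMITS.items():
        count = witnesses_needed(hypothetical_sites, limit)
        print(f"{name}: at least {count} for {hypothetical_sites} sites")
```

Rounding up matters here: one site past a multiple of the limit requires a whole additional witness.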

Using the Nutanix Foundation software, the two-node deployment was completed in just a few hours. Replication was accomplished using the built-in replication technology with the target being another Nutanix cluster within the ATC.

SimpliVity

SimpliVity was deployed on existing equipment in the ATC (HPE ProLiant DL380 Gen10). Deployment for the two-node cluster took about a day to accomplish after rack and stack was complete. Connectivity for this environment is 2x10Gb between hosts (direct connect), and then 3x1Gb to ToR. The 10Gb connections between hosts allow for fast storage traffic while the 1Gb ToR connections allow for slower management traffic as well as iLO support for remote management.

Deployment is done with the SimpliVity Deployment Manager. The witness node in SimpliVity is called the "Arbiter" and makes sure there is quorum. For this setup both the Deployment Manager and the Arbiter were deployed to the same Windows Server 2019 VM running on a separate data center host. SimpliVity did not have the same strict hardware requirements that some other vendors had, meaning the building block for a SimpliVity single or remote node can be the same as the building block for a data center node. Including the installation and configuration of the Arbiter node, the overall deployment took around 8 hours.

Replication is built into SimpliVity and was configured to use a single SimpliVity node as the target within our data center. HPE uses "HPE SimpliVity Federation" to join virtual controllers ("OmniStack hosts") together, and then you can simply create your backup jobs via the SimpliVity plugin in vCenter. This plugin also controls all the actions that you can do with SimpliVity, such as creating datastores, removing hosts from a federation and managing backup policies, essentially giving customers a single pane of glass.

Traffic impairment

To simulate a T1 to these remote sites, we took a port from our core switching and connected it to a Spirent appliance. The other port on the Spirent appliance was connected to our remote site FEX. Any traffic coming in or out of the remote site was impaired to 100Mb.
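To get a feel for what an impaired link means for replication, a back-of-the-envelope transfer-time estimate helps. The sketch below assumes the impaired rate is 100 megabits per second and uses a hypothetical 10 GB replication payload. The function name, the payload size and the `efficiency` parameter (a stand-in for protocol overhead) are illustrative assumptions, not measurements from this POC.

```python
def transfer_seconds(payload_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate seconds to move `payload_gb` (decimal gigabytes) over a link.

    link_mbps  -- link rate in megabits per second
    efficiency -- usable fraction of the link (1.0 means no overhead)
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency must be in (0, 1]")
    payload_bits = payload_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return payload_bits / usable_bits_per_second


if __name__ == "__main__":
    # Hypothetical payload; the impaired rate is assumed to be 100 Mb/s.
    seconds = transfer_seconds(payload_gb=10, link_mbps=100)
    print(f"10 GB over 100 Mb/s: about {seconds / 60:.1f} minutes")
```

Lowering `efficiency` or `link_mbps` shows how quickly replication windows grow when the link is shared with management and witness traffic.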

Reporting

Once the link was slowed, we configured replication for each solution, replicated from the remote sites back to the data center, and monitored both the replication traffic and the management/witness traffic.

Metrics were pulled from the vCenter performance charts and from the switch ports themselves using a plugin for Grafana.
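Switch-port bandwidth graphs are commonly derived from interface octet counters sampled at a fixed interval. We have not described how the Grafana plugin collects its data, so the snippet below is only a generic sketch of that calculation. The function name, sample values and 64-bit counter width are assumptions for illustration.

```python
def rate_mbps(octets_start: int, octets_end: int, interval_s: float,
              counter_bits: int = 64) -> float:
    """Convert two octet-counter samples into an average rate in Mb/s.

    The modulo handles a single counter wrap between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be > 0")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 / interval_s / 1_000_000


if __name__ == "__main__":
    # Hypothetical samples taken 30 seconds apart on one switch port.
    mbps = rate_mbps(octets_start=0, octets_end=375_000_000, interval_s=30)
    print(f"average rate: {mbps:.1f} Mb/s")
```

Dividing the result by the impaired link rate gives a utilization percentage that can be compared across solutions.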

Below is an example of the results from one of the solutions:

This level of monitoring allowed the customer to compare what they were seeing with their SimpliVity solution directly against the other three contenders.

Outcome

Currently the customer is evaluating whether they are going to stay with SimpliVity. They really like the replication technology that is built in, but again, they are looking to optimize bandwidth consumption, especially at scale. The customer was impressed with how quickly we were able to build out 4 separate 2-node clusters and how easy we made it for them to evaluate manageability and utilization using a lab that mimics their production environment.