NetApp SDS Testing In The ATC
A customer recently asked us to do some lab testing via a Proof of Concept (POC) around a NetApp SolidFire solution. The customer was interested in specific testing around performance and around Reliability, Availability, and Serviceability (RAS).
We worked inside the Advanced Technology Center (ATC) directly with NetApp architects, who had a chance to bless the setup and help with the baseline performance testing. Then, as a trusted advisor to our customer, we (the ATC Lab Services team) put the array through its paces in the Execution phase of the POC.
The series of tests performed on the NetApp SolidFire storage platform was broken into 3 major sections:
- Baseline Performance Testing
Gathered a baseline of performance data by running five VDBench jobs for a 30-minute iteration and recording the results.
- Functionality Testing
Gathered the required evidence on the management and functionality of the solution.
- Resiliency Testing
Lastly, tested the resiliency of the platform by introducing hardware failures while replication was taking place and the same VDBench job was running in the background.
Last impressions and thoughts are listed in the conclusion at the end of this ATC Insight.
Hardware and software consisted of the following components:
- 4x Dell PowerEdge R640 servers - used as the compute resource
- Cisco 9K-C93180 switches - for 25Gb iSCSI storage traffic
- 8x NetApp H610S-1 storage nodes
- 1x NetApp OVA deployed management node
Note: Testing on the SDS solution was performed in October and November 2020.
High level design of the physical environment is depicted below:
The NetApp SolidFire solution was filled to 50% capacity at the request of the client. The array had the same number of Volumes/LUNs carved out and presented for performance testing on each VM in each cluster. This is the standard we use for all VDBench testing.
The client provided us with the performance testing requirements, which are documented below:
Max IOPs, 4K Block, 70% Read, 30% Write
Max IOPs, 8K Block, 30% Read, 70% Write
| Metric | Value |
|---|---|
| Average Latency | 4.68 ms |
| Average IOPs | 436.48 K |
| Average Throughput | 3.576 GB/s |
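
As a quick sanity check on the figures above, average throughput should be roughly average IOPs multiplied by block size. The sketch below assumes that an 8K block means 8,192 bytes and that GB/s is decimal (10^9 bytes per second). The in-flight estimate additionally assumes Little's Law applies to these steady-state averages. The function names are illustrative and are not part of VDBench.

```python
def throughput_gb_per_s(iops: float, block_bytes: int) -> float:
    """Throughput in decimal GB/s implied by an IOPs rate and a fixed block size."""
    return iops * block_bytes / 1e9


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average I/Os in flight per Little's Law: arrival rate x time in system."""
    return iops * latency_ms / 1000.0


# Reported averages for the 8K block, 30% read / 70% write run.
iops = 436_480          # 436.48 K
latency_ms = 4.68
block_bytes = 8 * 1024  # assumption: "8K" means 8 KiB

print(f"{throughput_gb_per_s(iops, block_bytes):.3f} GB/s")
print(f"{outstanding_ios(iops, latency_ms):.0f} I/Os in flight")
```

Under those assumptions the product comes out at 3.576 GB/s, in line with the reported average throughput, and the latency figure corresponds to roughly 2,043 I/Os in flight across the cluster.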
Max IOPs, 8K Block, 90% Read, 10% Write
| Metric | Value |
|---|---|
| Average Latency | 4.95 ms |
| Average IOPs | 414.12 K |
| Average Throughput | 3.533 GB/s |
Max IOPs, 32K Block, 70% Read, 30% Write
Max IOPs, 1MB Block, 70% Read, 30% Write
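
The parameter files used in the POC are not reproduced here. As a rough illustration only, the five profiles above could be expressed as VDBench parameter files along the following lines. The sketch uses the standard `sd`/`wd`/`rd` parameter-file keywords; the LUN path, the single-LUN layout, random access (`seekpct=100`), `openflags=o_direct`, and the 5-second reporting interval are all assumptions, and the `Profile` class and `render` helper are hypothetical names introduced for this example. The 1,800-second elapsed time reflects the 30-minute iteration mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    xfersize: str  # VDBench transfer size, e.g. "4k" or "1m"
    rdpct: int     # read percentage; writes are the remainder


# The five client-specified workloads, all run at maximum I/O rate.
PROFILES = [
    Profile("4k_70r", "4k", 70),
    Profile("8k_30r", "8k", 30),
    Profile("8k_90r", "8k", 90),
    Profile("32k_70r", "32k", 70),
    Profile("1m_70r", "1m", 70),
]


def render(profile: Profile, lun: str = "/dev/sdb", elapsed_s: int = 1800) -> str:
    """Render a single-LUN VDBench parameter file for one profile.

    Assumptions (not taken from the POC): the LUN path, random access
    (seekpct=100), O_DIRECT, and a 5-second reporting interval.
    """
    return "\n".join([
        f"sd=sd1,lun={lun},openflags=o_direct",
        f"wd=wd1,sd=sd1,xfersize={profile.xfersize},rdpct={profile.rdpct},seekpct=100",
        f"rd=rd_{profile.name},wd=wd1,iorate=max,elapsed={elapsed_s},interval=5",
    ])


print(render(PROFILES[0]))
```

Keeping the profiles as data and rendering the file from them makes it easier to run the same five workloads unchanged against each solution under test.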
The client asked us to go through different aspects of functionality for each solution. This was a combination of QoS testing, performance monitoring, efficiency reporting, snapshot capabilities, and general management of the solution. For this ATC Insight, we focused on QoS testing (if capable), performance monitoring, and snapshot capabilities. If there is further interest in the other tests we performed, please feel free to reach out to the author of this ATC Insight.
This test was completed by running two separate VDBench jobs. For this instance, we had the QoS policy set to 200K IOPs. The first job was kicked off at 25K IOPs with a 4K block at 50% read and 50% write. It ran for a period of time before we kicked off the second job at 200K IOPs with a 4K block at 50% read and 50% write. Together the two jobs would push traffic over the 200K IOPs threshold, which should cause the QoS policy to take effect.
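
The arithmetic behind the QoS test can be sketched as follows. This assumes the 200K limit applies to the combined traffic of both jobs, as the test description implies. How the array divides throttling between the jobs is not stated, so the sketch only reports the offered load and the excess. The helper name is hypothetical.

```python
def qos_overage(job_iops: list[int], qos_limit: int) -> tuple[int, int]:
    """Return (offered, excess) IOPs for jobs sharing one QoS limit."""
    offered = sum(job_iops)
    return offered, max(0, offered - qos_limit)


# First job at 25K IOPs, second at 200K IOPs, against a 200K QoS policy.
offered, excess = qos_overage([25_000, 200_000], qos_limit=200_000)
print(offered, excess)  # 225000 25000
```

With only the first job running, the offered load sits well under the limit, so any throttling seen after the second job starts can be attributed to the policy.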
Live Performance Monitoring
The client required the solution to raise an alert whenever a health abnormality occurred. To test this, we ran a VDBench workload of 25K IOPs, 4K block, 50% read and 50% write. After the workload had run for a period of time, an SFP was pulled from one of the nodes. While we saw the impact in the VDBench output, there was no notification from the NetApp interface, though the event could be seen in the logs.
This test revolved around a future feature request from the client. Teams were starting to look at the benefits of snapshots in their environment, and the client wanted to see what functionality could be gained from each solution. The ask from the client was as follows:
- Manually create a snapshot
- Create a snap of the previous snapshot
- Present snapshot to host and actively use
- Create a consistency group snapshot
- Create snapshot schedule with retention rules
In this test we were not able to take a snapshot of a snapshot without using the CLI, whereas the client had requested that everything be done through the GUI. In addition, a retention period can only be set when a schedule is created, so a separate schedule is needed for each retention period.
The final round of testing was based on hardware resiliency. For each of the tests, a baseline VDBench job ran at 50K IOPs, 4K block, 50% read and 50% write. The tests consisted of the following:
- Power leg pull from one node
- Connectivity leg pull from one node
- Remove a node from the cluster for a period of 5 minutes
- Drive removal from one node; if no impact, pull a drive from each subsequent node until impact
Note: We found it a bit odd that a power pull on only one leg would cause an impact. NetApp agreed and took the action to look into it and document the findings.
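
One way to decide whether a failure caused "impact" is to compare each VDBench reporting interval against the 50K IOPs baseline. The sketch below is a minimal illustration of that idea, not the method used in the POC. The 10% tolerance is an assumed threshold, the function name is hypothetical, and the sample values are made up for the example rather than taken from the test results.

```python
def impacted_intervals(samples, target_iops, tolerance=0.10):
    """Return indices of intervals whose IOPs fell below target by more than tolerance."""
    floor = target_iops * (1.0 - tolerance)
    return [i for i, iops in enumerate(samples) if iops < floor]


# Illustrative samples only, NOT measurements from the POC.
samples = [50_100, 49_900, 50_050, 31_200, 12_400, 44_800, 50_000]
print(impacted_intervals(samples, target_iops=50_000))  # [3, 4, 5]
```

Multiplying the number of flagged intervals by the reporting interval gives a rough duration for the disruption, which makes the different failure scenarios easier to compare.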
VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage. The software is known to run on several operating platforms. It is an open-source tool from Oracle. Visit VDBench Wiki for more information.
Grafana is a graphical user interface (GUI) that we use in the Advanced Technology Center (ATC) to visually depict the results data from our compute and storage lab efforts. Visit Grafana for more information.
Last Impressions and Thoughts
- NetApp does a great job of making a straightforward solution with easy setup and management.
- The improvements being made to the software layer should help it gain an advantage over other players in the market.
- We would like to see Fibre Channel as a connectivity option.