
SILK SDS Testing In The ATC

A customer recently asked us to do some lab testing via a Proof of Concept (or POC) around the Silk solution. The customer was interested in specific testing around performance and Reliability, Availability, and Serviceability (or RAS).

In This Insight


Summary

We worked inside the Advanced Technology Center (or ATC) directly with Silk architects, who had a chance to bless the setup and help with the baseline performance testing. Then, as a trusted advisor to our customer, we (the ATC Lab Services team) walked the array through the paces of testing in the Execution Phase of the POC.


ATC Insight

The series of tests that we performed on the Silk storage platform was broken into 3 major sections:

  1. Baseline Performance Testing
    Gathered a baseline of performance data by running five VDBench jobs for a 30-minute iteration each, then recorded and captured the data.
     
  2. Functionality Testing
    Gathered the required evidence on the management form and functionality of the solution.
     
  3. Resiliency Testing
    Lastly, tested the resiliency of the platform by introducing hardware failures while replication was taking place and the same VDBench job was running in the background.
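For readers unfamiliar with VDBench, each of these jobs is defined in a small parameter file. A minimal sketch of what a job like the 4K 70/30 baseline might look like follows; the LUN path and the sd/wd/rd names are illustrative, not the actual POC configuration:

```
# Storage definition: one raw LUN (path is illustrative)
sd=sd1,lun=/dev/sdb,openflags=o_direct

# Workload definition: 4K transfers, 70% reads, fully random
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100

# Run definition: uncapped I/O rate for a 30-minute (1800 s) iteration,
# reporting an interval every 5 seconds
rd=rd1,wd=wd1,iorate=max,elapsed=1800,interval=5
```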

Last impressions and thoughts are listed in the conclusion at the end of this ATC Insight.

Hardware and software consisted of the following components:

  • 4x Dell PowerEdge R640s - used for the compute resource
  • Cisco 9148 switches - for 16Gb FC storage traffic
  • 5x Silk K8000 compute/storage nodes
  • 1x Silk management node

Note: Testing on the SDS solution was performed in October 2020 and into November 2020.

A high-level design of the physical environment is depicted below:

 

Performance Testing

 

The Silk solution was filled to 50% capacity at the request of the client. The array had the same number of Volumes/LUNs carved out and presented for performance testing on each VM in each cluster. This is a standard used for all VDBench testing.

The client provided us with the performance testing requirements, which are documented below:

Max IOPS, 4K Block, 70% Read, 30% Write

  • Average Latency: 2.36 ms
  • Average IOPS: 950.61 K
  • Average Throughput: 3.89 GB/s

Silk Max IOPS, 4K block, 70% Read, 30% Write

 

Max IOPS, 8K Block, 30% Read, 70% Write

  • Average Latency: 4.04 ms
  • Average IOPS: 509.16 K
  • Average Throughput: 4.17 GB/s

Silk Max IOPS, 8K block, 30% Read, 70% Write

 

Max IOPS, 8K Block, 90% Read, 10% Write

  • Average Latency: 2.74 ms
  • Average IOPS: 774.75 K
  • Average Throughput: 6.35 GB/s

Silk Max IOPS, 8K block, 90% Read, 10% Write

 

Max IOPS, 32K Block, 70% Read, 30% Write

  • Average Latency: 5.04 ms
  • Average IOPS: 406.18 K
  • Average Throughput: 13.31 GB/s

Silk Max IOPS, 32K block, 70% Read, 30% Write

 

Max IOPS, 1MB Block, 70% Read, 30% Write

  • Average Latency: 379.52 ms
  • Average IOPS: 5.44 K
  • Average Throughput: 5.70 GB/s

Silk Max IOPS, 1MB block, 70% Read, 30% Write
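As a quick cross-check, the averages in each table are internally consistent: throughput is roughly IOPS times block size. A short Python snippet, with the figures copied from the tables above, demonstrates this:

```python
# Sanity-check that each result table is internally consistent:
# throughput (GB/s) ~= IOPS * block size (bytes) / 1e9.
results = [
    # (block size in bytes, average IOPS, reported GB/s)
    (4 * 1024,    950_610, 3.89),
    (8 * 1024,    509_160, 4.17),
    (8 * 1024,    774_750, 6.35),
    (32 * 1024,   406_180, 13.31),
    (1024 * 1024, 5_440,   5.70),
]

for block, iops, reported in results:
    derived = iops * block / 1e9  # decimal GB/s
    print(f"{block // 1024}K block: derived {derived:.2f} GB/s, reported {reported:.2f} GB/s")
```

Every derived figure matches the reported throughput to two decimal places, which is a good sign that the averages were captured over the same interval.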

 

Functionality Testing

 

The client had asked us to go through different aspects of functionality for each solution. This was a combination of QoS testing, performance monitoring, efficiency reporting, snapshot capabilities, and overall general management of the solution. For this ATC Insight, we focused on the QoS testing (if capable), performance monitoring, and snapshot capabilities. If there is further interest in the other tests we performed, please feel free to reach out to the author of this ATC Insight.

 

Live Performance Monitoring

 

The client required that the solution alert on an abnormality in health when an incident occurred. For this, we ran a VDBench job workload of 25K IOPS, 4K block, 50% read and 50% write. After the workload had run for a period of time, an FC port was off-lined at the switch layer; the resulting impact that we saw in VDBench was also seen from the Silk management interface.
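In VDBench terms, capping a job at a fixed rate like this just means replacing `iorate=max` with a number, so any injected fault shows up as a clear deviation from the steady baseline. A sketch of such a parameter file (the LUN path, names, and elapsed time are illustrative):

```
# Storage definition: one raw LUN (path is illustrative)
sd=sd1,lun=/dev/sdb,openflags=o_direct

# Workload definition: 4K transfers, 50% reads, fully random
wd=wd1,sd=sd1,xfersize=4k,rdpct=50,seekpct=100

# Run definition: fixed 25,000 IOPS so a fault is visible as a dip
rd=rd1,wd=wd1,iorate=25000,elapsed=1800,interval=5
```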

[Screenshot: the impact spike from taking an FC port offline on the switch for one of the C-Nodes]
[Screenshot: a matching impact spike from the Silk management interface]

 

Snapshot Capabilities

 

This test revolved around a future feature request from the client. Teams were starting to look at the benefits of snapshots in their environment, and the client wanted to see what functionality could be gained from each solution. The ask from the client was as follows:

  1. Manually create a snapshot
  2. Create a snap of the previous snapshot
  3. Present snapshot to host and actively use
  4. Create a consistency group snapshot
  5. Create snapshot schedule with retention rules

 

[Screenshot: where to go to create a VG snapshot and which policy to use]
[Screenshot: right-clicking Snap_Test and selecting View allows us to mount the snapshot to a host group]
[Screenshot: right-clicking Snap_View gives us the option to create another snapshot]
[Screenshot: creating the view for the second snapshot off the first snapshot]
[Screenshot: we were not able to create the second snapshot view for presenting to the host group]
[Screenshot: doing a restore gets around the second-snapshot issue we ran into above]
[Screenshot: the original datastore and the restored datastore]
[Screenshot: bringing the snapshot volume online and available to the host for use]
[Screenshot: we were able to manipulate the directory by creating a new file]
[Screenshot: the snap group consisting of two volumes, which forms a consistency group when taking a snapshot of Snap_VG]
[Screenshot: creating the snapshot of Snap_VG]
[Screenshot: the steps to create a retention policy]
[Screenshot: the options when creating a retention policy]
[Screenshot: where to go to create the schedule for snapshots]
[Screenshot: the schedule options, including which retention policy and which volume group]

 

Resiliency Testing

 

The final round of testing was based on hardware resiliency. For each of the tests, a baseline VDBench job ran at 50K IOPS, 4K block, 50% read and 50% write. The tests consisted of the following:

  • Power leg pull from one node
  • Connectivity leg pull from one node
  • Drive removal from one node, if no impact, pull the drive from subsequent node until impact
[Screenshot: impact from pulling a single leg of power from a node]
[Screenshot: impact from pulling a leg of connectivity]
[Screenshot: impact from pulling two drives; no impact from a single drive pull]


Test Tools

VDBench

VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage. The software is known to run on several operating platforms.  It is an open-source tool from Oracle. Visit VDBench Wiki for more information.

Graphite/Grafana

Graphite is the time-series database that stores our results data, and Grafana is the Graphical User Interface (or GUI) that we use in the Advanced Technology Center (or ATC) to visually depict the results data that we derive in our compute and storage lab efforts. Visit Grafana for more information.
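Graphite ingests metrics over a very simple plaintext protocol: one line per sample, in the form `<metric path> <value> <epoch seconds>`, sent to its listener (port 2003 by default). A minimal Python sketch of feeding a data point in this way follows; the host name and metric path are hypothetical, not the actual ATC configuration:

```python
import socket
import time

GRAPHITE_HOST = "graphite.example.com"  # hypothetical host
GRAPHITE_PORT = 2003                    # Graphite's default plaintext listener


def format_metric(path: str, value: float, timestamp: int) -> str:
    """Build one line of Graphite's plaintext protocol:
    '<path> <value> <epoch-seconds>\n'."""
    return f"{path} {value} {timestamp}\n"


def send_metric(path: str, value: float) -> None:
    """Open a TCP connection to Graphite and send a single sample."""
    line = format_metric(path, value, int(time.time()))
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
        sock.sendall(line.encode("ascii"))


# Example (hypothetical metric path):
# send_metric("atc.silk.vdbench.iops", 950610)
```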


Last Impressions and Thoughts

  • This was the first time the ATC was able to vet out the Silk solution and I must say I was very impressed.
  • The interface was very intuitive to the point I didn't need help from Silk to understand any of the management functionality.
  • Support handles all upgrades which is a nice added benefit for peace of mind.
  • There is still some feature development in progress that will make this an even stronger competitor.
  • The addition of Silk Clarity is an added bonus for even more insight into the solution.