ATC Insight

The series of tests to be performed on the Infinidat 6303 platform was broken into three major sections:

  1. Baseline Performance Testing
    Gathered baseline performance data by running five VDBench jobs, each for a 30-minute iteration, and then recorded and captured the data.
     
  2. Replication Testing
    Tested replication while a specific VDBench job was running in the background and then recorded and captured the data. 
     
  3. Resiliency Testing
    Lastly, tested the resiliency of the platform by introducing hardware failures while replication was taking place and the same VDBench job was running in the background.

Last impressions and thoughts are listed in the conclusion at the end of this ATC Insight.

Hardware and software consisted of the following components:

  • Cisco UCS 5108 with B200 M4 blades - this was used as source and destination host configuration
  • Cisco 6248 switches - were used for 10Gb connectivity to hosts
  • Brocade C620 switches - were used for source FC connectivity
  • Cisco Nexus 5548 switches - used for replication traffic of storage arrays
  • Cisco MDS 9148 switches - were used for destination FC connectivity
  • Infinidat 6303 - a pair (source and destination arrays)

Note: Testing on this Infinidat 6303 was performed in December 2019 and into January 2020.

High level design of the physical environment is depicted below:

 

High level design of the physical environment

Baseline Performance Testing

The Infinidat 6303 array was filled to 70% capacity at the request of our client.  The array had the same number of volumes/LUNs carved out and presented to each VM in each cluster for performance testing.  This is a standard we use for all VDBench testing.

The command VM was the only VM used for migration and had a total of 6 volumes mapped.  The first volume was filled with 200GB of data before being replicated to the target array.  Once replication was completed, we were able to kick off the jobs that created change data on the other volumes that were replicating.  This gave us a baseline of what replication looked like without performance jobs running during replication.  

To save time, we ran the performance testing before the client came on-site, and we captured the following baseline performance results for the Infinidat 6303 array:

  • Infinidat 75K IOPS, 64K block, 100% Sequential, 100% Write
  • Infinidat 75K IOPS, 64K block, 100% Random, 50% Read/Write
  • Infinidat 150K IOPS, 64K block, 100% Random, 50% Read/Write
  • Infinidat 75K IOPS, 8K block, 100% Random, 50% Read/Write
  • Infinidat 150K IOPS, 8K block, 100% Random, 50% Read/Write
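For reference, each of these jobs maps onto a small VDBench parameter file.  The sketch below shows what the 64K sequential write job might look like; the LUN paths are placeholders, and the actual parameter files used in the ATC are not reproduced here.

  # Illustrative VDBench parameter file (placeholder LUN paths) for the
  # 75K IOPS, 64K block, 100% sequential, 100% write job.
  # Storage definitions: the raw devices under test.
  sd=sd1,lun=/dev/sdb,openflags=o_direct
  sd=sd2,lun=/dev/sdc,openflags=o_direct
  # Workload definition: 64K transfers, fully sequential (seekpct=0), 100% write (rdpct=0).
  wd=wd1,sd=(sd1,sd2),xfersize=64k,seekpct=0,rdpct=0
  # Run definition: 75K IOPS target, 30-minute elapsed time, 5-second reporting interval.
  rd=run1,wd=wd1,iorate=75000,elapsed=1800,interval=5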

Performance testing resulted in the following findings for each test run:

Performance testing findings

Replication Testing


After the performance testing had been completed, we moved into the replication testing portion, which held the most weight for our client.  The customer used these replication results to evaluate them against the replication standards they have today in their production data centers.

A base workload was used for all replication testing: a VDBench job running 150K IOPS, 8K block, 100% random, 50% read/write.  This consisted of the 200GB of data being replicated first.  Then, once replication was completed, the VDBench job was kicked off for a run time of 30 minutes.  The following results were observed for the Infinidat 6303 array.

Infinidat 6303 Array results
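For reference, the replication base workload described above, including the initial 200GB seed write, could be sketched in a single VDBench parameter file along the following lines.  The device path and the maxdata-based fill phase are illustrative assumptions, not the actual parameter file used in the lab.

  # Illustrative sketch only (placeholder device path).
  sd=sd1,lun=/dev/sdb,openflags=o_direct
  # Phase 1: sequentially write roughly 200GB to seed the volume before replication;
  # maxdata ends the run once that much data has been written.
  wd=fill,sd=sd1,xfersize=1m,seekpct=0,rdpct=0
  rd=seed,wd=fill,iorate=max,maxdata=200g,elapsed=36000,interval=5
  # Phase 2: the base workload of 150K IOPS, 8K block, 100% random, 50% read/write for 30 minutes.
  wd=base,sd=sd1,xfersize=8k,seekpct=100,rdpct=50
  rd=run,wd=base,iorate=150000,elapsed=1800,interval=5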

 

Resiliency Testing
 

Once the replication testing was completed, the final round of testing with our client commenced.  This testing was specifically around resiliency.  

For the resiliency testing, there were a few key elements being tested:

  • First, a controller failure simulation
  • Second, a Fibre Channel switch going down
  • Finally, a drive being pulled from the array 

Additional Notes:

For the controller failure simulation, we decided to take the controller down and leave it out for a 10-minute period to better simulate a failure.  For the Infinidat 6303, the controller failure test consisted of pulling the power on one of the three controllers, since each controller is a dedicated server.

For the drive being pulled from the array, the first iteration of the drive pull showed no impact, due to updated technology in how the system detects a missing drive.  If the pull and replace was too quick, there was no impact (as if the drive had never been pulled).  This led to a change in testing: the drive was pulled for a few minutes at a time.

While resiliency testing was underway, we had a baseline job running that consisted of 150K IOPS, 8K block, 100% random, 50% read/write.  This was paired with replication being enabled while the tests were performed.  The sequence for each test consisted of the following steps (a minimal orchestration sketch follows the list):

  1. Start VDBench job of 150K IOPS, 8K block, 100% random, 50% read/write
  2. Let run for 5 minutes
  3. Enable replication between the source and destination array
  4. Let run for 5 minutes
  5. Perform the decided resiliency test
  6. Wait for array to equalize once test was complete
  7. Stop replication
  8. Stop VDBench job
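Below is a minimal sketch of that sequence in Python.  The VDBench invocation uses the tool's standard -f/-o flags; the replication enable/disable calls and file names are placeholders, since replication was driven through the arrays' own management tooling rather than from a script like this.

  # Minimal sketch of the resiliency test sequence; not the actual lab scripts.
  import subprocess
  import time

  def enable_replication():
      # Placeholder: replication between the source and destination array was
      # enabled through the arrays' own management tooling.
      pass

  def disable_replication():
      # Placeholder: stop replication through the same tooling.
      pass

  # 1. Start the 150K IOPS, 8K block, 100% random, 50% read/write VDBench job.
  vdbench = subprocess.Popen(["./vdbench", "-f", "resiliency.parm", "-o", "output/resiliency"])

  time.sleep(300)          # 2. Let it run for 5 minutes.
  enable_replication()     # 3. Enable replication between the source and destination array.
  time.sleep(300)          # 4. Let it run for another 5 minutes.

  # 5-6. Perform the chosen resiliency test (controller power, FC switch, or drive pull)
  # and wait for the array to equalize before continuing.
  input("Press Enter once the resiliency test is done and the array has equalized...")

  disable_replication()    # 7. Stop replication.
  vdbench.terminate()      # 8. Stop the VDBench job.
  vdbench.wait()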

The above was tested on the Infinidat 6303 array for each of the three resiliency tests.  This was the most time-consuming testing that we did with our client.

Here are the testing results of resiliency testing on the Infinidat 6303 Array:

Controller Failure

At 1:01pm, power to a single controller was removed, resulting in a controller failure. At 1:10pm, power was restored to the controller. At 1:28pm, the controller showed active in the array element manager. At approximately 1:32pm, the workload returned to baseline.

The array recovered as expected with no loss of service to the workload. IO was impacted for approximately 10 seconds after the controller failure. Read latency averaged 9ms and write latency 8ms. Normal baseline performance resumed after 35 minutes.

Fail single Fibre Channel switch

The workload with replication started at 3:00pm. All power was removed from a single Brocade switch at 3:03pm. We observed a nominal 5-second impact to IOPS. The Brocade switch recovered at 3:05pm.

Single drive failure

The workload with replication started at 4:19pm. A single drive was failed by removing it from the disk array enclosure at 4:23pm, causing a disk rebuild within the array. The disk rebuild lasted approximately an hour, ending at 5:20pm. Performance during the rebuild averaged read latencies of 1.37ms and write latencies of 1.63ms.

 

Last Impressions and Thoughts

 

  • Infinidat does a very nice job of packaging their solution.  The solution comes in a cabinet/rack ready to operate, with the exceptions of plugging into power and establishing on-premises connectivity for management and data access.  This saves a considerable amount of time in the deployment phase.
  • Infinidat uses spinning disk with flash cache on the nodes to accomplish what some all-flash arrays are doing.
  • The front LCD panel on the cabinet/rack is a nice touch for quick performance information metrics.
  • If more than one node/server fails, the system will completely shut down to protect the data.
  • Infinidat Support was quick to respond and easy to work with.
  • Infinidat provides the data capacity raw from day one, meaning if a PB of space is purchased, then that is what is available on day one.  There is no deduplication or compression factored into the space.
  • The Infinidat management interface is intuitive and straightforward, with a great deal of information regarding connectivity from host to array.

Test Tools

Depiction of Grafana being used as front end to VDBench

VDBench

VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage.  The software is known to run on several operating platforms.  It is an open-source tool from Oracle.  To learn more about VDBench, you can visit the wiki HERE.
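A typical run points VDBench at a parameter file and an output directory; the file and directory names below are placeholders:

  ./vdbench -f baseline.parm -o output/baseline_run1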

Graphite/Grafana

Graphite/Grafana is the graphical user interface (or GUI) that we use in the Advanced Technology Center (or ATC) to visually depict the results data that we derive in our compute and storage lab efforts.  To learn more about this product, you can go HERE.
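The ATC's collection pipeline is not reproduced here, but as a rough illustration of the idea, VDBench results can be pushed into Graphite (and then charted in Grafana) by writing samples to Graphite's plaintext listener, which accepts lines of the form "metric value timestamp" on TCP port 2003 by default.  The host name and metric path below are placeholder assumptions.

  # Illustrative only: send one IOPS sample to a Graphite plaintext listener.
  import socket
  import time

  def send_metric(path, value, host="graphite.lab.local", port=2003):
      # Graphite plaintext format: "<metric.path> <value> <unix_timestamp>\n"
      line = "%s %s %d\n" % (path, value, int(time.time()))
      with socket.create_connection((host, port)) as sock:
          sock.sendall(line.encode("ascii"))

  send_metric("atc.infinidat6303.baseline.iops", 150000)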

Specific Hardware of Infinidat 6303 for Lab Testing:

  • F6303-COD INFINIDAT Software 500 Description: Base Capacity on Demand ("CoD") per TB (of 1.037PB physical capacity, estimated 2.074 Effective Capacity), with 3 Year Support including Technical Advisor (TA) program and InfiniVerse, with up to 1 year of historical data

Supporting Lab Environment for Testing:

  • Cisco UCS 5108 with B200 M4 blades - this was used as source and destination host configuration
  • Cisco 6248 switches - were used for 10Gb connectivity to hosts
  • Brocade C620 switches - were used for source FC connectivity
  • Cisco Nexus 5548 switches - used for replication traffic of storage arrays
  • Cisco MDS 9148 switches - were used for destination FC connectivity

Diagram Depiction of the Lab in the ATC:

 

 
