ATC Insight

The series of tests to be performed on the PURE X70 storage platform was broken into 3 major sections:

  1. Baseline Performance Testing
    Gathered a baseline of performance data by running five VDBench jobs, each for a 30-minute iteration, and then recorded and captured the data.
     
  2. Replication Testing
    Tested replication while a specific VDBench job was running in the background and then recorded and captured the data. 
     
  3. Resiliency Testing
    Lastly, tested the resiliency of the platform by introducing hardware failures while replication was taking place and the same VDBench job was running in the background.

Last impressions and thoughts are listed in the conclusion at the end of this ATC Insight.

Hardware and software consisted of the following components:

  • Cisco UCS 5108 with B200 M4 blades - this was used as source and destination host configuration
  • Cisco 6248 switches - were used for 10Gb connectivity to hosts
  • Brocade C620 switches - were used for source FC connectivity
  • Cisco Nexus 5548 switches - used for replication traffic of storage arrays
  • Cisco MDS 9148 switches - were used for destination FC connectivity
  • PURE X70 - pair
  • PURE software version 5.2.6

Testing on this PURE X70 array was performed in December 2019 and into January 2020.

High level design of the physical environment is depicted below:

High level design of the physical environment

Baseline Performance Testing

The PURE X70 array was filled to 70% capacity at the request of our client.  The array had the same number of Volumes/LUNs carved out and presented for performance testing on each VM in each cluster.  This is a standard used for all VDBench testing.

The command VM was the only VM used for migration and had a total of 6 volumes mapped.  The first volume was filled with 200GB of data before being replicated to the target array.  Once replication was completed, we were able to kick off the jobs that created change data on the other volumes that were replicating.  This gave us a baseline of what replication looked like without performance jobs running during replication.  
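
As an illustration only, here is a minimal sketch, in Python, of how a volume might be seeded with roughly 200GB of non-zero data before replication is enabled. The device path is a placeholder rather than the lab's actual multipath device, and the lab itself may well have used a VDBench job for this step; pseudo-random data is used because zero-filled blocks would largely be reduced away by the array's inline data reduction.

```python
#!/usr/bin/env python3
"""Hedged sketch: seed the first replicated volume with ~200GB of
non-zero data before enabling replication.

Assumptions (not from the original write-up): the volume is visible to
the host as /dev/mapper/repl_vol1 (placeholder path) and the script is
run with sufficient privileges to write to it directly."""
import os

DEVICE = "/dev/mapper/repl_vol1"   # placeholder multipath device path
TARGET_BYTES = 200 * 1024**3       # ~200GB of seed data
CHUNK = 4 * 1024**2                # 4 MiB per write

written = 0
with open(DEVICE, "wb", buffering=0) as dev:
    while written < TARGET_BYTES:
        dev.write(os.urandom(CHUNK))   # pseudo-random so it is not reduced away
        written += CHUNK

print(f"Seeded {written / 1024**3:.1f} GiB onto {DEVICE}")
```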

Before the client came on-site, we ran the performance testing in advance to save time and came back with the following baseline performance results for the PURE X70 array (a sketch of how one of these runs can be defined in VDBench follows the list):

  • PURE X70 75K IOPS, 64K block, 100% Sequential, 100% Write
  • PURE X70 75K IOPS, 64K block, 100% Random, 50% Read/Write
  • PURE X70 150K IOPS, 64K block, 100% Random, 50% Read/Write
  • PURE X70 75K IOPS, 8K block, 100% Random, 50% Read/Write
  • PURE X70 150K IOPS, 8K block, 100% Random, 50% Read/Write
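
For reference, here is a minimal sketch, under stated assumptions, of how a run like the last one in the list (150K IOPS, 8K block, 100% random, 50% read/write) could be defined and launched with VDBench from Python. The device paths, output directory, and the assumption that vdbench is on the host's PATH are placeholders for illustration, not the actual lab configuration.

```python
#!/usr/bin/env python3
"""Hedged sketch: build a VDBench parameter file for one baseline run
and launch it. Device paths and output locations are placeholders."""
import subprocess
from pathlib import Path

# 150K IOPS, 8K block, 100% random, 50% read / 50% write, 30-minute run
LUNS = [f"/dev/sd{c}" for c in "bcdefg"]        # placeholder device paths
PARMFILE = Path("baseline_150k_8k_random.parm")

lines = [f"sd=sd{i},lun={lun},openflags=o_direct" for i, lun in enumerate(LUNS, 1)]
lines += [
    "wd=wd1,sd=sd*,xfersize=8k,rdpct=50,seekpct=100",      # seekpct=100 -> fully random
    "rd=rd1,wd=wd1,iorate=150000,elapsed=1800,interval=5", # 30 minutes, 5-second samples
]
PARMFILE.write_text("\n".join(lines) + "\n")

# Launch VDBench; the interval output feeds the Graphite/Grafana front end
subprocess.run(["vdbench", "-f", str(PARMFILE), "-o", "output/baseline_150k_8k"], check=True)
```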

Performance testing resulted in the following findings for each test run:

Performance testing findings

Replication Testing

After the performance testing had been completed, we moved into the replication testing portion, which held the most weight for our client.  Our customer used these replication results to form conclusions against the replication standards they have today in their production data centers.

A base workload was used for all replication testing: a VDBench job running 150K IOPS, 8K block, 100% random, 50% read/write.  The 200GB of data was replicated first.  Then, once replication was completed, the VDBench job was kicked off for a run time of 30 minutes.  The following results were observed for the PURE X70 array:

PURE X70 array results

Resiliency Testing

Once the replication testing was completed, the final round of testing with our client commenced.  This testing was specifically around resiliency.  

For the resiliency testing, there were a few key elements being tested:

  • First, a controller failure simulation
  • Second, a fiber channel switch going down
  • Finally, a drive being pulled from the array 

Additional Notes:

For the controller failure simulation, we decided to pull the controller and leave it out for a 10-minute period to better simulate a failure.  For the PURE X70, the controller failure test consisted of actually pulling the CT0 controller, as it was considered the primary controller.

For the drive being pulled from the array, the first iteration of the drive pull showed no impact, due to updated technology in how the system detects a missing drive.  If the pull and replace was too quick, there was no impact (as if the drive had never been pulled out).  This led to a change in the testing: the drive was pulled for a few minutes at a time.

While resiliency testing was under way, we had a baseline job running that consisted of 150K IOPS, 8K block, 100% random, 50% read/write.  This was paired with replication being enabled while the tests were performed.  The sequence for each test consisted of the following steps (a scripted version of this sequence is sketched after the list):

  1. Start VDBench job of 150K IOPS, 8K block, 100% random, 50% read/write
  2. Let run for 5 minutes
  3. Enable replication between the source and destination array
  4. Let run for 5 minutes
  5. Perform the decided resiliency test
  6. Wait for array to equalize once test was complete
  7. Stop replication
  8. Stop VDBench job
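
Below is a minimal sketch of how this sequence could be scripted in Python. It reuses the parameter file from the earlier baseline sketch; enable_replication() and disable_replication() are hypothetical hooks rather than real Pure Storage API calls, and the failure injection itself (pulling a controller, powering off a switch, pulling a drive) remains a manual step performed by the operator.

```python
#!/usr/bin/env python3
"""Hedged sketch of the resiliency test sequence described above.

enable_replication()/disable_replication() are placeholder hooks to be
wired to the array's management interface; they are not real Pure
Storage API calls. The hardware failure is performed manually, so the
script simply pauses and waits for the operator."""
import subprocess
import time

def start_vdbench():
    # 150K IOPS, 8K block, 100% random, 50% read/write (see earlier sketch)
    return subprocess.Popen(
        ["vdbench", "-f", "baseline_150k_8k_random.parm", "-o", "output/resiliency"]
    )

def enable_replication():
    print("TODO: enable replication between the source and destination arrays")

def disable_replication():
    print("TODO: stop replication")

def run_resiliency_test(name):
    job = start_vdbench()              # 1. start the background workload
    time.sleep(5 * 60)                 # 2. let it run for 5 minutes
    enable_replication()               # 3. enable replication
    time.sleep(5 * 60)                 # 4. let it run for 5 minutes
    input(f"Perform the '{name}' failure now, then press Enter once the "
          "array has equalized...")    # 5-6. manual failure, wait for recovery
    disable_replication()              # 7. stop replication
    job.terminate()                    # 8. stop the VDBench job
    job.wait()

if __name__ == "__main__":
    for test in ("controller pull", "FC switch power-off", "drive pull"):
        run_resiliency_test(test)
```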

The above was tested on the PURE X70 array for each of the three resiliency tests.  This testing was the most time consuming testing that we did with our client.  

Here are the testing results of resiliency testing on the PURE X70 array:

Controller Failure

At 1:18pm, controller CT0 was removed from the chassis, resulting in a controller failure. At 1:28pm, controller CT0 was reinserted into the chassis.

The array recovered as expected with no loss of service to the workload. IO was impacted for approximately 6 seconds after the controller failure. Controller CT1 became primary instantly with nominal impact. After controller CT0 was reinserted, no impact was noted; this is due to the active/passive architecture of the Pure array, and CT0 came back online as the passive controller.

Fail single fiber channel switch

The workload with replication started at 2:40pm. All power was removed from a single Brocade switch at 2:42pm. We observed latency being impacted the entire time the switch was down, due to saturation of the ESX host uplinks resulting in queuing. The Brocade switch recovered at 2:49pm, and we then observed latency returning to pre-test performance.

Single drive failure

The workload with replication started at 2:07pm. A single drive was failed by removing it from the disk array enclosure at 2:12pm, causing a disk rebuild within the array. The disk rebuild lasted approximately 1 minute 30 seconds. Latencies during this period averaged 4ms read and 4ms write.

Last Impressions and Thoughts

  • PURE does a very good job of making storage simple; it doesn't require years of experience as a storage administrator to understand and manage the solution.
  • Ease of install; racking and cabling the array doesn't require an engineer and is very simple and straightforward.
  • PURE support is excellent at being proactive and quick to respond if needed.
  • Small footprint with large capacity bringing 2078.9 TiB effective capacity in 3RU.
  • PURE1 makes for a nice one-stop management platform for multiple systems.

Test Tools

Depiction of Grafana being used as a front end to VDBench

VDBench

VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage. The software runs on several operating platforms.  It is an open-source tool from Oracle.  To learn more about VDBench you can visit the wiki HERE.

Graphite/Grafana

Graphite and Grafana provide the graphical user interface (GUI) that we use in the Advanced Technology Center (ATC) to visually depict the results data we derive in our compute and storage lab efforts.  To learn more about this product you can go HERE.
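
As an illustration of how results such as VDBench interval data end up in Grafana, here is a minimal sketch that pushes a few sample metrics to Graphite's plaintext listener (carbon, TCP port 2003). The host name, metric prefix, and sample values are placeholders; an actual harness would parse these numbers out of the VDBench output rather than hard-coding them.

```python
#!/usr/bin/env python3
"""Hedged sketch: send metrics to Graphite's plaintext protocol so they
can be graphed in Grafana. Host name, metric names, and values below
are placeholders, not lab results."""
import socket
import time

GRAPHITE_HOST = "graphite.lab.local"   # placeholder carbon host
GRAPHITE_PORT = 2003                   # carbon plaintext listener

def send_metric(path, value, ts=None):
    ts = ts or int(time.time())
    line = f"{path} {value} {ts}\n"    # "<metric.path> <value> <timestamp>\n"
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Example: one interval's worth of placeholder values from a VDBench run
send_metric("atc.pure_x70.baseline.iops", 150000)
send_metric("atc.pure_x70.baseline.read_latency_ms", 0.5)
send_metric("atc.pure_x70.baseline.write_latency_ms", 0.5)
```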

Specific Hardware of PURE X70 for Lab Testing:

  • FA-X70R2-FC-182TB-91/91-EMEZZ 
    Description: Pure Storage FlashArray X70R2-FC-182TB-91/91-EMEZZ
  • FA-X70R2-FC-245TB-91/91-63-EMEZZ 
    Description: Pure Storage FlashArray X70R2-FC-245TB-91/91-63-EMEZZ

Supporting Lab Environment for Testing:

  • Cisco UCS 5108 with B200 M4 blades - this was used as source and destination host configuration
  • Cisco 6248 switches - were used for 10Gb connectivity to hosts
  • Brocade C620 switches - were used for source FC connectivity
  • Cisco Nexus 5548 switches - used for replication traffic of storage arrays
  • Cisco MDS 9148 switches - were used for destination FC connectivity

Diagram Depiction of the Lab in the ATC: