The Everpure FlashBlade and why the need for a new design

The original FlashBlade, released in 2016, was the first of its kind: an all-flash solution for unstructured data, a market long served by spinning disk. With the exponential growth of unstructured data, Everpure (formerly Pure Storage) updated the design in 2022 with the modular FlashBlade//S, which allows compute blades to scale independently from storage by using DirectFlash Modules (DFMs) rather than the NAND chips soldered onto each blade in the first-generation design. Despite the hardware changes, the heart of the solution, the Purity//FB software, still attains phenomenal performance by using a key-value database as its metadata engine. In fact, the latest testing shows that a single FlashBlade//S chassis can support 3.5 trillion objects in about 100 MB of metadata space.

The FlashBlade//S solution scales to 10 chassis (100 blades) and is well-suited for many AI storage use cases, such as data ingest and model training. But as AI dataset sizes grow into the petabytes, and the number of GPUs used for training and inferencing grows into the tens of thousands, the FlashBlade//S architecture doesn't scale efficiently or economically enough to meet those needs. Thus the FlashBlade//EXA was born in 2025, expanding the FlashBlade//S architecture by separating data storage from metadata operations.

//EXA Architecture

In traditional high-performance computing (HPC) and AI environments, storage systems built on parallel filesystems have been dominant due to their performance, but they are also difficult to install and complex to manage. With the maturity of parallel NFS (pNFS), we are seeing more vendors offer pNFS solutions because they deliver similar performance without all the extra complexity.

FlashBlade//EXA utilizes pNFS in its new disaggregated storage architecture, pairing one or more FlashBlade//S500 chassis acting as Metadata Nodes (MNs) with commodity rack servers filled with SSDs acting as Data Nodes (DNs). This allows you to scale and size the solution based on your performance and capacity needs. How do data flow and client connections work in this new design? I'm glad you asked. When a client initiates a read or write operation, it establishes a parallel NFS (pNFS) connection to the MN. The MN acts as an "air traffic controller," redirecting the client to the appropriate DNs serving the File System for a direct-access connection via the blazing-fast NFSv3 over RDMA protocol. Meanwhile, the MNs and DNs are in constant communication behind the scenes, handling file system creation and updating the metadata key-value store to track where the data resides across the DNs. This architecture is purpose-built for high throughput and parallel access, ensuring that neither metadata operations nor data access becomes a bottleneck.

FlashBlade//EXA data flow

The result of this architecture change is a high-performance, scale-out storage solution built for modern data needs. The updated design provides significant parallelism, high throughput, and the flexibility to handle both AI and HPC workloads. As metadata requirements change, customers can simply scale the FlashBlade//S cluster from 1 to 10 chassis, each supporting up to 10 blades, while still utilizing a single virtual interface port (VIP) connection that spreads the load across the cluster to use all the blades efficiently. As capacity needs change, simply add more DNs (up to 1,000) with the SSD capacities and quantities required to meet your needs. The MNs, DNs, and clients are all connected via 400 Gb network switches for low-latency, high-throughput connectivity while limiting the number of cables used, simplifying the installation process.

Installation

Historically, Everpure's hardware platforms (FlashArray and FlashBlade) have always been just that: appliances. Simply rack the gear, connect the cables, copy the desired software version from a USB drive, and run through the setup wizard. Within a few hours, the array would be ready to provision storage and accept client connections.

In the ATC, we've installed numerous FlashArrays and FlashBlades for customer evaluations and can testify that the installation process is straightforward and quick. The FlashBlade//S (a.k.a. MN) installation was what we were used to: the recommended software version was installed on the External Fabric Modules (XFMs), we connected the FlashBlade chassis cables to the XFMs, the software was pushed to each of the blades, and we ran through the setup wizard to complete the base install and make the system accessible across the network.

It's worth noting that any time you open up an ecosystem to commodity servers, there are going to be new challenges and growing pains around installation, configuration, and management. The responsibility for out-of-band management and for securing the system against unauthorized access falls to the customer, as it's no longer a hardened appliance. This was a new experience for us with Everpure: we went in with the appliance mentality, forgetting that this design incorporates software-defined storage (SDS) characteristics for installation and ongoing maintenance. Note - while storage appliances typically bundle all firmware, drivers, and software updates into the upgrade process, those ongoing maintenance steps are separate tasks in the SDS approach and need to be managed by the team(s) responsible for the hardware. As for management, every OEM's out-of-band management interface is different, some better than others, and it takes trial and error to get it right, both in the cables/adapters used and in the settings required to make a successful connection for remote device management. With all that said, the rack server (a.k.a. DN) installation was not simple and quick…but that's the beauty of the AIPG: letting WWT iron out the kinks and prove out the steps required to make things work together, all while reducing time and risk for the entire process.

The deployment in our lab sandbox consisted of a Linux management VM running the FlashBlade//EXA Services Container. This Services Container provides TFTP and DHCP services, a repository for installation files and scripts, and a Prometheus and Grafana instance for ongoing monitoring of Data Node performance. It is also where maintenance tasks on the DNs, such as disk replacements, are initiated. While this was only a small 8-DN configuration, we wanted to treat it as if it were a 100-, 500-, or even 1,000-node install to get an idea of what a customer would experience during the installation process. While we could have simply copied the installation files and software to a USB drive and plugged it into each server, we used the provided automation scripts and steps, having the DNs boot over the network to load the software and configuration files from the management VM. This meant we needed to configure out-of-band networking on the DNs and change the BIOS to allow network booting. Next, we captured the MAC addresses of the servers' onboard NICs to set up the DHCP reservations and node names that would be used in the FB//EXA deployment. Finally, we configured the DHCP options to direct the DNs to the TFTP server running on the Linux VM. After a few attempts and a couple of tweaks to our management network setup, we were able to start the DN installation.
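As a concrete illustration, the DHCP reservation step might look like the fragment below. The MAC address, IP addresses, hostname, and boot filename here are all placeholders rather than values from our deployment, and the syntax shown is standard ISC DHCP rather than anything FB//EXA-specific.

```shell
# Hypothetical sketch: capture a DN's onboard NIC MAC address (on the DN this
# would be something like `cat /sys/class/net/eno1/address`) and emit a
# dhcpd.conf host reservation pointing it at the TFTP server on the
# management VM.
DN_MAC="aa:bb:cc:dd:ee:01"   # placeholder MAC captured from the server

cat > dn1-reservation.conf <<EOF
host exa-dn1 {
  hardware ethernet ${DN_MAC};
  fixed-address 10.0.0.101;   # reserved install-network IP for this DN
  next-server 10.0.0.10;      # management VM running the TFTP service
  filename "pxelinux.0";      # network boot loader fetched over TFTP
}
EOF
```

With a reservation like this in place for each DN, the servers PXE-boot, pull their image and configuration from the Services Container, and come up with predictable names and addresses.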

Reduced cabling by using 400 Gb networking

The upside of troubleshooting new installations is that you really get to learn the product and how things work under the covers, and you get to collaborate with the OEMs so they can update their install docs and environment prerequisites to help customers avoid the same challenges in the future. In our experience, no two environments are the same; they are all configured a little differently and use different switch models and OEMs. With the base setup and deployment complete, it was time to configure the solution.

At the time of our testing, the Viking VSS2320 was the only supported server model, as it provides hardware-based redundancy for high availability (HA) by allowing each server controller in the 2RU chassis to connect to all installed SSDs. In the event of a controller failure, the remaining controller can take over access to the drives and the data they contain. In a future software release, resiliency will be handled via software-based erasure coding, which will remove the hardware requirement for HA and allow additional server OEMs and models to be supported.

Configuration

FB//EXA

With the Purity//DN image installed on the DNs, a few tasks remained before we could join them to the MN. For each DN, we ran a command to format the DN's internal storage (local NVMe drives), then another to run a health check. Once all the DNs were healthy, the last couple of steps were done via an SSH session to the MN: creating the first Node Group and adding the DNs to it. Note - in a large-scale FB//EXA deployment, there may be a need for multiple Node Groups (e.g., different departments or multi-tenancy), and a DN can belong to multiple Node Groups.

We started with only 6 DNs in the group and later added 2 more, as shown in the image below. In the release we tested, there is no rebalancing of data across DNs, as reflected by DNs 9/10 having less consumed data on them. And in case you are wondering, DNs 1/2 needed a firmware update at the time of Node Group creation and will be used for future customer POCs.

Familiar FlashBlade look and feel

At this point, the system was ready to have a File System created. This step consisted of associating the File System with a single Node Group, specifying its size, and providing a name, all done through a single command. The only things left to configure were the protocols enabled for the File System and the rules and policies governing who can access the network share.

Clients

On the client side, we used two high-performance servers with GPUs and 2 x 400 Gb network cards running an Ubuntu OS. Only a few requirements related to BGP and RoCEv2 networking need to be configured, so we installed the standard FRRouting package on the clients, enabled bgpd, and configured the service. Note - FlashBlade//EXA utilizes a common layer 3 Border Gateway Protocol (BGP) network designed for performance and efficiency, along with Remote Direct Memory Access (RDMA), which is optimized for high speed and low latency. The dual 400 Gb ConnectX network ports were then configured with the correct Priority Flow Control and DSCP mapping settings to support RoCEv2.
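For reference, the client-side steps looked roughly like the sketch below. The interface name, ASNs, peer address, and PFC priority are placeholders (our deployment's values differ), so treat this as the shape of the configuration rather than a copy-paste recipe.

```shell
# Install FRRouting and enable the BGP daemon (Ubuntu).
sudo apt install -y frr
sudo sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons
sudo systemctl restart frr

# Minimal BGP peering config via vtysh (placeholder ASNs and neighbor).
sudo vtysh <<'EOF'
configure terminal
router bgp 65001
 neighbor 10.1.1.1 remote-as 65000
end
write memory
EOF

# Enable Priority Flow Control on one traffic class for RoCEv2 on the
# ConnectX port (mlnx_qos ships with the NVIDIA/Mellanox tools; priority 3
# is a common convention, not necessarily what this deployment used).
sudo mlnx_qos -i enp1s0f0 --pfc 0,0,0,1,0,0,0,0
```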

Finally, to complete the configuration phase of the install, we installed the Everpure-provided "nfs-client-pure-dkms" Linux package, which optimizes the Linux kernel NFS client.

sudo apt install ./nfs-client-pure-dkms_1.0_amd64.deb

Testing

With the File System created on the FB//EXA and the clients configured, we were ready to start testing. All that was left was to mount the File System on the clients using the mount command below, which specifies the single MN VIP and the File System. A single VIP is all that's needed because the FlashBlade//S internally load balances connections across all the available blades.

  • sudo mount -t nfs -o vers=4.1,proto=tcp,nconnect=16 <data_vip>:<filesystem> /mnt/nfs

Note - the mount command specifies a file system type of nfs, with options for NFS version 4.1 and nconnect=16, which establishes 16 TCP connections to the VIP.
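To sanity-check that a mount like this negotiated NFSv4.1 and is actually driving pNFS layouts, a couple of read-only commands can help. These require a live NFS mount, so they're shown here for reference rather than as part of the setup:

```shell
# Show mounted NFS file systems and their negotiated options
# (look for vers=4.1 and the nconnect count).
nfsstat -m

# Per-operation counters for the mount; non-zero LAYOUTGET counts indicate
# the client is obtaining pNFS layouts and doing direct I/O to the DNs.
grep -A1 "LAYOUTGET" /proc/self/mountstats
```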

Here's where things got fun. During baseline synthetic testing, FlashBlade//EXA achieved near line-rate performance on a single client with dual 400 Gb ConnectX adapters. 

  • In a 100% read workload, aggregate throughput across the two 400 Gb NICs reached 781 Gb/s (about 97.6 GB/s), effectively saturating the available 800 Gb/s of network bandwidth on a single client.
  • In a 100% write workload using a 512 KB block size, a single client with two 400 Gb NICs averaged sequential write throughput of 83 GB/s (77.3 GiB/s).
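The unit conversions in the bullets above are easy to double-check:

```shell
# 781 Gb/s aggregate across both NICs, divided by 8 bits per byte:
awk 'BEGIN { printf "%.3f GB/s\n", 781/8 }'            # 97.625 GB/s

# 83 GB/s (decimal) expressed in binary units:
awk 'BEGIN { printf "%.1f GiB/s\n", 83e9/(1024^3) }'   # 77.3 GiB/s
```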

As we added a second client to the mix with the same hardware specs, latency remained consistently low, and throughput scaled linearly across our tests.

  • 100% Write across 2 x clients each with 2 x 400 Gb/s NICs

 

In the end, we found that client-side networking was the bottleneck in our lab setup.  The FB//EXA did a great job of balancing metadata operations across the blades and spreading read/write operations across the DNs that serviced the file system presented to clients.  Our best guess is that it would take 8-10 clients, each with 2 x 400 Gb NICs, to saturate the network connections to the 8 DNs in our setup.

Power requirements are another important factor to consider. Idle, the solution consumed approximately 5-6 kW. With both clients driving the workload, the FB//EXA solution consumed approximately 8.5 kW during sustained write tests and about 7.2 kW during sustained reads.

 

Summary

In closing, FlashBlade//EXA is fast and made a strong impression on our AI Proving Ground team. From the disaggregated design to the simple client setup, it's a solid choice for anyone needing serious storage horsepower, especially if you want to spend more time running workloads and less time tinkering. And because FlashBlade//EXA runs the same Purity//FB operating system, the learning curve will be short for those already familiar with FlashBlade's UI. We're excited to collaborate with our customers as they explore use cases that require FB//EXA-level performance, and with future enhancements as the product evolves. Our initial impression is that this platform truly delivers on its promises for today's data-driven environments.

Are you ready to evaluate FB//EXA for your demanding AI and HPC workloads?  Let our AIPG teams help de-risk and accelerate decision-making for your next-generation, high-performance storage needs.

AI Proving Ground in the ATC

WWT's Advanced Technology Center (ATC) is a state-of-the-art facility that allows customers, partners, and employees to explore, test, and validate technology solutions in a collaborative environment. The AI Proving Ground (AIPG) is an initiative to develop, test, and implement artificial intelligence solutions within the ATC. The AIPG enables AI technologies to be explored, validated, and demonstrated in real-world scenarios, allowing organizations to assess the capabilities and potential of AI solutions before deploying them at scale.
