Exploring the Pure Storage Pure as-a-Service (PaaS) Offering

Update 6/9/22: Evergreen//One is the new name for Pure as-a-Service (PaaS). The below content is still accurate, but look for a new article coming soon to wwt.com comparing Evergreen//One, Evergreen//Forever (formerly known as Evergreen Gold) and the new offering Evergreen//Flex.

We are hearing from our customers that the direction they've been given (from the top down) is to move to a cloud-like consumption model, even though they may not be fully ready to move everything to the public cloud. Pure Storage has created an OPEX, pay-as-you-go offering to help bridge this gap, one that offers flexibility, simplicity and reduced risk and can provide an on-ramp to the public cloud. In this article, we will cover some of the details of the Pure as-a-Service (PaaS) offering. Note: Not to be confused with Platform as a Service.

Let's take a moment to showcase how Pure Storage can help its customers shift to a true cloud-like consumption model that can be hosted on-premises, in a co-location facility or in the public cloud.

PaaS details

Pure as-a-Service (PaaS) is Pure's solution for consumption as-a-service. PaaS is designed to help accelerate customer innovation by tightly aligning IT spend (in storage) with the needs of the business, especially for new emerging applications. Pure as-a-Service shortens time-to-value, eliminates capital commitments, and delivers the peace of mind of a private cloud with the elasticity and flexibility of a public cloud. With Pure as-a-Service, customers get enterprise-grade features and availability at as much as half the cost of the public cloud. PaaS is also built on the Evergreen Storage foundation, meaning there are no priced-in storage re-buys, lease financing, or data migrations. Unlike other on-premises "storage utilities" which wrap long term leases around identified storage assets, Pure as-a-Service is purpose-built for compliance with the new IASB 2019 rules, delivering a service based solely on customer SLAs (i.e., no identified assets or buy-outs) and available for terms as short as 12 months.

PaaS Data Services Catalog - see the different *Services* types along with their *Minimum Reserve Commitment*

How is PaaS billed?

Pure as-a-Service is always metered and billed against effective capacity utilization or "Effective Used" (actual host-written/pre-data reduction as opposed to physical/usable in CAPEX sizing) except for Backup Target use cases which may be measured post-data reduction.

Overall there are three basic rates for storage services:

Reserve (billed monthly, quarterly or annually) - Prepaid commit price and minimum
On-Demand (billed quarterly in arrears, as used) - 1.5x over Reserve price
Compression, Deduplication, Encryption "CDE" (billed only for environments where data reduction is < 2:1 for FlashArray workloads) - 2.5x over the Reserve price

If a customer finds that usage keeps peaking into On-Demand pricing above the Reserve commit, they can raise the Reserve capacity at the discounted rate at any time. The committed Reserve can only be lowered at the end of the term.
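To make the relationship between the Reserve and On-Demand rates concrete, here is a minimal Python sketch. It is an illustration only, not Pure's billing logic: the function name, the price of 10.0 per TiB and the reading of "1.5x over Reserve price" as a 1.5 multiplier on the Reserve rate are all assumptions, and the sketch ignores billing cadence and the CDE rate.

```python
# Assumption: "1.5x over Reserve price" is read as a 1.5 multiplier on the Reserve rate.
ON_DEMAND_MULTIPLIER = 1.5


def period_charge(reserve_tib, effective_used_tib, reserve_rate):
    """Return (reserve_charge, on_demand_charge) for one billing period.

    reserve_rate is a hypothetical price per TiB per period.
    """
    # The Reserve commit is a minimum: it is charged even when usage is lower.
    reserve_charge = reserve_tib * reserve_rate
    # Only Effective Used capacity above the commit is charged at the On-Demand rate.
    overage_tib = max(0.0, effective_used_tib - reserve_tib)
    on_demand_charge = overage_tib * reserve_rate * ON_DEMAND_MULTIPLIER
    return reserve_charge, on_demand_charge


# 100 TiB Reserve, 120 TiB Effective Used, hypothetical rate of 10.0 per TiB.
print(period_charge(100, 120, 10.0))  # (1000.0, 300.0)
# Usage below the commit still pays for the full Reserve.
print(period_charge(100, 80, 10.0))   # (1000.0, 0.0)
```

In this model, a customer who regularly lands in the first case would pay less for the same 120 TiB by raising the Reserve, since the extra 20 TiB would then be charged at the Reserve rate rather than the On-Demand rate.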

Pure as-a-Service includes Pure's Evergreen architecture, where Pure will continue to add all features, software, and controllers non-disruptively without charging the customers for those upgrades. Pure as-a-Service is truly an OPEX consumption model and not a form of a lease.

Additional billing information

Billed in 1 MiB increments for Host Data written
Clones are free - Note: most cloud providers treat clones as full copies
Snapshots count only the space taken up by their differentials
Thin Provisioning - Pure meters effective (pre-data reduction) capacity against blocks written, not against the provisioned capacity of the volume/filesystem on the array
Zeroes are Omitted - zeroes/whitespace/unmap patterns that are larger than 2MiB do not count against effective capacity
Telemetry data is logged and gathered using Pure's cloud-based analytics tool - Pure1
A 25% headroom is included in all Pure as-a-Service sizings to allow for On-Demand bursting.
Site Definition - There is typically one effective capacity (one bill) across all similar array types per site.

Are you moving to the cloud?

Pure as-a-Service is a Unified Subscription model which allows you to extend your Reserve Capacity commit into the cloud by utilizing a Pure Cloud Block Store (CBS) instance in AWS or Azure. With your single site license, you can start your journey to the cloud for production and/or disaster recovery workloads without the need for maintaining additional physical data centers.

Example:

A customer has a 100TiB Reserve commit and starts with 100TiB of Effective data on-premises. They decide to spin up a CBS instance and move 20TiB of data into that cloud instance, so now 80TiB will be on-premises and 20TiB will be in the CBS instance. The end result is that they are still within their 100TiB Reserve. Be advised that during the migration, data could potentially exist in both places, so some On-Demand capacity will be used until the data is de-allocated from the on-premises side.

Furthermore, if the customer goes all-in on the public cloud, the move still happens under that same license, so they can move the full 100TiB into the public cloud and will still be aligned with their initial 100TiB Reserve commit. In that case the on-premises hardware would be decommissioned and returned to Pure Storage.
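The arithmetic in this example can be sketched in a few lines of Python. This is an illustration under simple assumptions: usage is summed across locations and compared with a single Reserve commit, and the function and dictionary key names are hypothetical.

```python
def on_demand_tib(reserve_tib, usage_by_location):
    """TiB of Effective Used above the Reserve commit, summed across all locations."""
    total_tib = sum(usage_by_location.values())
    return max(0, total_tib - reserve_tib)


RESERVE_TIB = 100

after_migration = {"on_prem": 80, "cbs": 20}
during_migration = {"on_prem": 100, "cbs": 20}  # the 20 TiB exists in both places
all_in_cloud = {"on_prem": 0, "cbs": 100}

print(on_demand_tib(RESERVE_TIB, after_migration))   # 0
print(on_demand_tib(RESERVE_TIB, during_migration))  # 20
print(on_demand_tib(RESERVE_TIB, all_in_cloud))      # 0
```

The middle case is the temporary state described above: until the on-premises copy is de-allocated, the duplicated 20 TiB shows up as On-Demand usage.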

For more information around CBS, check out these on-demand labs.

Is PaaS right for me?

PaaS won't be right for everyone. Depending on your business model and storage needs, a CAPEX purchase might be better than an OPEX subscription. Either way, Pure has you covered with either an Evergreen Purchase model or a Pure as-a-Service Subscription.

Technical details

The PaaS offering is based on a combination of Live data in Volumes, Snapshots, and Replication capacity - in aggregate this is reported as Effective used capacity. This section will define what exactly Effective used capacity is and how it's measured.

The first thing to note about Effective used is that it is measured by the FlashArray. This may result in a discrepancy between what the FlashArray reports and what the Host reports it has stored on the FlashArray. One reason is that when certain Host applications delete data, the data remains on the FlashArray and still counts toward Effective used in Pure as-a-Service. To keep the Host and FlashArray measurements closer together, commands such as TRIM or UNMAP should be run regularly.

Volume data is measured as the amount of Effective data stored on the array. The easiest way to think about this is it's the size of the data before deduplication and compression. So if a volume is taking up 100 MiB on the array, and you get a 4:1 overall reduction, then the Effective used of that volume would be 400 MiB. For Pure as-a-Service, the FlashArray measures Host Written data at a granularity of 1 MiB (1,048,576 B). This means that any individual 1MiB chunk that is partially or completely written will cause the effective used capacity to be incremented by 1MiB.
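The effect of the 1 MiB granularity can be shown with a short Python sketch. It is a simplified model, not the FlashArray's metering code: it assumes chunks are aligned to 1 MiB boundaries in the volume's address space (the article does not say this), ignores zero-detection and unmaps, and uses a hypothetical function name.

```python
MIB = 1024 * 1024  # 1,048,576 bytes


def effective_used_bytes(writes):
    """Model Effective Used for one volume from host writes.

    writes: iterable of (offset_bytes, length_bytes) tuples.
    Assumption: metering chunks are aligned to 1 MiB boundaries.
    """
    touched_chunks = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // MIB
        last = (offset + length - 1) // MIB
        # Any chunk that is partially or completely written counts in full.
        touched_chunks.update(range(first, last + 1))
    return len(touched_chunks) * MIB


# A 4 KiB write still increments Effective Used by a full 1 MiB.
print(effective_used_bytes([(0, 4096)]) // MIB)          # 1
# A 1 MiB write starting at 512 KiB straddles two chunks.
print(effective_used_bytes([(512 * 1024, MIB)]) // MIB)  # 2
```

Under this model, many small writes scattered across a volume are metered as more capacity than the same amount of data written contiguously.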

Next is the question of snapshots. In PaaS, snapshots are measured as the difference between the data in the volume at the time of the snapshot versus the data in the volume as it exists today. As an example, if you have a single 10 MiB volume on a FlashArray, and you take two snapshots, at that time your bill will be for 10 MiB, even though 30 MiB of logical data is represented on the array (10 MiB in the volume and 20 MiB in the snapshots).

In the below figure, we see this example with the volume's logical space represented by the top blue bar and the snapshots represented by the purple bars. What we're looking for is unique data at any particular 1 MiB column of data. So in column 0, the A in the Volume is the first unique "Chunk" of data, then since the two snapshots below it also represent the same data, they don't get counted as Unique data, and thus they are not charged for.

If on that same volume, you then change 1 MiB of data, the volume itself is still 10 MiB, but there is now a difference in the amount of space the first snapshot is referencing; the second snapshot, however, is no different from the first snapshot. In the below figure, if you start from the top, A' is the first unique 1 MiB chunk of data. Then you have the first snapshot, Snap1, that represents A - since A is different than A', this is counted as 1 MiB of unique data. Then Snap2 also represents A, so it's not unique. The bill, in this case, will be for 11 MiB, 10 MiB from the volume, and 1 MiB from the snapshots.

If you then added 1 MiB of data to a new logical address, the bill would be for 12 MiB, as shown in the below example.
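The three snapshot cases above (10, 11 and 12 MiB) can be reproduced with a small Python model. This is a sketch of the accounting as this article describes it, not Pure's implementation: each volume or snapshot is a dictionary mapping a 1 MiB logical address to a label standing in for that chunk's contents, and all names are hypothetical.

```python
def effective_used_mib(volume, snapshots):
    """Count unique 1 MiB chunks per logical address across a volume and its snapshots."""
    images = [volume, *snapshots]
    addresses = set().union(*images)  # every logical address present in any image
    total = 0
    for addr in addresses:
        # Identical contents at the same address are counted once.
        total += len({image[addr] for image in images if addr in image})
    return total


# A 10 MiB volume with two snapshots taken before any change.
volume = {addr: f"data{addr}" for addr in range(10)}
snap1 = dict(volume)
snap2 = dict(volume)
print(effective_used_mib(volume, [snap1, snap2]))  # 10

# Overwrite 1 MiB at address 0: the snapshots now hold one unique chunk.
volume[0] = "data0-changed"
print(effective_used_mib(volume, [snap1, snap2]))  # 11

# Write 1 MiB to a new logical address.
volume[10] = "data10"
print(effective_used_mib(volume, [snap1, snap2]))  # 12
```

Snap2 never adds to the total here because at every address it holds the same contents as Snap1.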

One thing to note is that any data shared between volumes or snapshots of different volumes will still be counted towards Effective Used capacity. Let's say that we have a new volume, VolZ, with 100MiB of data, and take a snapshot. To keep things simple, we'll work with the whole volume at a time, rather than with blocks as above. So in this case, the Effective Used capacity is 100MiB, since the snapshot has all the same data as VolZ.

Next, we want to make a clone. Regardless of whether we make a copy, or use XCopy or any other method, the Effective Used measurement remains the same. Let's call it VolZclone, and we want to take a snapshot here as well. In this case, the Effective used capacity is now 200MiB, since we've increased the amount of space the Volumes take, but since the snapshots both store the same data as the Volumes, those don't impact the Effective Used measurement.

Now, what happens if we start changing the data? Let's overwrite both VolZ and VolZclone with entirely new data, X and Y, respectively.

In this case, the Effective used capacity has increased to 400MiB. While the snapshots share the same space with each other, each individual snapshot is entirely different from the data in its respective volume. Therefore both snapshots will increase the amount of Effective used capacity on the array.
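The VolZ walk-through (100, then 200, then 400 MiB) can be checked with a short Python sketch. It treats each volume as a single unit, as the example does, and meters each volume separately from every other volume. The function names and tuple layout are hypothetical, and this is a model of the description above rather than the array's actual logic.

```python
def volume_effective_mib(size_mib, volume_data, snapshot_data):
    """Effective Used for one volume and its snapshots, treating the volume as one unit.

    volume_data and the entries of snapshot_data are labels standing in for contents.
    """
    # A snapshot adds capacity only when its contents differ from the live volume;
    # snapshots of the same volume with identical contents are counted once.
    distinct_snapshots = set(snapshot_data) - {volume_data}
    return size_mib * (1 + len(distinct_snapshots))


def array_effective_mib(volumes):
    # Each volume is metered on its own, so data shared with other volumes still counts.
    return sum(volume_effective_mib(*volume) for volume in volumes)


# VolZ with one snapshot holding the same data.
print(array_effective_mib([(100, "Z", ["Z"])]))                     # 100
# VolZ plus VolZclone, each with a snapshot of the same data.
print(array_effective_mib([(100, "Z", ["Z"]), (100, "Z", ["Z"])]))  # 200
# Both volumes overwritten with new data X and Y; the snapshots still hold Z.
print(array_effective_mib([(100, "X", ["Z"]), (100, "Y", ["Z"])]))  # 400
```

In the last case both snapshots hold the same data Z, but because they belong to different volumes each one counts separately.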

In the case of a Replication target, everything works largely the same, but there is no volume object on the array. In this case, the most recent snapshot will consist entirely of unique data and therefore count towards Effective used capacity, and any subsequent snapshots will be calculated as a change from the previous snapshot as before.

Finally, capacity that is provisioned but never written (Thin Provisioning) and zeroes removed by zero-detection do not count towards your Effective used capacity. Note that for zero-detection, unmaps and zero write patterns must be larger than 2MiB; otherwise they are counted as Dedupe and will count towards Effective used capacity.

Why can't I just multiply Used Capacity x Data Reduction to get this number?

One reason Used * DR doesn't work is how Snapshots are handled on the array. Snapshots do not impact the Data Reduction number since there's no good way to represent them: they would either be over-represented and inflate the data reduction number, or be under-represented and make the data reduction number lower than it should be.

So Used * DR can sometimes get you close to the Effective Used number, but only if the Data Reduction rate of the snapshots on your array is in line with the rest of the data. The result can vary greatly if the snapshots have more or less shared data with the rest of the array.

PaaS FAQ

Check out Pure's PaaS FAQ for up-to-date information on technical components, sizing considerations and answers to commonly asked questions.