Object storage isn't anything new. In fact, many people are surprised to learn that development of commercial object storage solutions started in the late 1990s. The technology has already lived through many lifecycle iterations, but we have Amazon Web Services (AWS) to thank for its modern popularization through its S3 implementation.
Over the last decade, object storage adoption has soared, at first mostly among public cloud service providers. In just the last 18 months, though, we have seen a stupendous increase in on-premises object storage proofs of concept in the ATC, as well as in customer purchases of on-premises object storage platforms.
Undeniably, the ever-broader range of use cases for object storage is driving its adoption. In the past, object storage in the data center was largely relegated to data protection, and with good reason: object storage platforms store data far more cost-effectively than traditional file and block storage, and their performance for high-throughput sequential workloads is unparalleled.
More recently, we've seen application vendors adopt object storage as a first tenet of their products to realize the benefits mentioned above. Products such as Splunk, Hadoop and AI frameworks (to name a few) have added native object storage support in their latest releases.
In turn, this has motivated object storage vendors to design even higher-performing solutions that combine flash with high-throughput SATA media, or even all-flash object storage appliances.
StorageGRID stands out in five areas
So what makes NetApp's StorageGRID different from other object storage solutions out there? Anyone familiar with NetApp's object storage platform would point to five specific areas where StorageGRID leads: information lifecycle management, layered erasure coding, cloud integration and federation, software-defined object storage, and performance. Let's dive into each of these topics.
Information lifecycle management
Ask anybody familiar with StorageGRID what sets it apart from other platforms, and the first thing they will mention is its policy-based information lifecycle management (ILM) engine. Simply put, this engine lets you control how data lives on the grid throughout its life.
For example, a grid administrator could define a policy in which data written to the grid is automatically copied to each of five different sites on first write and kept there for the first six months of its life (to ensure local access performance). After those six months, two copies are kept at a single site (the site where data mining is performed) for the following 18 months. Once the data reaches two years of age, a single copy is erasure coded across the five sites (for data durability and cost-efficient storage). With such a policy in place, nobody ever needs to touch the data: it trickles through the grid and self-optimizes according to the defined policy.
This is just an example based on the age of the data; much more complex rules can be created by combining multiple criteria, which can even include custom metadata fields attached to the objects stored on the grid. Policy-based lifecycle management isn't exclusive to StorageGRID, but NetApp's solution is unmatched in the granularity of its policy definitions, its ability to simulate the effect of a policy change, and its ability to apply a policy to an existing bucket that already contains data without migrating that data to another bucket.
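To make the example policy concrete, here is a minimal Python sketch that evaluates it for an object of a given age. It is not StorageGRID's ILM rule syntax or API; the site names, the choice of `site-1` as the data-mining site, and the `Placement` and `placement_for_age` names are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical site names; the example policy only says "five different sites".
SITES = ("site-1", "site-2", "site-3", "site-4", "site-5")
# Assumption: site-1 is the site where data mining is performed.
MINING_SITE = "site-1"


@dataclass(frozen=True)
class Placement:
    method: str             # "replicated" or "erasure-coded"
    sites: tuple[str, ...]  # sites holding full copies or fragments
    copies: int             # logical copies of the object


def placement_for_age(age_months: float) -> Placement:
    """Return where the example policy keeps an object of the given age."""
    if age_months < 0:
        raise ValueError("age must be non-negative")
    if age_months < 6:
        # First six months: one full copy at every site for local access.
        return Placement("replicated", SITES, len(SITES))
    if age_months < 24:
        # Next 18 months: two full copies at the data-mining site.
        return Placement("replicated", (MINING_SITE,), 2)
    # Two years and older: one copy erasure coded across all five sites.
    return Placement("erasure-coded", SITES, 1)
```

For example, `placement_for_age(12)` returns two replicated copies at `site-1`. The sketch assumes an object exactly six or 24 months old moves to the later stage; a real policy would have to define those boundaries explicitly.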
Layered erasure coding
The StorageGRID platform offers the highest possible level of data durability and availability by combining multiple mechanisms that protect data integrity with the ability to erasure code data across geographically dispersed sites. First, NetApp designed the platform so that local erasure coding, in the form of Dynamic Disk Pools (DDP), protects the data before the StorageGRID software is even involved in the data protection scheme.
This greatly reduces the strain on the grid when a drive fails and makes rebuilds much faster than with other protection schemes, because the failed drive is rebuilt locally. By contrast, rebuilding a device whose data is geographically erasure coded requires pulling fragments from remote sites, which consumes large amounts of bandwidth and can take an enormous amount of time.
The StorageGRID software provides the ability to configure multiple erasure coding schemes in a single grid to optimize cost, reliability and performance all at once. Once the data is stored, StorageGRID has both in-flight and at-rest data verification processes to ensure the integrity of the data.
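The cost and reliability trade-off between erasure-coding schemes comes down to simple arithmetic. The sketch below computes, for a generic k+m scheme (k data fragments plus m parity fragments), the raw capacity stored per logical byte and whether the data survives the loss of an entire site. It assumes fragments are spread evenly across sites; the function names are hypothetical, and the scheme values in the example are illustrative rather than a list of the schemes StorageGRID supports.

```python
from fractions import Fraction


def ec_profile(k: int, m: int, sites: int = 1) -> dict:
    """Summarize a k+m erasure-coding scheme (k data + m parity fragments).

    Assumes the k + m fragments are spread evenly across `sites` sites.
    """
    if k < 1 or m < 0 or sites < 1:
        raise ValueError("need k >= 1, m >= 0 and sites >= 1")
    if (k + m) % sites:
        raise ValueError("fragments must divide evenly across sites")
    per_site = (k + m) // sites
    return {
        # Raw bytes stored for every logical byte written.
        "overhead": Fraction(k + m, k),
        # Any m fragments can be lost and the object can still be rebuilt.
        "fragment_losses_tolerated": m,
        # Losing a site removes per_site fragments; that is survivable only
        # if another site remains and per_site does not exceed m.
        "survives_site_loss": sites > 1 and per_site <= m,
    }


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per logical byte when keeping full copies."""
    return Fraction(copies)
```

For instance, `ec_profile(6, 3, sites=3)` stores 1.5 raw bytes per logical byte and still tolerates losing a whole site, whereas `replication_overhead(2)` stores 2 bytes per logical byte. Mixing such schemes in one grid is what lets cost, reliability and performance be tuned per data set.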
Cloud integration and federation
Another differentiator between StorageGRID and many other object storage solutions available today is its ability to federate with public cloud service providers' object services. In other words, data can actively flow between an on-premises StorageGRID deployment and Amazon S3, S3 Glacier or Azure Blob Storage.
Those public cloud services can be defined as a storage tier within the grid and used as targets for information lifecycle management policies. As part of its integration with public cloud service providers, StorageGRID also supports Amazon Simple Notification Service (SNS) and streaming metadata directly into Elasticsearch for indexing.
A capability like SNS can trigger an AWS Lambda function whenever new data is written to a bucket. That function could in turn launch an ETL job to process a new batch of logs, analyze a picture with Amazon Rekognition, or even transcribe a recording stored in the bucket and attach the transcript to it as metadata.
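As a sketch of the receiving end, the Python function below could serve as the handler of an AWS Lambda function subscribed to the SNS topic. It assumes the SNS message body is an S3-style event notification (a JSON document with a `Records` list). `start_processing` is a hypothetical placeholder for whatever ETL, image-analysis or transcription job you would launch.

```python
import json
from urllib.parse import unquote_plus


def start_processing(bucket: str, key: str) -> None:
    """Hypothetical hook: launch an ETL, image-analysis or transcription job."""
    print(f"would process s3://{bucket}/{key}")


def handler(event, context):
    """Lambda entry point for SNS-delivered object-created notifications."""
    processed = []
    for sns_record in event.get("Records", []):
        # SNS delivers the original notification as a JSON string.
        message = json.loads(sns_record["Sns"]["Message"])
        for record in message.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated:"):
                continue  # ignore deletions and other event types
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (spaces become "+").
            key = unquote_plus(record["s3"]["object"]["key"])
            start_processing(bucket, key)
            processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}
```

Filtering on the `ObjectCreated:` prefix keeps the function from reprocessing anything when objects are deleted, and decoding the key up front avoids surprises with file names that contain spaces.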
Software-defined storage option
In a world where storage appliances are still king, because tight coupling of hardware and software improves performance, reduces unpredictability and increases availability, it's great to see that NetApp supports both appliance and software-defined deployments in the same grid.
By no means is this feature exclusive to NetApp and StorageGRID. Some competing solutions provide the same level of flexibility, but given the limited number of vendors offering this capability, we felt it deserved a shout-out.
Because object storage solutions focus on delivering very high storage density, most appliances have a very large minimum deployment size. NetApp lets customers deploy StorageGRID as an .OVA file on existing hardware, which is great.
Performance
Performance: everybody's favorite topic. Testing the performance of object storage isn't easy. In the ATC, we run object storage performance tests using COSBench.
Recently, we had the opportunity to run a very large object storage POC with four different vendors, including NetApp StorageGRID, using a full-cabinet solution from each vendor (our customer's benchmark for evaluating performance, cost and density together). To be as thorough as possible, we evaluated multiple object sizes (100KB, 10MB and 1GB), each with both 100 percent PUT and 100 percent GET operations (write vs. read IO in the block and file world), and an increasing number of workers (from 12 to 960).
We used eight of NetApp's SG6060 appliances for those tests and were very pleasantly surprised by the results. Each appliance uses two 800GB SSDs to cache metadata, which substantially improved time to first byte (TTFB) on GET operations and kept response times very consistent, with limited jitter. We used 4x25Gbps connectivity for each of the eight appliances in the test and achieved upwards of 56GB (yes, gigabytes!) per second on large reads. The platform maxed out at 40GB per second for writes.
With the increasing focus on object storage performance, NetApp also provided us with four of its new all-flash object storage appliances, the SGF6024. These appliances are already in the ATC and can be used for customer POCs and evaluations today.
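To put those numbers in perspective, it helps to compare them with the aggregate network capacity of the test bed. The short calculation below uses only the figures from the test (eight appliances, each with four 25Gbps links) and ignores protocol and Ethernet overhead, so the utilization figures are approximate; the function name is hypothetical.

```python
def aggregate_link_capacity_gbytes(appliances: int, links: int, gbps_per_link: float) -> float:
    """Total network link capacity in gigabytes per second (8 bits per byte)."""
    return appliances * links * gbps_per_link / 8


capacity = aggregate_link_capacity_gbytes(appliances=8, links=4, gbps_per_link=25)
read_utilization = 56 / capacity   # peak large-object read throughput in the POC
write_utilization = 40 / capacity  # peak write throughput in the POC

print(f"link capacity: {capacity:.0f} GB/s")                # 100 GB/s
print(f"reads:  {read_utilization:.0%} of link capacity")   # 56%
print(f"writes: {write_utilization:.0%} of link capacity")  # 40%
```

In other words, large reads reached a little more than half of the raw link capacity of the eight appliances.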