Storage Class Memory: The Way of the Future

I tend to wax poetic about how we got to where we are and where history suggests we are going, which is why the title, "The Way of the Future," made sense to me. This article deals with a topic that falls into that category: storage class memory (SCM). I think that when we look back ten years from now, we will consider SCM a milestone in the evolution of storage technology.

It will be remembered in the same light as the introduction of "removable drives," the creation of Logical Unit Numbers (LUNs) from pools of disks for increased performance, Fibre Channel Protocol (FCP) connected storage devices that eliminated storage islands and, of late, the introduction of NAND Enterprise Flash Drives (EFDs) that delivered an exponential jump in performance.

Like most things, this large upward shift for data storage is enabled by multiple new technologies working together rather than by a single breakthrough. While SCM is a very interesting technology with multiple potential applications, it's the combination of SCM with Non-Volatile Memory Express over Fabrics (NVMeOF) as a transport mechanism that makes this advance particularly interesting in the storage realm.

If it weren't for NVMeOF, this article would likely focus on how SCM is changing the server industry, which would be equally compelling because it allows larger capacities of high performance, bit addressable "storage" to sit locally (in the same chassis) with the processors. We do believe that giving servers this much larger, memory-like tier is important, if not equally as important, as SCM's impact on storage arrays.

What is SCM?

Storage class memory is the name given to an overall class of non-volatile memory that you can think of as a bridge between the highest speed memory devices (DDR SDRAM) and the traditional NAND SSD storage that makes up today's EFDs. It has some of the properties of typical EFDs and some of the properties of traditional DIMMs. The "bit addressable granularity" is easily overlooked, but it is very important when comparing SCM to flash and EFDs.

Why is SCM important?

While DDR SDRAM is very fast, it's also very expensive, comes in relatively small capacities, is addressable at the bit level and is volatile, meaning that if power goes away, so does the data. On the flip side, an EFD comes in much larger capacities, is addressable at the block level, is non-volatile and is about 48X cheaper than DDR SDRAM. The trade-off is speed: EFDs are roughly 100X slower than DDR SDRAM.

SCM fits in the middle between DDR SDRAM and EFDs, providing roughly 10X the performance of EFDs at roughly half the price of DDR SDRAM, without the "wear-leveling" requirements of EFDs. We expect competition, optimization and larger-scale mass production to continue to drive SCM prices down, much as happened with EFDs after they were first introduced. Being bit addressable aligns SCM more closely with memory than with flash and provides important benefits around wear leveling, garbage collection and consistently low latency at all levels of IO and queue depth.

While we believe this evolution of SCM technology will be significant on the server side, we see an even greater impact on the data storage side, particularly on large capacity arrays that leverage NVMeOF and SCM together. History has shown us that it took approximately ten years for EFD technology to become ubiquitous. For instance, EFDs were first introduced into the then-EMC product line in the 2010/11 timeframe, and while they haven't completely displaced 15K and 10K RPM (revolutions per minute) spinning disks yet, the writing is on the wall.

For years we have been speculating that drives will eventually fall into two classes: very fast drives with relatively small capacities and relatively slow drives with relatively large capacities. That day is practically here! It hasn't been realistic to build a large scale array (e.g., 3PB) entirely out of DDR SDRAM (or even to include TBs of DDR SDRAM cache, for that matter) because of the prohibitive cost per GB and the challenge of what happens when power goes down. All flash arrays (AFAs), by contrast, have been widely adopted. Given the appetite for speed around analytics, databases and things like AI and fraud detection, the ability to do things faster at a relatively moderate cost is always of great interest and increases adoption.

Where does SCM sit in comparison to SSD and DDR SDRAM?

Bit addressable like DDR SDRAM.
Write durability of DDR SDRAM.
Non-volatility of EFD.
Speeds half-way between EFD and DDR SDRAM.
Capacities between DDR SDRAM and SSD (currently).
Half the cost of DDR SDRAM per GB but over 20X more expensive than EFDs.

Figure 1 below shows approximately where SCM sits. SCM is about half the cost per GB of (volatile) DDR SDRAM and about 10 times faster than (non-volatile) EFD.

Figure 1: Approximate speed and cost comparison between media types
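
The ratios quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the approximate multipliers from this article (48X, half, 100X, 10X) at face value, and the variable names are my own. Real prices and latencies vary by product and over time.

```python
import math

# Relative cost per GB, in units of EFD cost (EFD = 1).
EFD_COST = 1
DRAM_COST = 48 * EFD_COST   # article: EFD is about 48X cheaper than DDR SDRAM
SCM_COST = DRAM_COST // 2   # article: SCM is roughly half the price of DDR SDRAM

# Relative access time, in units of DDR SDRAM access time (DRAM = 1).
DRAM_TIME = 1
EFD_TIME = 100 * DRAM_TIME  # article: EFD is about 100X slower than DDR SDRAM
SCM_TIME = EFD_TIME // 10   # article: SCM is about 10X faster than EFD

scm_cost_vs_efd = SCM_COST / EFD_COST            # times pricier than EFD
scm_time_vs_dram = SCM_TIME / DRAM_TIME          # times slower than DRAM
log_midpoint = math.sqrt(DRAM_TIME * EFD_TIME)   # geometric mean of DRAM and EFD

print(f"SCM cost vs EFD:   {scm_cost_vs_efd:.0f}X")
print(f"SCM speed vs DRAM: {scm_time_vs_dram:.0f}X slower")
print(f"Midpoint on a log scale: {log_midpoint:.0f}X slower than DRAM")
```

Under these assumptions SCM comes out about 24X the cost of EFD, which matches the "over 20X" figure in the list, and about 10X slower than DDR SDRAM. That 10X is the geometric mean of 1X and 100X, which is the sense in which SCM sits "halfway" between the two on a logarithmic scale.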

Pulling it all together and a real world example

While there are different manufacturers of SCM, we have been working very closely with Intel and Intel® Optane™ technology on both the server side and the storage side of WWT's ATC testing. We think highly of Intel Optane technology and are excited to watch the new capabilities it introduces. See Figure 2 below for where Optane fits in today's data center.

Figure 2: Intel Optane and today's data center solutions

Since I'm part of WWT's Storage Engineering group, I'll stick to my swim lane and speak to the benefits on the storage side and to some recent testing performed with VAST Data, whose platform we recently ran through our object storage lab and tested specifically using the S3 API access method. VAST uses an architecture that leverages Intel Optane technology as a caching layer for a QLC SSD back end and ties all the nodes together using NVMeOF. This configuration allows cheaper QLC SSDs, which do not have great write wear characteristics, to be buffered by SCM, which acts both as a caching layer and as a way to ensure the best wear leveling. The Optane caching layer delivers a very good performance profile: under an increasingly stepped workload, it kept up with close to 200 thousand objects per second for up to ~60 minutes before the caching layer saturated and performance dropped off.

Figure 3: Small file PUTs "burst" workload. The orange plot is response time, gray/green plot is transactions per second.
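
For a rough sense of scale, the burst described above can be turned into an object count. The snippet below is a back-of-the-envelope sketch, not a measured result: it assumes the peak rate of about 200 thousand objects per second was held for the full 60 minutes, which overstates the real total because the workload was stepped up over time. The function and constant names are my own.

```python
PEAK_OBJECTS_PER_SECOND = 200_000  # "close to 200 thousand objects per second"
BURST_MINUTES = 60                 # "up to ~60 minutes"


def max_objects_written(rate_per_second: int, minutes: int) -> int:
    """Upper bound on objects written if the peak rate held for the whole burst."""
    return rate_per_second * minutes * 60


upper_bound = max_objects_written(PEAK_OBJECTS_PER_SECOND, BURST_MINUTES)
print(f"At most {upper_bound:,} objects before the cache saturated")
```

Under that assumption, the caching layer absorbed on the order of hundreds of millions of small-object PUTs before the drop-off.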

More details on VAST and WWT's testing

The VAST 5X5 array WWT evaluated relies heavily on SCM Optane caching and uses a split architecture, with "C" boxes making up the front end and "D" boxes making up the back-end capacity. The moniker 5X5 comes from a configuration of five "C" boxes and five "D" boxes. The mix of C and D boxes is variable, but this configuration is considered a "heavy hitter" and is configured for both performance and capacity. Each 2U C box contained four C-nodes built on Intel servers (20 C-nodes total). Each C-node contains dual 16-core 2nd Gen Intel® Xeon® Scalable processors. Each D box contains twelve 1.5TB Optane 3D XPoint SSDs, 44 15TB QLC NAND SSDs and two D-nodes (10 D-nodes total). An NVMeOF network ties it all together, with upstream TCP/IP connectivity consisting of sixteen 100Gb network ports (all in 22RU). These ports connected to WWT's test jig, made up of 28 bare metal servers running a specialized version of the object storage benchmarking and load generation tool COSBench.

Figure 4: VAST Data configuration tested that leverages Intel Optane technology
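
The per-box numbers above add up as follows. This is a minimal sketch that only multiplies out the counts given in this article. The class and field names are hypothetical, capacities are raw decimal TB with no allowance for data protection or data reduction, and "dual 16-core" is taken to mean 32 cores per C-node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CBox:
    c_nodes: int = 4            # four C-nodes per 2U C box
    sockets_per_node: int = 2   # dual-socket servers
    cores_per_socket: int = 16  # 16-core Xeon Scalable processors


@dataclass(frozen=True)
class DBox:
    d_nodes: int = 2            # two D-nodes per D box
    optane_drives: int = 12
    optane_tb: float = 1.5      # TB per Optane 3D XPoint SSD
    qlc_drives: int = 44
    qlc_tb: float = 15.0        # TB per QLC NAND SSD


def summarize(c_boxes: int, d_boxes: int) -> dict:
    """Multiply the per-box counts out to whole-array totals."""
    c, d = CBox(), DBox()
    optane = d_boxes * d.optane_drives * d.optane_tb
    qlc = d_boxes * d.qlc_drives * d.qlc_tb
    return {
        "c_nodes": c_boxes * c.c_nodes,
        "d_nodes": d_boxes * d.d_nodes,
        "cpu_cores": c_boxes * c.c_nodes * c.sockets_per_node * c.cores_per_socket,
        "optane_raw_tb": optane,
        "qlc_raw_tb": qlc,
        "optane_share_of_qlc": optane / qlc,
    }


totals = summarize(c_boxes=5, d_boxes=5)
for name, value in totals.items():
    print(f"{name}: {value}")
```

For the 5X5 configuration this gives 20 C-nodes, 10 D-nodes, 640 CPU cores, 90TB of raw Optane and 3,300TB (3.3PB) of raw QLC, so the SCM layer is a little under 3% of the raw QLC capacity it fronts.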

For more details and a deeper dive on the benefits of technologies like Intel Optane and NVMeOF, and on the impact that products like VAST Data can have on enterprise storage environments, please contact us.