8 Trends Shaping the Data Storage Landscape
With the massive increase in data and applications and the possible need for infrastructure modernization, IT leaders are constantly challenged to find solutions that will support the bandwidth demand and growth within their organization's IT infrastructure.
At WWT, we have a unique perspective in this space, informed by our engagements with our customers, partners and other industry experts. Looking back on what we deemed the most impactful trends in data storage in 2022, we will discuss what's changed, what's remained constant and what we should keep an eye on.
Last year, we stated that "storage groups must work again to customize storage to the application, ensuring optimal performance, uptime, resiliency and, ultimately, a great end-user experience," and we see this trend continuing in 2023. It has been table-stakes to talk about requirements, use cases and applications to identify the proper storage solution. However, we see some acceleration in high-performance computing (HPC) and AI/ML, which may require a different type of storage system than enterprises typically consider. See Things to Keep an Eye On for additional HPC and AI/ML comments. We also see the growth of workloads requiring extensive unstructured data in scalable network-attached storage (NAS) and the object storage space for big data, data warehouse and streaming machine data.
In addition, we expect the trend of moving to an all-flash data center, especially integrating QLC media in the primary storage systems, to continue. We have seen QLC used for secondary storage systems and workloads like backup and archive. In the future, we expect that QLC will be part of the primary storage systems as enhancements continue around reliability, longevity and performance. The blend of QLC with TLC/MLC may be appropriate for workloads that do not require the highest performance. For an overview of NAND flash technologies, check out this article.
Lastly, we expect the supply constraints we saw in 2022 will abate in 2023 and should not affect the decision on the storage system selection.
Integrating data centers and public clouds is rapidly becoming a reality for businesses across various industries as organizations understand the importance of leveraging cloud services such as AI and cloud analytics to stay ahead of the competition. And, they recognize the untapped potential of their onsite data and the need to extend their data centers to the public cloud.
One driver of this trend is the ability to burst workloads into the cloud: shifting computing from the data center to the cloud during periods of high demand. Bursting helps organizations manage their computing resources more efficiently and ensures unexpected spikes do not bog them down during peak times.
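The burst decision above can be sketched as a simple placement policy. This is a toy illustration, not a real cloud or product API; the function name, threshold and units are all hypothetical.

```python
# Toy sketch of a burst-to-cloud policy: when on-premises utilization
# would cross a threshold, route the overflow to cloud capacity.
# All names and thresholds here are illustrative assumptions.

def plan_placement(demand_units, onprem_capacity, burst_threshold=0.8):
    """Return (onprem_units, cloud_units) for a given demand.

    Keep on-premises usage at or below burst_threshold of capacity;
    anything beyond that bursts to the public cloud.
    """
    onprem_limit = int(onprem_capacity * burst_threshold)
    onprem = min(demand_units, onprem_limit)
    cloud = demand_units - onprem
    return onprem, cloud

# Normal load stays on-premises; a spike bursts the overflow to cloud.
print(plan_placement(70, 100))   # (70, 0)
print(plan_placement(130, 100))  # (80, 50)
```

In practice the trigger is usually a monitoring metric (CPU, IOPS, queue depth) rather than a single capacity number, but the shape of the decision is the same.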
Another factor contributing to incorporating data centers and the public cloud is the integration with applications such as Snowflake. These applications make it easier for organizations to transfer their data to the cloud and exploit its many benefits.
Moving VMware to cloud platforms such as AWS (VMC), Google Cloud (GCVE) and Azure (AVS) is also becoming increasingly popular among organizations. This helps simplify the management of virtualized environments and reduces the need for onsite infrastructure.
To help with this integration, many storage OEMs are extending their storage-as-a-service (STaaS) offerings into the public cloud, allowing customers to operationalize cloud storage more easily. This has become attractive for organizations seeking more efficient and cost-effective storage solutions. In line with this strategy is the ability to federate with the public cloud to utilize a more cost-effective storage tier, such as AWS S3 or Glacier. While some large enterprises build these tiers in their own data centers, the ability to use the public cloud for a backup or tertiary copy of the data is desirable enough to affect OEM procurement choices, even for customers not actively using it.
Additionally, independent software vendors (ISVs) are investing heavily in providing public cloud data services. This is driving the integration of data centers and the public cloud, making it easier for organizations to deploy enterprise storage in the cloud that shares data management features with their on-premises storage environment.
With the numerous benefits of the cloud, organizations must extend their data centers to the cloud to take advantage of the many available opportunities. Whether through bursting workloads, application integrations, moving VMware to the cloud or taking advantage of storage solutions, businesses that move to the public cloud are sure to stay ahead of the curve.
In last year's article, we focused on replacing the SCSI protocol with NVMe. Indeed, as customers adopt new technologies, SCSI will naturally fade away, but NVMe is not the specific driver of our customers' change. The shift in customer thinking is less around adopting NVMe and more around moving to ethernet-based storage fabrics. The question is, why?
It's not because Fibre Channel (FC) doesn't work. In fact, FC works exceptionally well. It is purpose-built and lossless, providing a dedicated fabric for storage traffic optimized for east-west data movement. NVMe-over-FC lets you use the new protocol over your existing Fibre Channel infrastructure. However, customers want increased standardization and flexibility within their data centers, including public clouds, where all traffic is IP-based. With independent FC and ethernet fabrics, you have limitations on infrastructure placement since you may not have a ubiquitous FC cable plant deployed, coupled with the cost of FC switches or directors. Additionally, the cost of high-speed ethernet ports continues to decline while FC port costs remain relatively flat. 100Gb/s ethernet ports are now within reach for many, while some FC users are only now moving to 32Gb/s, and 64Gb/s FC ports have been challenging to source.
On the ethernet-based protocol front, three contenders are vying to succeed where iSCSI failed to gain ground for block storage connectivity: iWARP, RoCE and NVMe over TCP, each with its advantages and disadvantages. Customers are watching and waiting to see which will be crowned the winner. While iWARP and RoCE offer better raw performance, NVMe over TCP still has tangible performance advantages over iSCSI, is simpler to set up, and does not require special hardware (switches and NICs) or network configurations, which should equate to lower operational costs.
With the increasing adoption of cloud services, many customers continue to find the OPEX purchasing model appealing. STaaS provides the advantage of on-premises hardware without the hassle of managing and operating it, which is typically required in a traditional CAPEX hardware purchase. Additionally, STaaS can serve as a temporary solution to bridge the gap between on-premises and cloud environments, offering short-term and predictable costs.
Some customers are concerned about the rising costs of their cloud spend, prompting them to consider repatriating some of their data. However, they may lack the staff or skills to do so. For these customers, a consumption model can be very beneficial. It allows them to bring some IT infrastructure back to the data center without the burden of managing it.
Critical components of an OPEX storage deployment typically include OEM-managed upgrades and updates, base-level storage capacity commitments, and technology refreshes.
The STaaS offerings vary significantly from OEM to OEM. They may include, but are not limited to:
- Dell APEX
- NetApp Keystone
- Pure's Evergreen//One
- HPE GreenLake
- Hitachi Everflex
- IBM Storage as a Service
There are additional costs associated with a fully managed approach; it's essential to understand the nuances of each OEM offering. We recommend a thorough TCO analysis to assess the viability of moving to a STaaS model.
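A TCO analysis like the one recommended above can be framed as a simple side-by-side calculation. Every figure below is a placeholder assumption for illustration, not a quote from any OEM; substitute real pricing, support and staffing numbers for an actual analysis.

```python
# Illustrative (not vendor-specific) TCO comparison of a traditional
# CAPEX purchase versus a STaaS subscription over a planning horizon.
# All dollar figures and rates are hypothetical assumptions.

def capex_tco(purchase_price, annual_support, admin_cost_per_year, years):
    """Up-front purchase plus recurring support and admin costs."""
    return purchase_price + years * (annual_support + admin_cost_per_year)

def staas_tco(monthly_rate_per_tib, committed_tib, years):
    """Subscription cost for a committed base capacity."""
    return monthly_rate_per_tib * committed_tib * 12 * years

years = 4
capex = capex_tco(purchase_price=500_000, annual_support=60_000,
                  admin_cost_per_year=40_000, years=years)
staas = staas_tco(monthly_rate_per_tib=25, committed_tib=500, years=years)

print(f"CAPEX over {years} years: ${capex:,}")   # $900,000
print(f"STaaS over {years} years: ${staas:,}")   # $600,000
```

A real model would also capture on-demand burst rates above the committed capacity, technology-refresh timing and the cost of capital, which is where the OEM-to-OEM nuances show up.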
Automating storage management continues to be a top priority for all our customers, aimed at reducing costs, improving operational efficiency and minimizing errors. While automation initiatives don't usually start with storage, the benefits of automation in the storage realm are just as significant. The benefits include reducing the toil of everyday storage admin tasks and removing human error and snowflake configurations, resulting in a faster time to production. However, storage automation requires a mindset change for the storage admins. They need to be comfortable shifting control to allow the automation scripts and tools to perform basic day-to-day tasks, freeing them to focus on other, more critical storage management work.
You must look beyond the storage arrays to get the most ROI from your automation initiatives. The storage fabric (Fibre Channel and ethernet) is just as important: it is a place where human errors occur, and often minimal cleanup is done as hardware is refreshed or retired. If the network plumbing isn't configured correctly, the reliability of your applications and workloads may suffer from an uptime and performance perspective.
Many storage OEMs are taking an API-first approach to management and operations. This provides flexibility for creating your own scripts or using provided scripts for tools like Ansible and Terraform. Some OEMs are offering storage-as-code to automate everyday tasks.
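The API-first pattern can be sketched with a short provisioning script. The endpoint path, payload fields and auth scheme below are invented for illustration; real OEM REST APIs (and their Ansible and Terraform modules) each have their own schemas.

```python
# Sketch of an API-first provisioning call against a hypothetical storage
# array REST endpoint. URL, field names and auth are illustrative only.
import json
import urllib.request

def build_volume_request(name, size_gib, pool="default"):
    """Assemble a volume-creation payload (hypothetical schema)."""
    return {"name": name, "size_bytes": size_gib * 2**30, "pool": pool}

def provision(api_base, token, payload):
    """POST the payload to the (hypothetical) volumes endpoint."""
    req = urllib.request.Request(
        f"{api_base}/api/v1/volumes",              # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:      # not executed here
        return json.load(resp)

payload = build_volume_request("app01-data", size_gib=512)
print(json.dumps(payload))
```

Because the request body is just structured data, the same payload can be templated from Ansible or Terraform, which is the main practical benefit of an API-first array.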
Automation will be sticking around, and the adoption levels will only increase in the years to come.
We see legacy and newer cloud-friendly distributed applications starting to move toward an edge data model. This could be by moving the compute closer to the edge as mentioned above (e.g., AWS Outposts), but it can still require local storage to reduce latency. This is done by accessing and processing the data at the edge, or by moving the needed data, or a copy of it, closer to the edge.
For customers that need to support legacy NAS at the edge, we see a significant uptick around NAS appliances. These appliances go by many names, such as Edge Caching Appliances, NAS gateways, etc. Still, in most cases, their goal is the same: lower storage response times and improve the user experience or application latency at the edge using a cost-effective, lightweight appliance (often in the form of a virtual machine). An added benefit is that some of these solutions can present global access to petabytes of de-duplicated and compressed data from the "back-end" (remote capacity) in an S3 API-compatible object cloud repository, while providing the required NAS shares containing the working data set out the "front-end" (local capacity). This approach can add functionality, such as global file system access, while reducing costs by using a commodity storage appliance instead of a traditional NAS filer. Further cost savings come from applying storage optimization techniques, like de-duplication, across exceptionally large, petabyte-scale data sets, which reduces the capacity under management. Storing and organizing the centralized back-end data in a cost-effective, highly resilient object cloud also eliminates the need for the additional data copies required by traditional backup and disaster recovery strategies.
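The de-duplication savings described above come from content addressing: identical chunks hash to the same key and are stored once. Here is a deliberately tiny sketch of the idea; real appliances use variable-length chunking, compression and far larger chunk sizes.

```python
# Toy content-addressed store illustrating why de-duplication shrinks
# the capacity under management: identical chunks are stored only once.
import hashlib

class DedupStore:
    def __init__(self):
        self.chunks = {}          # sha256 digest -> chunk bytes

    def write(self, data, chunk_size=4):
        """Split data into fixed-size chunks; return their digests."""
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            d = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(d, chunk)  # keep each unique chunk once
            digests.append(d)
        return digests

store = DedupStore()
store.write(b"AAAABBBBAAAA")      # the "AAAA" chunk repeats
print(len(store.chunks))          # 2 unique chunks stored, not 3
```

Across petabyte-scale data sets with many near-identical copies (VM images, user home directories), this is where the dramatic capacity reduction comes from.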
Historically, enterprise data protection strategies have focused on dealing with accidental data loss or corruption incidents and complete site failure scenarios in the form of traditional backups (tape and snapshots), local fail-over instances (business continuance strategies) and/or data replication to a secondary site (disaster recovery strategies). While those are still important and potentially a regulatory requirement, a new contender has entered the ring. Cyber-threat vectors are multifaceted and feature things like bad actors lurking for months or employees unknowingly injecting self-replicating malware into the company's systems. To complicate matters further, these are no longer simple, brute-force attacks by a disgruntled employee or a novice hacker. They are highly organized and sometimes state-sponsored. The bad actors have typically been in your systems undetected for months, mapping out your existing data protection strategies, mining sensitive data and planning a coordinated, devastating attack to bring your organization to its knees.
As with most things, comprehensive planning is the key to success, though there is no silver bullet. Enterprises are pursuing broader strategies spanning the business, application, enterprise architecture, security and IT teams to mitigate risk and improve overall data security. Large organizations are pursuing strategic cyber resilience programs but are starting with a cyber recovery initiative focused on a turnkey data vault. Smaller organizations are focusing on a zero-trust approach, immutable backup strategies and employing technologies such as object storage and versioning.
With the above in mind, there are still things that IT and storage teams just starting their cyber protection journey can do to protect data. Understanding data and tiering according to application priority is an excellent place to start. Taking advantage of encryption, immutability and indelibility features for critical production data contributes to exposure reduction. Finally, adding backups, data cleansing and local/remote storage replication can support a robust data protection posture.
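Versioning as a recovery mechanism can be illustrated with a toy object store: every write creates a new version, and earlier versions stay immutable, so a ransomware-style overwrite never destroys the clean copy. This is loosely modeled on S3-style versioning semantics, not any specific product.

```python
# Minimal sketch of object versioning as a recovery mechanism: every
# PUT appends a new version; prior versions are never modified, so an
# overwrite (e.g., by ransomware) can always be rolled back.

class VersionedBucket:
    def __init__(self):
        self.objects = {}                  # key -> list of versions

    def put(self, key, data):
        self.objects.setdefault(key, []).append(bytes(data))
        return len(self.objects[key]) - 1  # version id of the new write

    def get(self, key, version=None):
        versions = self.objects[key]
        return versions[-1] if version is None else versions[version]

bucket = VersionedBucket()
bucket.put("payroll.db", b"good data")
bucket.put("payroll.db", b"encrypted by attacker")
print(bucket.get("payroll.db"))             # latest (compromised) version
print(bucket.get("payroll.db", version=0))  # clean copy still recoverable
```

Real immutability and indelibility features add retention locks so that even a privileged attacker cannot delete the older versions before the retention window expires.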
There remains high interest in Kubernetes in 2023. Kubernetes is widely used in cloud-native application development and has become the de facto standard for container orchestration. It supports a wide range of container runtimes and can run on any infrastructure, whether on-premises, public cloud or hybrid cloud.
More than half of customers using containers today utilize persistent storage in the cloud; it is common for others that have recently adopted containerized applications to use whatever traditional storage they have available on-premises. Both approaches have significant limitations, leading storage manufacturers to evolve to meet the needs of modern container orchestration platforms.
The next generation of persistent storage for containers is container-native storage (CNS). CNS is software-defined storage (SDS) that can be presented as native code within Kubernetes, allowing for tighter integration and advanced functions such as data protection policies and data mobility. As containerized applications become more critical to business operations, fast backups/restores and data mobility across environments are driving standards. As smaller containerized workloads scale to the point where cloud-based storage is no longer cost-effective, dedicated on-premises storage for containers will need consideration.
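In Kubernetes, an application requests persistent storage through a PersistentVolumeClaim, which a CNS or CSI driver satisfies. The sketch below builds a standard PVC manifest as a Python dict; the `example-cns-class` storage class name is a placeholder, since the real name depends on the driver installed in your cluster.

```python
# Building a Kubernetes PersistentVolumeClaim manifest: this is how a
# containerized app asks for persistent storage. The storageClassName
# is a placeholder that would map to a real CNS/CSI driver.
import json

def make_pvc(name, size_gi, storage_class="example-cns-class"):
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,   # placeholder class name
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }

print(json.dumps(make_pvc("app-data", 100), indent=2))
```

The value of CNS shows up behind this interface: the same claim can carry snapshot, backup and mobility policies because the storage layer runs as native Kubernetes resources.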
We see a solid push toward AI for natural language processing (NLP), image recognition, decision-making and problem-solving.
Machine learning (ML) is a subfield of AI that focuses on developing algorithms and models to learn and improve from experience without being explicitly programmed. Machine learning uses these algorithms and statistical models to analyze large datasets, identify patterns, and make predictions or decisions based on that data.
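The "learn from data, then predict" loop can be shown at its smallest scale with an ordinary least-squares line fit. This is a deliberately minimal illustration of the pattern, not a production ML pipeline.

```python
# A minimal example of "learning from data": fit y = a*x + b by
# ordinary least squares, then use the fitted parameters to predict.
# Real ML does the same thing at vastly larger scale and dimensionality.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx        # slope, intercept

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]     # exactly y = 2x + 1
a, b = fit_line(xs, ys)
print(round(a, 6), round(b, 6))         # 2.0 1.0
```

The storage implication is that the "experience" the model learns from is a large dataset that must be read repeatedly at high throughput during training, which is what drives the non-traditional storage requirements discussed here.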
HPC is used in various fields, including weather forecasting, climate modeling, oil and gas exploration, computational fluid dynamics, financial modeling, and drug discovery. It has also played a critical role in advancing scientific research in fields such as astrophysics, genomics and particle physics. We expect to see an aggressive interest in leveraging HPC in the future.
HPC systems typically use parallel processing techniques to divide large computational tasks into smaller ones that can be executed simultaneously across multiple processors or nodes. These systems are designed to handle vast amounts of data and run applications requiring high accuracy, precision and speed. As a result, data storage may require a non-traditional approach to support these workloads.
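The divide-and-conquer pattern described above can be sketched in a few lines: split the work into chunks, process them concurrently, then combine the partial results. Real HPC distributes this across nodes with MPI or GPUs; this stdlib sketch only shows the shape of the decomposition.

```python
# Sketch of the parallel decomposition HPC systems rely on: split a
# large task into chunks, process chunks concurrently, combine results.
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    """Split a sequence into roughly equal contiguous chunks."""
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, workers=4):
    """Sum the chunks concurrently, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(data, workers))
    return sum(partials)

print(parallel_sum(range(1, 1001)))  # 500500
```

The storage angle: every worker reads its chunk at the same time, so aggregate bandwidth requirements scale with node count, which is why parallel file systems differ from traditional enterprise arrays.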
Use cases and application software continue to expand into leveraging object storage and the S3 API. We see OEMs focusing on performance in their object stacks for single instance / single site performance comparable to current NAS performance profiles. This trend is driven by the desire to use the S3 API stack for development and the flexibility to use a lower-cost or potentially commodity-based, large-scale RAIN (Redundant Array of Independent Nodes). The OEMs of traditional NAS arrays are also adding an S3 API-compatible access method. However, these add-ons typically do not have the full functionality of the native object storage arrays.
With growing expectations from stakeholders and board members, companies are prioritizing ESG factors in their decision-making process and reporting on their performance in these areas. As a result, there is an increasing emphasis on sustainability and corporate social responsibility as a crucial part of business strategy. ESG factors naturally encompass data storage and its environmental impact, including the carbon footprint, natural resource usage and waste management practices. The end goal for a green/carbon neutral/sustainability strategy is data center efficiency. We suggest making power and cooling the priority. This will lay a solid foundation for future ESG product releases from the storage OEMs as they continuously design to meet these expectations.
To help you keep up with the changing storage landscape, there are many ways we can assist in giving you insights into critical key performance indicators related to your infrastructure and storage stack, including protecting your most valuable IT asset – your data.
We also recommend exploring our Advanced Technology Center (ATC) to gain hands-on experience with the latest technologies and cut your proof-of-concept time from months to weeks. Our deep-rooted relationships with major OEMs and rigorous evaluation of recent technology providers can help streamline decision-making, testing and troubleshooting.
For more information on primary storage, data protection, cyber resilience or any of the topics mentioned within the article, connect with one of our storage industry experts today.