Top Trends Shaping Data Protection Strategies
This article was originally published on September 22, 2022, and has been updated to reflect current trends.
Between prioritizing how to secure data and keeping up with the increased data demand, organizations need solutions that will help them keep up in an age of shifting threat landscapes and technology disruption.
In this article, we will discuss what customers are most concerned about and where we see our technology partners focusing their efforts regarding security, backup storage targets, as-a-service offerings, multicloud, recovery, regulatory pressures, and next-generation solutions in the data protection market.
Security

Although ransomware has been around for some time, it remains a persistent threat to organizations. According to IDC's 2021 Ransomware Study, approximately 37 percent of global organizations were the victim of some form of ransomware attack in 2021. And according to the Verizon Data Breach Investigations Report, that trend continued with an almost 13 percent increase in 2022 (an increase as large as the last five years combined). Unfortunately, ransomware attacks won't end any time soon and will most likely evolve, keeping security top of mind for business leaders.
Due to the increase in ransomware attacks worldwide, data security and data protection, two IT functions that have traditionally been separate, are becoming necessarily linked.
When it comes to ransomware, zero trust architecture and immutable storage are effective defenses. To take backup and recovery one step further, data isolation in some form of vault solution is the best protection. Important considerations around data vaulting include the level of isolation and additional services and resources for recovery operations, such as a clean room and various analytics packages.
In addition, the ability to provide analytics and observations on the backup stream is proving to be a key innovation in the data protection space. By its very nature, backup touches everything important within a customer's environment. Analytics performed on the backup data (or by the backup client) are an effective way to determine if an encryption event is occurring and the blast radius of a cyber-attack. Today, nearly every data protection OEM has a basic level of anomaly detection designed to alert administrators when large numbers of changes to data occur. Supplementing this with advanced artificial intelligence and machine learning techniques allows for deeper inspection and additional insight around aspects like dwell time (how long have the attackers been in the network) and silent infection (where is the code waiting to reinfect). This type of information is invaluable in reducing recovery time from a cyber-attack.
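The basic anomaly detection described above can be sketched simply: compare the changed-data volume of the latest backup against the job's recent history and alert on a statistical outlier. The data, threshold and function below are illustrative assumptions, not any vendor's actual algorithm:

```python
# Minimal sketch of backup-stream anomaly detection: flag a backup job whose
# changed-data volume deviates sharply from its recent history. Mass
# encryption by ransomware typically shows up as a surge in changed data.
from statistics import mean, stdev

def detect_anomaly(history_gb, latest_gb, threshold=3.0):
    """Return True if the latest changed-data volume is a statistical outlier.

    history_gb: changed data (GB) observed in recent successful backups.
    latest_gb:  changed data (GB) in the backup being evaluated.
    """
    mu, sigma = mean(history_gb), stdev(history_gb)
    if sigma == 0:
        return latest_gb != mu
    z_score = (latest_gb - mu) / sigma
    return z_score > threshold

# Typical nightly deltas for a file server, then a sudden jump of the kind
# that warrants an administrator alert.
history = [42, 45, 40, 44, 43, 41, 46]
print(detect_anomaly(history, 44))   # a normal night
print(detect_anomaly(history, 400))  # a surge worth alerting on
```

Real products layer machine learning models over signals like this (entropy of written blocks, file-type churn, dwell-time indicators), but the principle is the same: the backup stream is a privileged vantage point on the whole environment.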
Backup storage targets
The purpose-built backup appliance (PBBA) market is seeing continued growth in both scale-up and scale-out appliances. PBBAs improve on the scalability of traditional disk backup by including deduplication and compression to reduce the TCO of storing backups. Adding capabilities for tiering, replication and immutability of backups provides a further layer of protection against ransomware.
In addition to PBBAs, traditional primary storage vendors have introduced products that include immutability, deduplication, compression, replication and object storage targets. These vendors are blurring the lines between traditional PBBAs and primary storage used for backups. Just as with PBBAs, the deployments vary widely from small to very large scale-out and multi-tier architectures.
Every enterprise backup vendor supports object storage today, and its uses and roles will continue to expand. In theory, object storage is infinitely expandable and can boast 11 9s of durability, making it an easy choice for backup administrators. When you pair the cost advantage of object storage over traditional disk with immutability (object lock, immutable blobs, etc.), object storage becomes a nearly ideal backup target.
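As one concrete illustration of object-lock immutability, here is a sketch of how a backup copy might be written to S3 with a compliance-mode retention date, so it cannot be deleted or overwritten until that date passes. The bucket and key names are hypothetical, and the actual upload (commented out) assumes boto3 credentials and a bucket created with Object Lock enabled:

```python
# Sketch: preparing an immutable backup write using S3 Object Lock.
# Bucket/key names are hypothetical examples.
from datetime import datetime, timedelta, timezone

def object_lock_params(bucket, key, retention_days):
    """Build put_object parameters for an immutable (COMPLIANCE-mode) backup copy."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    return {
        "Bucket": bucket,
        "Key": key,
        # COMPLIANCE mode: no identity, including root, can shorten the retention.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until,
    }

params = object_lock_params("example-backup-bucket", "db/2024-06-01.bak", 30)
# With boto3 installed and credentials configured, the upload itself would be:
#   import boto3
#   boto3.client("s3").put_object(Body=backup_bytes, **params)
print(params["ObjectLockMode"])
```

Azure immutable blobs and most on-premises object stores expose an equivalent retention mechanism, which is what makes object storage such a natural ransomware-resistant backup tier.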
After a decades-long decline, the use of tape for data retention is set to increase. Tape is inherently immutable and is considered environmentally friendly, since it requires no power while idle. Combined with the ability to reliably retain data for up to 30 years, this makes it a strong archive medium. Most of the increase in tape utilization will be transparent to backup users, since tape will serve as another tier behind blob and object storage when restore service level agreements (SLAs) are hours to days.
As-a-service offerings

Today, data protection can be consumed in a variety of ways: from a simple subscription model that shifts costs from CAPEX to OPEX, to fully managed operations through as-a-service offerings, where customers are given a service-level agreement (SLA) and a per-unit cost to protect. Generally available data protection offerings in the market include:
Software as a Service (SaaS): Customers purchase the use of software on a subscription basis and don't have to worry about how or where it's running. However, all administrative tasks remain the customer's responsibility.
Backup as a Service (BaaS): Customers are purchasing an outcome. Service-level objectives are defined, and the provider is responsible for meeting them.
Disaster Recovery as a Service (DRaaS): Customers are purchasing 'disaster recovery in the cloud'. This is typically a replication target that offers the ability to convert virtual machines to the native format of the cloud target (e.g., from VMware native format to AWS or Azure native format) and spin them up on demand.
Cyber as a Service: As with DRaaS, this solution offers a replication target with additional logical isolation, immutability, enhanced security hardening and malware/ransomware detection capabilities.
Infrastructure as a Service (IaaS): IaaS is still a viable and useful option for customers who are accustomed to running their own environments but want to consume capacity in an 'on the drip' OPEX model. The main advantage of IaaS versus capital purchasing is that the customer only pays for the capacity they are consuming.
Multicloud

The primary driver of multicloud adoption is the need for workload mobility. This is evident across all use cases, even hybrid clouds that maintain a strong dependence on workloads remaining adjacent to big iron investments.
It is important to keep in mind that multicloud remains the most fluid and dynamic environment within the market today as we review the four primary use cases:
Data destination: This allows customers to reduce consumption of storage on-premises while providing a solution that aligns with compliance or business requirements to keep a copy of your data offsite.
Most data protection vendors can accommodate this efficiently and are usually differentiated by how deeply they integrate with the services offered within a given cloud provider, with some more tightly aligned with AWS, Azure and/or Google Cloud Platform. On-premises object storage providers and regional hyperscalers tend to serve narrower, more use-case-specific requirements.
Automation: The ability to implement and manage data protection through native API functionality has become a requirement for most enterprise customers. The capability to deploy Infrastructure as Code and create a self-service catalog for data protection aligns with our enterprise customers' long-term automation strategies. This is realized by automating "Day 1" with embedding protection as an element of production readiness and "Day 2" operations with workload mobility (into the cloud) or embedding automation with the workflow of the on-premises hypervisor.
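The "Day 1" pattern above can be sketched as an Infrastructure-as-Code step that enrolls a newly provisioned workload in a backup policy via the protection product's REST API. The endpoint, field names and policy tiers below are hypothetical; real products expose similar but product-specific APIs:

```python
# Sketch of "Day 1" protection automation: build the API payload that enrolls
# a new VM in a backup policy as the final step of provisioning. All names
# here (policy tiers, asset schema, endpoint) are illustrative assumptions.
import json

def protection_request(vm_name, policy="gold-daily", retention_days=35):
    """Build the enrollment payload a provisioning pipeline would POST."""
    return {
        "asset": {"type": "virtual_machine", "name": vm_name},
        "policy": policy,                  # SLA tier defined by the backup team
        "retention_days": retention_days,  # how long copies are kept
    }

payload = protection_request("web-frontend-01")
# A pipeline (Terraform provisioner, Ansible task, etc.) would then send it:
#   requests.post("https://backup.example.com/api/v1/protection", json=payload)
print(json.dumps(payload, indent=2))
```

Embedding this call in the provisioning workflow is what makes protection an element of production readiness rather than an after-the-fact ticket.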
In-cloud adoption: Automation facilitates mobility to accelerate in-cloud adoption. This is often implemented as rehosting applications into a hyperscaler environment. And with added effort, customers can replatform applications to leverage Platform as a Service (PaaS) offerings.
Tackling the increase in volume and velocity of platform sprawl is where data protection is being stretched thin. The operations of protecting and providing governance of these critical datasets is driving a purposeful evaluation of vendors and their capabilities to absorb and simplify this reality.
Disaster recovery: We see customers evaluating how disaster recovery capabilities can be improved through extended features and functionality embedded in their data protection solution. Solutions such as cloud disaster recovery and DRaaS can provide the ability to shift workloads to or between clouds as part of the recovery process, while embedded or integrated continuous data protection (CDP) can enable near zero RTO/RPO options for critical applications.
These technologies should be considered as part of an overall conversation about what RPO/RTO levels are achievable, and how complementary orchestration and automation capabilities can reduce recovery time and effort.
Recovery

The purpose of backups is to ensure recoverability in the event of data loss or corruption in production. The ability to recover data is table stakes; customers are increasingly interested in making recovery faster and easier.
As we begin to discuss recovery, we often see the predominant use case being operational rather than disaster related. These are the recoveries driven by user or software error that occurs within production. Here are some key requirements to consider:
Instant application availability: With an organization's backup data migrating from an offline serial access model (tape) to an online native format (backups on disk), organizations expect their protection storage to be able to mount those backup copies to a production host and leverage that data immediately. Key use cases here are 1) the ability to spin up a virtual machine instantly from protection storage and vMotion it back into production while it is running, and 2) the ability to mount databases to a production host to recover individual elements from the database (row level recovery within a table) or reuse/refresh production data into test/dev.
Orchestrated recovery: With environments growing and budgets shrinking, organizations are increasingly looking to automation to drive efficiency. The ability to automate recovery to a 'one click' standard is a growing requirement. API integration and automation capabilities enable the recovery of thousands of virtual machines quickly and efficiently, saving the business millions of dollars in the event of a large-scale failure.
Self-service: The ability for application admins to perform their own recoveries without having to open a ticket to the backup team and wait for a response takes pressure off backup admins and enables the business to correct issues in production much faster.
Data validation: The time to learn that there are problems with a backup is not at the moment of recovery. Backup software must be able to test and verify recoverability outside of the normal recovery workflow. This can range from basic checksum validation of backup data to catch silent corruption, up to fully automated recovery testing in a sandbox environment.
Cloud recovery: Customers are increasingly unwilling to devote dedicated resources to stand idle waiting for a recovery. This is driving increased capabilities to recover virtual machines and even databases to a cloud target and provision storage and compute as needed. Key use cases here are disaster recovery, operational recovery and enabling 'lift and shift' migrations to the cloud.
Continuous data protection integration: Customers are looking to continuous data protection to ensure tight RPOs and RTOs for critical business applications. Once solely the domain of array-based replication on the back end, this technology has migrated to the front end with critical virtual machines being replicated between sites for hot standby capabilities for near zero downtime/data loss failovers. Backup software vendors are increasingly integrating these capabilities into their products to provide SLA-driven 'one pane of glass' management of both backup and continuous data protection operations.
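At its most basic level, the data validation requirement above amounts to hashing each backup file at write time and re-verifying the hashes later, out of band. A minimal sketch, with the file path and manifest format invented for the example:

```python
# Minimal sketch of out-of-band backup validation: record a SHA-256 digest for
# each backup file when it is written, then re-verify later to catch silent
# corruption. The path and in-memory manifest here are illustrative only.
import hashlib
from pathlib import Path

def sha256_of(path):
    """Stream a file through SHA-256 so large backups don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest):
    """Return the backup files whose current hash no longer matches the manifest."""
    return [p for p, digest in manifest.items() if sha256_of(p) != digest]

# Record a manifest at backup time, then check it during a validation window.
backup = Path("example.bak")
backup.write_bytes(b"backup contents")
manifest = {backup: sha256_of(backup)}
print(verify(manifest))          # prints [] (no corruption)
backup.write_bytes(b"bit rot!")  # simulate silent corruption
print(verify(manifest))          # the corrupted file is flagged
```

Full recovery testing goes further, actually booting workloads in a sandbox, but checksum manifests like this are the floor any backup platform should provide.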
Regulatory pressures

The European General Data Protection Regulation (GDPR), as well as state privacy regulations such as the California Consumer Privacy Act (CCPA), is changing how user data is backed up, stored and retained. At least 27 states have proposed CCPA-like regulations that will dictate how long personal information may be stored, including on backups. While clear policies are not yet in place, businesses subject to GDPR and CCPA requirements should have a plan for handling deletion of private information in backup data.
As more enterprise data protection solutions reside in the cloud (see SaaS and BaaS above), data sovereignty must be a consideration of the solution. Data sovereignty is a requirement that data is subject to the laws of the country in which it is collected or processed and must remain within that country's borders.
Next-generation solutions

As more organizations move their workloads and applications to the cloud and container-based architectures, they expect backup software providers to protect the workloads they have today and anticipate their future needs as well. These expectations include:
SaaS workload coverage: Customers are becoming aware of the gaps in the shared responsibility model used by cloud and SaaS providers, where vendors typically protect the infrastructure and uptime of the environment but provide limited coverage for the customer's actual data. Integrated protection of Microsoft 365 and Google Workspace data, with the ability to replicate that data between availability zones, cloud providers or even back to the customer's premises, is a growing requirement.
Backup for containers: Data within containerized environments, such as Kubernetes or Tanzu, is often regarded as 'ephemeral' and not requiring protection. However, we are seeing that customers' actual implementations of these environments often result in containers that must be protected. Additionally, as these implementations move from the lab to production, the container infrastructure and the associated data needed to rebuild the environment quickly must also be protected.
Large network attached storage (NAS) protection: Protecting large unstructured data environments is not a new challenge, but the methods must evolve to keep pace with the explosive growth of these environments. Moving petabytes of unstructured data to a separate environment to create a true backup is difficult at best, and for the largest datasets cannot be accomplished with the traditional Network Data Management Protocol (NDMP). Backup software vendors are turning to NAS platform integration to reduce filesystem scanning time, and to multiple proxy mounts to drive the massive parallelism needed to accomplish the data movement.
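The parallelism idea behind this can be illustrated with a simple fan-out: split the filesystem walk into per-directory work items and process them with a pool of workers instead of one serial NDMP-style stream. This sketch shows only the pattern; real products use NAS snapshots, change lists and multiple proxy mounts rather than a plain directory walk:

```python
# Illustrative fan-out pattern for scanning a large file tree in parallel,
# rather than as a single serial stream. Not any vendor's implementation.
import os
from concurrent.futures import ThreadPoolExecutor

def scan_directory(path):
    """One unit of work: list the files directly inside a single directory."""
    with os.scandir(path) as it:
        return [e.path for e in it if e.is_file(follow_symlinks=False)]

def parallel_scan(root, workers=8):
    """Fan the directory tree out across a worker pool and merge the results."""
    dirs = [dirpath for dirpath, _, _ in os.walk(root)]
    files = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in pool.map(scan_directory, dirs):
            files.extend(batch)
    return files

# On a NAS with millions of directories, each worker (and each proxy mount)
# multiplies scan and data-movement throughput.
print(len(parallel_scan(".")))
```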
Primary storage integration: Storage platform integration isn't a new concept in the backup environment, but with massive datasets and growing requirements for automation and seamless integration to accomplish backups within an SLA, the industry is looking to backup application vendors to integrate with a variety of storage platforms to enable success.
Internet of Things (IoT): IoT devices are becoming nearly ubiquitous in our daily lives. In industries such as healthcare, oil & gas and manufacturing, where it's vital to safeguard IoT data, collecting and preserving that data will require significant planning and resources. In addition to the data generated, IoT device configurations must be protected in case devices are compromised.
Network management: Network configuration data isn't a new workload, but the drive to protect it as part of a minimum viable solution for ransomware recovery is pushing backup teams to back it up at least daily.