Thoughts on Kubernetes Data Protection

According to the CNCF 2021 Annual Survey, "96% of organizations are either using or evaluating Kubernetes".

Kubernetes (K8s) and containerization are the logical next step in the virtualization of applications and workloads. Just as virtualization allowed multiple "virtual" computers, each with its own Operating System (OS) and memory, to run on a single physical server, containers provide lightweight, portable application deployments that share the host's OS kernel instead of each carrying a full OS. It is possible to deploy hundreds of containers on the same physical server that hosted just a handful of virtual machines.

One of the earliest philosophies for containers was to deploy them entirely as stateless applications, meaning no persistent data should reside within the container. This stateless nature meant there wasn't a need for any type of data protection (aka "backup"). If a container was corrupted or compromised, it was simply destroyed and redeployed. In the past few years, running stateful workloads that hold persistent data in containers has gained adoption. With that shift, traditional corporate data protection capabilities, including retention, recovery and reporting, now need to be extended to this valuable data.

This paper is intended to give a brief overview of considerations for protecting Kubernetes and to outline a few data protection vendors that World Wide Technology works with closely. For more information, please go to the Data Protection page.

Stateful applications

When containers were first developed, they were intended as a mechanism for running stateless applications. As container applications grew and became more prevalent, stateful workloads were inevitably introduced and quickly expanded. To store stateful data, Kubernetes provides Persistent Volumes (PVs), which remain intact as containers are created and discarded.

Some of the most prevalent deployments of stateful applications for Kubernetes clusters today include Artificial Intelligence, Machine Learning, Data Analytics and Messaging systems such as Kafka.

Protecting those stateful applications within a Kubernetes cluster is now as important as protecting most traditional workloads in a datacenter. However, unlike more traditional workloads, containers are subject to rapid deployments and changes. Data protection solutions for Kubernetes therefore need to be as mature as those for virtualization, databases and Network Attached Storage (NAS) while still providing container-specific capabilities.

Intellectual property & rebuild time

As alluded to in the Stateful Applications section, Kubernetes enables rapid application deployments. As Kubernetes clusters grow, they contain an increasing amount of Intellectual Property (IP) that needs to be protected. The source code is likely protected in a repository, but a repository doesn't capture the deployed configuration and interconnections that an application may have taken months or years to build.

This is a two-part problem that data protection solutions overcome.

The first problem data protection solutions overcome is the time needed to rebuild a cluster in the event of a node or cluster loss. A good data protection solution backs up applications, data and, essentially, the cluster state at a point in time. This allows rapid recovery by simply restoring to the last known good state from backup. Restoring a node from backup is typically faster than rebuilding it from its components.

The second problem solved is application rollback. With rapid deployments, it is common to need to roll back to a previous version of an application when a bug or corruption is discovered. Most data protection applications provide this capability and help return a cluster to a previous good state quickly.
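Both recovery and rollback come down to picking the right restore point. As a minimal sketch, assuming a hypothetical backup catalog in which each restore point records when it was taken and whether it passed verification, "last known good" can be expressed as the newest verified backup taken before the incident or bad release. The `RestorePoint` type and `last_known_good` function are illustrative names, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class RestorePoint:
    """Hypothetical catalog entry for one point-in-time backup."""
    name: str
    taken_at: datetime
    verified: bool  # True if the backup passed an integrity or test-restore check


def last_known_good(points: Sequence[RestorePoint],
                    incident_at: datetime) -> Optional[RestorePoint]:
    """Return the newest verified restore point taken strictly before the incident.

    Backups taken at or after the incident (or bad deployment) may already
    contain the corruption, so they are excluded. Returns None if nothing qualifies.
    """
    candidates = [p for p in points if p.verified and p.taken_at < incident_at]
    return max(candidates, key=lambda p: p.taken_at, default=None)


if __name__ == "__main__":
    utc = timezone.utc
    catalog = [
        RestorePoint("nightly-0601", datetime(2023, 6, 1, 2, 0, tzinfo=utc), True),
        RestorePoint("nightly-0602", datetime(2023, 6, 2, 2, 0, tzinfo=utc), False),
        RestorePoint("nightly-0603", datetime(2023, 6, 3, 2, 0, tzinfo=utc), True),
    ]
    # Hypothetical scenario: a buggy release was rolled out at 09:00 on June 3.
    bad_release = datetime(2023, 6, 3, 9, 0, tzinfo=utc)
    choice = last_known_good(catalog, bad_release)
    print(choice.name if choice else "no usable restore point")  # nightly-0603
```

Skipping unverified backups reflects why restore testing matters: a backup that has never been validated is a poor candidate for "last known good."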

Cyber recovery/ransomware

Ransomware attacks, which have had a huge impact on traditional applications for years, are now targeting Kubernetes environments and putting them at risk as well. As the number and scope of Kubernetes applications increase, so do malicious attacks on those applications.

A Veritas news release in March 2022 indicated that 48% of organizations that have deployed Kubernetes have already experienced a ransomware attack on their containerized environments, while a staggering 89% of respondents said that ransomware attacks on Kubernetes environments are "an issue for their organizations today."

With the increased threat, it's vitally important to harden your data protection methods and ensure your backups are immutable, so ransomware cannot encrypt, alter or delete them. Immutable backups preserve your ability to restore the cluster in the event of a compromise.
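Immutability is commonly enforced at the storage layer, for example with WORM (write once, read many) retention such as S3 Object Lock. The rule itself is simple, and a minimal in-memory sketch makes it concrete: once written, a backup can never be overwritten, and it cannot be deleted until its retention period expires. `ImmutableBackupStore` and `RetentionLockError` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class RetentionLockError(Exception):
    """Raised when an operation would alter or remove a locked backup."""


class ImmutableBackupStore:
    """Hypothetical in-memory model of a WORM backup target."""

    def __init__(self) -> None:
        # backup name -> (payload, retain_until)
        self._objects: Dict[str, Tuple[bytes, datetime]] = {}

    def put(self, name: str, payload: bytes, retention: timedelta,
            now: Optional[datetime] = None) -> None:
        if now is None:
            now = datetime.now(timezone.utc)
        if name in self._objects:
            # Write once: an existing backup can never be replaced, for example
            # by ransomware writing an encrypted copy over it.
            raise RetentionLockError(f"{name} already exists and is immutable")
        self._objects[name] = (payload, now + retention)

    def get(self, name: str) -> bytes:
        return self._objects[name][0]

    def delete(self, name: str, now: Optional[datetime] = None) -> None:
        if now is None:
            now = datetime.now(timezone.utc)
        _, retain_until = self._objects[name]
        if now < retain_until:
            raise RetentionLockError(f"{name} is locked until {retain_until.isoformat()}")
        del self._objects[name]


if __name__ == "__main__":
    store = ImmutableBackupStore()
    t0 = datetime(2023, 6, 1, tzinfo=timezone.utc)
    store.put("cluster-backup-0601", b"...", timedelta(days=30), now=t0)
    try:
        store.delete("cluster-backup-0601", now=t0 + timedelta(days=1))
    except RetentionLockError as err:
        print("refused:", err)
    # After the 30-day retention window, deletion is allowed.
    store.delete("cluster-backup-0601", now=t0 + timedelta(days=31))
```

In practice the storage platform, not the backup application, should enforce this check, so that an attacker who gains administrative access to the backup software still cannot shorten retention or purge backups.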

Migrations

One of the promises of Kubernetes is the portability of containers between clouds, both on-premises and public. That portability has largely been realized thanks to the Container Storage Interface (CSI), which decouples the storage implementation from Kubernetes and allows block, file and object storage to be consumed by a pod in a portable way.

Even with CSI, migrations from one cloud to another can be time consuming and tedious. While it is relatively easy to migrate an application from one cloud to another, the data embedded in Persistent Volumes needs to be migrated carefully to ensure it is accessible in the new cloud.
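One reason PV data needs care is that a Persistent Volume Claim (PVC) usually names a storage class that exists only in the source environment. The minimal Python sketch below, assuming the PVC manifests have already been exported as dictionaries and that the target cluster uses dynamic provisioning, shows the kind of rewrite a migration has to perform before the claims are recreated in the new cloud. The mapping and the `remap_storage_classes` function are hypothetical; some backup tools offer a comparable storage-class mapping option, but this is not any vendor's implementation. It only rewrites the claim definitions: the data itself still has to be copied or restored into the new volumes.

```python
import copy
from typing import Any, Dict, List, Mapping


def remap_storage_classes(pvcs: List[Dict[str, Any]],
                          class_map: Mapping[str, str]) -> List[Dict[str, Any]]:
    """Return copies of PVC manifests with storageClassName rewritten for the target cloud.

    A claim whose storage class has no mapping raises KeyError, so nothing is
    silently recreated against a class that does not exist in the target cluster.
    Claims without a storageClassName are left on the target's default class.
    """
    remapped = []
    for pvc in pvcs:
        new_pvc = copy.deepcopy(pvc)  # never mutate the exported originals
        spec = new_pvc.setdefault("spec", {})
        source_class = spec.get("storageClassName")
        if source_class is not None:
            if source_class not in class_map:
                name = new_pvc.get("metadata", {}).get("name", "<unnamed>")
                raise KeyError(f"no target storage class for {source_class!r} (PVC {name})")
            spec["storageClassName"] = class_map[source_class]
        # volumeName binds the claim to a PV in the *source* cluster; dropping it
        # lets the target cluster dynamically provision a new volume.
        spec.pop("volumeName", None)
        remapped.append(new_pvc)
    return remapped


if __name__ == "__main__":
    exported = [{
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "kafka-data-0", "namespace": "messaging"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "100Gi"}},
            "storageClassName": "onprem-block",  # hypothetical source class
            "volumeName": "pvc-1234",
        },
    }]
    # Hypothetical mapping from an on-premises class to a class in the target cloud.
    print(remap_storage_classes(exported, {"onprem-block": "managed-csi"})[0]["spec"])
```

Failing loudly on an unmapped class is a deliberate choice: a claim that silently falls back to the wrong class can leave a workload pending or on storage with different performance.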

Reviewing your data protection vendor's migration strategy is crucial if you expect to deploy across multiple clouds. Some vendors have extensive migration capabilities, while others don't support migration at all.

Data protection marketplace for Kubernetes

The data protection marketplace for Kubernetes is still rather immature, and few solutions cover all use cases. Joep Piscaer states it well in the GigaOm Radar for Kubernetes Data Protection v3.0: "The best solution for your organization isn't necessarily the one that ticks the most boxes in our research for this Radar, or even all of them; it's the one that ticks the right boxes at the right price point."

Backing up Kubernetes

As stated above, the best solution is one that ticks the right boxes. The requirements for protecting a Kubernetes cluster should be dictated by how and where you will need to restore it. For example, are you only planning to restore to the same cluster and hardware configuration? Or will you be using the backup tool to migrate between dissimilar distributions, such as from Microsoft Azure Kubernetes Service (AKS) to Amazon Elastic Kubernetes Service (EKS)?
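The "right boxes at the right price point" advice can be framed as a filter followed by a ranking: discard any product that lacks a must-have restore capability, then compare cost among the rest. The sketch below uses hypothetical product names, capability labels and prices; it carries no data about any real vendor.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical product profile; capabilities and costs are placeholders."""
    name: str
    capabilities: FrozenSet[str]
    annual_cost: float


def best_fit(candidates: Iterable[Candidate],
             required: FrozenSet[str]) -> Optional[Candidate]:
    """Return the cheapest candidate covering every required capability, or None."""
    # "Ticks the right boxes": every must-have must be present; extra features earn nothing.
    qualified = [c for c in candidates if required <= c.capabilities]
    # "At the right price point": among qualifying products, prefer the lowest cost.
    return min(qualified, key=lambda c: c.annual_cost, default=None)


if __name__ == "__main__":
    products = [
        Candidate("Product A",
                  frozenset({"same-cluster-restore", "cross-distribution-migration"}),
                  50_000),
        Candidate("Product B", frozenset({"same-cluster-restore"}), 20_000),
    ]
    # Restoring only to the same cluster: the simpler, cheaper product wins.
    print(best_fit(products, frozenset({"same-cluster-restore"})).name)  # Product B
    # Migrating from AKS to EKS as well: only Product A qualifies.
    print(best_fit(products, frozenset({"same-cluster-restore",
                                        "cross-distribution-migration"})).name)  # Product A
```

The point of the exercise is that the restore requirements, not the length of a feature list, decide which products are even eligible.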

What to protect

Several parts of any Kubernetes cluster are essential to protect. They include:

Kubernetes etcd database: Kubernetes uses etcd to store its configuration data, including cluster state, configurations and secrets. Backing up etcd captures the state of the cluster's Kubernetes objects and is critical for cluster recovery, although it does not include the data stored in Persistent Volumes.
Persistent Volumes (PVs) and Persistent Volume Claims (PVCs): PVs and PVCs provide persistent storage in Kubernetes. Demand for persistent data is growing, and backing up the data in PVs, along with the PVCs that workloads use to claim it, is essential to protect stateful data in Kubernetes.
Namespace(s): Namespaces are often used to group the resources of an application in Kubernetes. Backing up namespace configurations, including resource quotas, role-based access control (RBAC) settings and other namespace-specific configurations, is required to recover namespace-specific settings.
Cluster-Scoped Resources: These are resources that don't belong to any namespace, such as Custom Resource Definitions (CRDs) and StorageClasses. If your cluster uses any custom resources, it is important to back up their CRDs so that those resources can be correctly restored.
Ingress Configurations: Ingress configurations define how Kubernetes services are accessed from outside the cluster. Backing them up ensures that external networking resumes when a restore is complete.
Cluster-level access control configurations: Role-based access control (RBAC), through ClusterRoles and ClusterRoleBindings, defines cluster-wide permissions in a Kubernetes cluster. Backing up these RBAC configurations is important to restore proper access controls in the cluster.
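Two of the items above are commonly captured with standard tooling: etcd with `etcdctl snapshot save`, and PV data with a CSI `VolumeSnapshot` (API group `snapshot.storage.k8s.io/v1`), provided the CSI driver supports snapshots. The Python sketch below only builds the command line and the manifest; it does not execute anything. The endpoint, the kubeadm-style certificate paths, the snapshot class and the PVC names are placeholders. On managed services such as EKS or AKS, etcd is typically not directly accessible, so the etcd step applies to self-managed clusters.

```python
import json
from typing import Any, Dict, List


def etcd_snapshot_command(snapshot_path: str, endpoint: str,
                          cacert: str, cert: str, key: str) -> List[str]:
    """Build an `etcdctl snapshot save` invocation for the etcd v3 API.

    Intended to run on a control-plane node, e.g. subprocess.run(cmd, check=True).
    etcdctl 3.4+ uses the v3 API by default; older versions need ETCDCTL_API=3.
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot", "save", snapshot_path,
    ]


def volume_snapshot_manifest(name: str, namespace: str, pvc_name: str,
                             snapshot_class: str) -> Dict[str, Any]:
    """Build a CSI VolumeSnapshot object that snapshots one PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    # Placeholder values; kubeadm places etcd certificates under /etc/kubernetes/pki/etcd.
    cmd = etcd_snapshot_command(
        "/var/backups/etcd-snapshot.db",
        "https://127.0.0.1:2379",
        "/etc/kubernetes/pki/etcd/ca.crt",
        "/etc/kubernetes/pki/etcd/server.crt",
        "/etc/kubernetes/pki/etcd/server.key",
    )
    print(" ".join(cmd))
    # kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(volume_snapshot_manifest(
        "kafka-data-0-snap", "messaging", "kafka-data-0", "csi-snapclass"), indent=2))
```

Namespaced objects, cluster-scoped resources, ingress and RBAC configurations are Kubernetes API objects, so a restore from the etcd snapshot brings them back together with the rest of the cluster state, while the VolumeSnapshot covers the PV data that etcd does not hold.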

Data protection solutions

While there are dozens of data protection solutions available to back up and restore Kubernetes today, this article will focus on offerings from Cohesity, Commvault, Dell, Rubrik, Veeam and Veritas. The intent is not to pick a winner but to offer some considerations for evaluating these six vendors and their solutions.

When you require a complete backup solution that provides granularity as well as the ability to migrate between distributions, there are three strong choices. Kasten from Veeam is likely the most complete solution for Kubernetes, but because it is Kubernetes-only, it cannot protect any other parts of your environment.

The other two strong choices are Veritas and Commvault. Both offer feature-rich solutions that work on-premises and in the cloud, and both provide Backup-as-a-Service options. In addition to Kubernetes, their portfolios will protect nearly any segment of your infrastructure. The downside is that this rich feature set and broad coverage come at the cost of added deployment complexity.

The last three products integrate the open-source tool Velero (Velero.io). Dell, Cohesity and Rubrik offer simplicity, rapid deployment and ease of use for more basic Kubernetes protection. That simplicity and ease of use come at the expense of features. For example, cluster migration is supported, but not between cloud distributions.

All the solutions mentioned will capably protect your Kubernetes deployments. Since Kubernetes and container deployments continue to evolve, it is more important than ever to evaluate your end-state requirements to narrow your choices.