DevOps: A Survival Guide for Infrastructure Teams
This article was originally published in 2017. While the core components have not changed significantly, we have refreshed the content to reflect updated approaches.
DevOps means something different to different people. It is one of those words that gets pulled in many directions by everyone trying to sell something — sometimes to such an extreme that it becomes meaningless because the word ends up encompassing everything.
Let me explain my take beyond tools and buzzwords, then address how infrastructure teams can be part of DevOps efforts in their organizations.
We can think of DevOps as the evolution of the revolution that began when our processes became the main obstacle to delivering results and responding quickly to changes in the market.
In this Tec17 podcast, hear WWT's Jason Guibert and Paul Richards discuss what DevOps is, its many benefits, how to avoid some common pitfalls and how WWT works with customers to implement DevOps processes into their infrastructure.
In its purest form, DevOps is an outcome-driven culture focused on the agility to provide fast results while retaining quality.
But where is technology in this definition, you ask? Where are the developers? The operations team? The infrastructure team?
WWT doesn't include specific tools or individual groups in our definition because DevOps is a culture — that is, it's made up of the characteristics and knowledge acquired by a group of people over generations through individual and group endeavor.
So what's with all the other definitions of DevOps floating around? Maybe you've heard that DevOps means developers and operations groups working together or developers with operations accountability. Maybe you've heard that it's about infrastructure as code, using automation for repetitive tasks or abstraction of the infrastructure. You've probably even heard that DevOps means using Kanban or SCRUM methodologies or using microservices architectures.
Are those definitions of DevOps wrong? No, each one is correct for a different group.
Think about culture from a societal perspective, where environments, experiences and needs can differ by region. Consider New Yorkers and Texans. When a New Yorker commutes to work, they either walk, take a taxi or the subway, and say, "I'll be there in 30 minutes." Meanwhile, someone from Texas will likely drive their own truck and measure their commute in hours. Though part of the same American culture, both groups have adapted the normal commuting process to meet their realities.
The same happens with DevOps inside an organization. The specific definition of DevOps, and the corresponding priorities, may differ based on the realities and needs of each particular group.
When a traditional organization adopts DevOps agile methodologies, the development and infrastructure teams, who typically don't work together, must collaborate to achieve an outcome. While adopting DevOps methodologies will drive different efforts for each group, they're working toward a shared goal.
The key methodologies that drive the agile foundation of a DevOps culture include open and direct communication, collaboration among and within teams and automation.
As part of an organization's evolution toward a DevOps culture, a new group emerges as the translation bridge or liaison between developer and infrastructure teams. This group consists of individuals from each team who understand enough of the other side to map requirements into standard services platforms.
To illustrate this point, consider the questions in the scenario where the development group asks for Kubernetes or microservices support. What does this ask mean for the storage team? How can storage admins support it? What does that mean for the networking team? How can network admins support it?
Now consider the questions that arise in a reverse scenario where the IT team has to enforce compliance across the organization. How is data-at-rest encryption implemented in Kubernetes? How are backups managed in microservices architectures? How are the organization's investments in infrastructure relevant?
These questions don't have a straightforward answer; they require a fair understanding of both viewpoints to identify a balanced solution.
Over time, as DevOps expertise builds in the organization, members of both groups will converge around a common language. That allows for easier communication, a faster response from the infrastructure team in enabling new services and faster adoption of those services by the development teams.
But this doesn't happen overnight. So where do we start?
As organizations adopt DevOps visions, the role of the infrastructure team is to become a service provider for the internal teams. The infrastructure team must deliver the service while staying out of the consumers' way. In other words, the infrastructure team should be invisible to development teams.
There is no magic wand to make this happen. There is no single step or process — it's a journey.
As you can see in this diagram, the DevOps journey starts with adopting agile IT processes and methodologies. Once agile IT is combined with collaboration, you can progress into more DevOps-oriented phases.
Let's take a look at several scenarios to see how infrastructure teams can work toward achieving DevOps adoption.
Scenario 1: The organization is starting its DevOps journey from scratch

Under this scenario, you should start with the adoption of agile IT. While the agile methodologies and principles that form the Agile Manifesto used by developers do not have a direct translation into IT, they give infrastructure teams a good idea of what to expect.
For example, if we rephrase the first two principles of the Agile Manifesto, we get the following:
- Our highest priority is to satisfy the customer through early and continuous delivery of valuable services.
- Welcome changing requirements — agile processes harness change for the customer's competitive advantage.
That is what is expected from the infrastructure team. How do we get there? It helps to think of the developers as your customers.
- Automation: Start with the automation of repetitive tasks. When defining automation tasks, follow best practices for configuration and hardening of the specific task. For example, if you're in storage, automate the provisioning of storage and the definition of access rules. If you're in networking, automate the provisioning of VLANs, ports or BGP sessions following best practices.
- Infrastructure as Code: Start working toward achieving infrastructure as code (IaC). Orchestrate the automation tasks into workflows that deliver consumable resources (compute, storage, network) with consistent and predictable results. Note: this is not only about virtual environments; it includes physical and virtual resources.
- Software-defined: Adopt software-defined everything (SDx). A software-defined data center allows your organization to be agile and adapt to ever-changing requirements.
- Enable abstraction of the infrastructures. Enable APIs, especially integration with RESTful API interfaces. Think of APIs as the hooks or venues to provide on-demand consumable resources. Platforms and tools higher in the stack consume APIs. If you have traditional enterprise equipment, this is likely already supported. If not, contact your hardware or software provider, as most OEMs now provide APIs for integrations and extensibility. WARNING: Don't forget to secure access to your APIs!
- Self-service: Enable the ability to consume infrastructure resources over self-service portals or service catalogs. This goes back to "getting out of the way."
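The automation and self-service steps above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function and catalog names are invented, not a real product API): a repetitive provisioning task is made idempotent, bakes in hardening defaults, and is then exposed as a pre-approved catalog item.

```python
# Hypothetical sketch: wrapping a repetitive storage-provisioning task as an
# idempotent, self-service catalog item. All names are illustrative; a real
# implementation would call your storage array's or SDN controller's API.

def provision_volume(inventory: dict, name: str, size_gb: int, hardened: bool = True) -> dict:
    """Create (or return) a volume, applying hardening defaults consistently."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    if name in inventory:
        # Idempotent: re-running the task does not create duplicates.
        return inventory[name]
    volume = {
        "name": name,
        "size_gb": size_gb,
        "encryption_at_rest": hardened,  # best-practice default baked in
        "access_rules": ["deny-all"],    # least-privilege starting point
    }
    inventory[name] = volume
    return volume

# A self-service catalog is then just a mapping of consumable items
# to pre-approved automation tasks.
CATALOG = {"small-volume": lambda inv: provision_volume(inv, "vol-small", 100)}
```

The point of the sketch is the shape, not the code: every catalog item is an automation task with best practices already applied, so consumers get consistent, predictable results without opening a ticket.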
Scenario 2: Developers went rogue using modern techniques to simplify their work without telling infrastructure teams
This might be one of the more complex scenarios, where the infrastructure team is playing catchup with the widespread use of technologies that "don't need the infrastructure team."
This scenario can start when a development team lacks a viable solution in-house and finds itself restrained from going to a cloud provider. The developers think the infrastructure team is too slow (though this is often the result of a lack of automation).
Whatever the reason, development teams need something they can control. They can't wait for the infrastructure teams to install the correct version of the libraries they need or to provide a VM. How do they work around this? They ask for a couple of large VMs (multiple vCPUs with a lot of storage and memory) and then stop asking for additional resources. After some time, they return, ask for another large VM and then disappear again. If you're seeing this, you are probably living this scenario.
When this happens, infrastructure teams start wondering how the dev teams create all these new apps and services. The development teams tell them everything runs in-house in their environment, but the infrastructure teams don't see requests to provision VMs or resources. That's when we learn they're using those VMs to run containers or microservices architecture platforms.
To understand the risks of Scenario 2, we must understand some basic concepts around containers and container orchestration platforms. There are many container options, but I will limit the description to Docker containers.
A Docker container is an object composed of multiple layers. All but the topmost layer are read-only, immutable layers. This top layer is where the developer's specific code lives.
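This layered structure can be sketched conceptually. The following is not the real Docker internals, just an illustration of the model described above: all lower layers are shared and read-only, and only the top layer accepts the developer's changes.

```python
# Conceptual sketch of a container's layer stack (not real Docker internals):
# all lower layers are read-only and shared; only the top layer is writable.

class ContainerLayers:
    def __init__(self, base_layers):
        # e.g. an OS layer, a libssl layer, a runtime layer (names illustrative)
        self._layers = tuple(base_layers)  # tuple: immutable, read-only
        self.top_layer = {}                # writable; the developer's code lives here

    @property
    def layers(self):
        return self._layers

    def write(self, path, content):
        # Writes only ever touch the top (container) layer.
        self.top_layer[path] = content

img = ContainerLayers(["debian:12", "libssl-3.0", "app-runtime"])
img.write("/app/main.py", "print('hello')")
```

Because the lower layers are immutable, fixing a vulnerable library in one of them means rebuilding the image, not patching the running container — which is exactly why the patching questions below matter.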
Going back to the risks: if the infrastructure team runs a vulnerability scan at the VM level while dozens of containers are running inside it, the scan may not uncover vulnerabilities at the container level.
Let's say developers deployed a container some time ago and haven't modified the application for a while, so the container has not been updated. Now, if one of those layers happens to have a vulnerability (e.g., libssl), a scan of the VM will not necessarily uncover it. And even if it does, what's the process to patch it?
Are developers responsible for proactive patching of the containers after delivering an app? What about when a new vulnerability is discovered? Should the security team keep track and patch the physical servers and VMs? Who is responsible for tracking and patching those layers and rebuilding the containers?
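To make the gap concrete, here is an illustrative sketch of what image-level scanning does that VM-level scanning misses: it walks each layer's package inventory and compares it against a vulnerability feed. The feed entry and package names are hypothetical; real tooling would use a dedicated image scanner and a live CVE database.

```python
# Illustrative sketch: why VM-level scans miss container layers. We compare
# each image layer's package inventory against a (hypothetical) vulnerability
# feed entry. Real scanners consume live CVE databases.

KNOWN_VULNERABLE = {("libssl", "1.0.2")}  # hypothetical feed entry

def scan_image(layers):
    """Return the (package, version) pairs in any layer that match the feed."""
    findings = []
    for layer in layers:
        for pkg in layer.get("packages", []):
            if (pkg["name"], pkg["version"]) in KNOWN_VULNERABLE:
                findings.append((pkg["name"], pkg["version"]))
    return findings

image = [
    {"id": "base", "packages": [{"name": "libssl", "version": "1.0.2"}]},
    {"id": "app",  "packages": [{"name": "myapp", "version": "2.1"}]},
]
```

A VM-level scan sees only the host's packages; nothing in it enumerates the layers inside each image, so the vulnerable `libssl` in the base layer goes unnoticed until the image itself is scanned and rebuilt.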
These are just some of the risks for containers. Sometimes you'll find that development teams have deployed a microservices architecture framework, like Kubernetes, on top of those large VMs they requested a year ago. Perhaps they've created microservices-oriented applications that now run in those VMs. A great feature of microservices frameworks is the ability to protect each microservice by spinning up replicas of its containers automatically or on demand.
What are the risks with these frameworks? Besides carrying the same risks and challenges as containers, a microservices framework breaks the "application" into multiple microservices distributed among those VMs. If they are running in VMs, you're probably doing backups of those. Guess what? These frameworks, especially when consumed directly from upstream projects, are not designed to be backed up as plain VMs. Any attempt to restore them will probably fail.
These highly distributed solutions use the concept of ephemeral and persistent storage. They assume the user protects the critical data in persistent storage and everything else is considered ephemeral. Now, if the developers used simple VMs without integration with the infrastructure, there would be no persistent storage to read and recover the data from.
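The ephemeral-versus-persistent distinction determines what a backup can actually recover. The sketch below is illustrative (the workload records and field names are invented, not a real Kubernetes API): only workloads that explicitly place data on persistent volumes are protectable; everything else is lost by design.

```python
# Sketch: in microservices platforms, only data explicitly placed on
# persistent storage is recoverable; everything else is ephemeral by design.
# Workload records and field names are illustrative, not a real platform API.

def backup_plan(workloads):
    """Split workloads into protectable (persistent volumes) and at-risk."""
    protectable, at_risk = [], []
    for w in workloads:
        target = protectable if w.get("persistent_volumes") else at_risk
        target.append(w["name"])
    return {"protectable": protectable, "at_risk": at_risk}

workloads = [
    {"name": "orders-db",  "persistent_volumes": ["pv-orders"]},
    {"name": "cart-cache", "persistent_volumes": []},  # ephemeral only
]
```

In Scenario 2, the developers' plain VMs have no integration with the infrastructure's storage, so effectively every workload lands in the "at risk" bucket: there is no persistent storage to recover the data from.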
If the development team is our customer, we must uncover the experience they expect from us. This is not an easy task. In many cases, developers may expect an experience similar to a cloud experience. This is often summarized as "give me the resources but stay out of the way."
Let's run with that expectation. How can infrastructure teams provide the right resources but stay out of the way? Here are a few steps to consider:
- Create service catalogs and expose all infrastructure resources as self-service consumable items.
- Identify organizational policies that impact tools and frameworks used by developers.
- Identify the enterprise products supporting those frameworks.
- Map organizational policies to features in these products. For example:
  - Does the framework support data-at-rest encryption? Does the organization need it?
  - Does the framework support a way to track vulnerabilities and remediate container images?
  - Does the framework integrate with my existing physical and virtual infrastructure?
  - Does the framework support multi-data center and multicloud deployments?
  - Which components do we protect or back up in these frameworks?
- Identify and define integration points between the infrastructure and the microservices frameworks.
- Enable infrastructure automation and orchestration. Make it easy for microservices frameworks to consume.
- Set up a new set of requirements for any new infrastructure to provide APIs and capabilities that can easily integrate with the microservices frameworks.
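The policy-mapping step above can be reduced to a simple checklist evaluation. This is a sketch under assumptions: the policy names and framework features are invented placeholders, and a real assessment would be far richer than a set comparison.

```python
# Sketch of the policy-mapping step: check each organizational policy against
# the features a candidate enterprise framework advertises. Policy and feature
# names are illustrative placeholders.

REQUIRED_POLICIES = {"data_at_rest_encryption", "image_vulnerability_scanning"}

def evaluate_framework(name, features):
    """Report which required policies a candidate framework can satisfy."""
    missing = REQUIRED_POLICIES - set(features)
    return {"framework": name, "compliant": not missing, "missing": sorted(missing)}

report = evaluate_framework(
    "example-k8s-distribution",
    {"data_at_rest_encryption", "multi_datacenter"},
)
```

Running this kind of gap analysis per framework gives the infrastructure team an evidence-based shortlist instead of a feature-sheet debate, and the "missing" list becomes the integration backlog.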
Scenario 3: A combination of the previous two scenarios

This scenario is a combination of the previous two. From the infrastructure team's perspective, there is no DevOps adoption yet, so they can start preparing the infrastructure for it. At the same time, with cloud-based solutions already widespread throughout the organization, they're late to the game. So, where do we start?
- Track the tools and frameworks. If the teams are using upstream versions, or lack tools for governance and lifecycle management, this scenario is a version of Scenario 2 above. The enterprise products supporting microservices frameworks can run in the cloud and on premises, and these frameworks tend to have strong hybrid cloud capabilities. Consider preparing your infrastructure to work with the microservices frameworks in a hybrid cloud configuration. A well-designed implementation of a mature microservices framework allows for the easy transition of workloads between on premises and public cloud.
- Create a cloud experience for your organization. Infrastructure teams need to operate as a service brokerage for the organization. It's not about buying multiple clusters of an OEM solution to achieve redundancy and calling it "cloud ready." Delivering the cloud experience should include designing for service availability, even during the failure of components. Think about storage service in a cloud environment: there are disk and node failures behind the scenes, but the service is still there. We must provide that experience in our infrastructure. This is part of what we can achieve by using software-defined storage (SDS), software-defined networking (SDN), or even full software-defined data center (SDDC) solutions.
- Adopt automation and orchestration for everything in the data center. We must provide the services while staying out of the way.
If you find yourself overwhelmed by a torrent of tools, frameworks and platforms, remember that there is no such thing as buying DevOps. Many tools support agile methodologies and techniques used in DevOps.
DevOps is an outcome-driven culture. Instead of focusing on what to buy, focus on enabling and delivering the experience to your internal customers: the developers and applications teams. It can be a bumpy ride but one that's well worth taking.