DevOps: A Survival Guide for Infrastructure Teams
Learn how infrastructure teams can start being part of DevOps efforts in their organizations.
A TSA's take beyond tools and buzzwords
DevOps means something different to different people. It is one of those words that gets pulled in many directions by everyone trying to sell something - sometimes to such an extreme that it becomes meaningless because it has been stretched to encompass everything.
To avoid the buzzword trap and start addressing how infrastructure teams can be part of DevOps efforts in their organizations, let me explain my take beyond tools and buzzwords.
We can think of DevOps as the evolution of the revolution that started when our own processes became the main obstacle affecting our ability to deliver results and respond quickly to the changes in the market.
What is DevOps?
In its purest form, what is DevOps? DevOps is an outcome-driven culture: a culture focused on the agility to deliver fast results while retaining quality. Where is technology in there? Where are the developers? Where is the operations team? Where is the infrastructure team? There are no tools or individual groups in the definition because DevOps is a culture. A culture is the set of characteristics and knowledge that a group of people acquires over generations through individual and collective effort.
So, what about all the other definitions in circulation, like “developers and operations groups working together,” “developers with operations accountability,” “infrastructure as code,” “using automation for repetitive tasks,” or “abstraction of the infrastructure”? You’ve probably even heard “using Kanban or Scrum methodologies” or “using microservices architectures.” Are those definitions wrong? No, each one is correct for a different group.
Let’s go back to the definition of a culture. Our environments, experiences and needs are different. Consider a New Yorker and a Texan. Both are part of the same American culture, but when you commute in New York you walk, take a taxi or ride the subway and say, “I’ll be there in 20 minutes.” Meanwhile, in Texas, you drive your own truck and measure distance in hours. Both groups, part of the same culture, have adapted a common process to their realities.
The same happens with DevOps in an organization. The specific definition each group adopts, and what that group cares about, is based on its own realities and needs.
In a traditional organization adopting DevOps and agile methodologies, we have two main groups, the development and infrastructure teams, that normally don’t work together but have to collaborate to achieve an outcome. The adoption of DevOps methodologies will drive different efforts in each group, but in the end they’ll have to work together. Adopting agile methodologies, with key elements like open and direct communication, collaboration among and within teams, and automation, lays the foundations of a DevOps culture.
Eventually, as part of the evolution of the DevOps culture inside an organization, a new group emerges that becomes the translation bridge or liaison between the developers and the infrastructure teams (illustrated above).
This new group consists of individuals from the development and infrastructure teams who understand enough of the other group’s work that they can map the requirements into common service platforms.
To illustrate this point, consider the scenario of a development group asking for Docker or microservices support in the organization. What does that mean for the storage team? How can storage admins support that? What does that mean for the networking team? How can network admins support that?
Now, consider the reverse scenario where the IT team has to enforce compliance. How is data-at-rest encryption enforced in Docker? How are backups managed in microservices architectures? How are the organization’s investments in infrastructure relevant? These are the kinds of questions that don’t have a straightforward, out-of-the-box answer. Answering them requires a fair understanding of both sides in order to identify a balanced solution.
Over time, as expertise is built in the organization, members of both groups converge around a common language. That allows for easier communication, a faster response from the infrastructure team in enabling new services, and faster adoption of those services by the development teams.
This does not happen overnight, so where do we start?
Infrastructure teams: Enablers of service platforms
With organizations adopting DevOps visions, the role of the infrastructure team is to become a service provider for the internal teams. The infrastructure team must deliver the service while staying out of the way of the consumers of that service. In the end, the infrastructure team must be invisible to the development teams. There is no magic wand to make this happen. This is not a single-step process but a journey.
As we can see in the diagram, everything starts with the adoption of agile IT. Once combined with collaboration, it moves into more DevOps-oriented phases.
DevOps adoption stages
Let’s take a look at several scenarios to see how the infrastructure teams can work towards achieving this.
Scenario 1: My organization does not have a DevOps strategy, but we want to prepare for it.
Under this scenario, we start with the adoption of agile IT. The agile methodologies and principles of the Agile Manifesto used by developers do not translate directly to IT, but they do give a good idea of what developers expect from infrastructure teams. For example, rephrasing the first two principles of the Agile Manifesto:
- Our highest priority is to satisfy the customer through early and continuous delivery of valuable services.
- Welcome changing requirements - agile processes harness change for the customer's competitive advantage.
The developers are our customers, and that is what they expect from the infrastructure team. How do we get there?
- Start working on the automation of repetitive tasks. When defining automation tasks, follow best practices for the configuration and hardening of the specific task. For example, if you are in storage, automate the provisioning of storage and the definition of the access rules. If you are in networking, automate the provisioning of VLANs, ports or BGP sessions following best practices.
- Start working towards achieving infrastructure-as-code (IaC). Orchestrate the automation tasks into workflows that deliver consumable resources (compute, storage, network) with consistent and predictable results. Note that this is not only about virtual environments; it is both physical and virtual resources.
- Adopt Software Defined Everything (SDx). The software defined data center provides the organization with the ability to be agile and to adapt to the ever-changing requirements.
- Enable abstraction of the infrastructure. Enable APIs, especially RESTful APIs. Think of APIs as the hooks or avenues for providing on-demand consumable resources. Platforms and tools higher in the stack consume APIs. If you have traditional enterprise equipment, this is most probably already supported. If not, contact your hardware or software provider, as most OEMs now provide APIs for integration and extensibility. WARNING: Don’t forget to secure the access to these APIs.
- Enable the ability to consume infrastructure resources over self-service portals or service-catalogs. This goes back to “getting out of the way.”
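The first two steps above (automating individual tasks, then orchestrating them into a workflow with predictable results) can be sketched roughly as follows. This is only an illustration: the in-memory `inventory` dictionary stands in for real storage and network APIs, and every function name, volume size and VLAN number here is a made-up assumption, not a reference to any particular product.

```python
# Minimal sketch: idempotent automation tasks composed into one workflow.
# The in-memory "inventory" is a hypothetical stand-in for real device APIs.

def provision_volume(inventory, name, size_gb, allowed_hosts):
    """Create a volume with its access rules; return True if anything changed."""
    volumes = inventory.setdefault("volumes", {})
    desired = {"size_gb": size_gb, "allowed_hosts": sorted(allowed_hosts)}
    changed = volumes.get(name) != desired
    volumes[name] = desired
    return changed


def provision_vlan(inventory, vlan_id, ports):
    """Assign ports to a VLAN; return True if anything changed."""
    vlans = inventory.setdefault("vlans", {})
    desired = sorted(ports)
    changed = vlans.get(vlan_id) != desired
    vlans[vlan_id] = desired
    return changed


def deliver_app_environment(inventory, app, hosts):
    """Workflow: orchestrate the tasks into one consumable resource."""
    return {
        "volume_changed": provision_volume(inventory, f"{app}-data", 100, hosts),
        "vlan_changed": provision_vlan(inventory, 120, [f"{h}:eth0" for h in hosts]),
    }


inventory = {}
first = deliver_app_environment(inventory, "billing", ["node2", "node1"])
second = deliver_app_environment(inventory, "billing", ["node1", "node2"])
print(first)
print(second)
```

The first run reports changes and the second reports none, which is the "consistent and predictable" property the workflow is after: asking for the same environment twice leaves the infrastructure in the same state.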
Scenario 2: Developers went rogue using modern techniques to simplify their work without telling the infrastructure teams.
This might be one of the more complex scenarios. The infrastructure team finds itself playing catch-up with the widespread use of technologies that “don’t need the infrastructure team.”
This scenario starts when development teams find themselves restrained from going to a cloud provider while lacking a viable solution in-house. Their perception is that the infrastructure team is too slow. Many times, that slowness is the result of a lack of automation.
Whatever the reason, the development teams need something they can control. They can’t wait for the infrastructure teams to install the correct version of the libraries they need, or to provision a VM for them. So, how do they work around this? They ask for a couple of large VMs (multiple vCPUs with a lot of storage and memory) and then they stop asking for additional resources. After some time, they come back and ask for another large VM and then disappear again. If you are seeing this, you are probably in this scenario.
When this happens, the infrastructure teams start wondering how the development teams are creating all these new apps and services. The development teams tell us everything is running in-house in our environment, but we don’t see requests for the provisioning of VMs or resources. That’s when we learn they are using those VMs to run containers or microservices architecture platforms.
What are the risks with this approach?
To understand the risks, we need to understand some basic concepts around containers and container orchestration platforms. There are many container options, but I’m going to limit the description to Docker containers.
Docker containers architecture
A Docker container is an object composed of multiple layers. All but the topmost layer are read-only, immutable layers. The top layer is where the developer’s specific code lives.
Note: a user view can be found here.
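As a rough mental model of that layering, the sketch below stacks a writable mapping on top of read-only ones using Python’s `collections.ChainMap`. It is an analogy only, not how Docker stores images on disk, and the file paths and contents are invented for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical read-only layers, lowest first (file path -> content).
base_os = MappingProxyType({"/lib/libssl.so": "libssl 1.0", "/bin/sh": "shell"})
runtime = MappingProxyType({"/usr/bin/python": "python 3"})


def start_container(*read_only_layers):
    """Stack a fresh writable layer on top of the given read-only layers."""
    writable = {}
    # ChainMap searches its maps left to right, so upper layers shadow lower
    # ones, and every write lands in the first (writable) mapping.
    return ChainMap(writable, *reversed(read_only_layers))


container = start_container(base_os, runtime)
container["/app/main.py"] = "developer code"   # new file, top layer only
container["/bin/sh"] = "patched shell"         # shadows the lower layer

print(container["/bin/sh"])   # what the container sees
print(base_os["/bin/sh"])     # the read-only layer is untouched
```

The point to take away is that the lower layers never change after they are built. Whatever was in them on the day the image was created, including old libraries, is still there today.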
Going back to the risks: think for a minute. If we do vulnerability scanning at the VM level and the developers are running dozens of containers, our scan may not uncover vulnerabilities that exist at the container level. Let’s say developers deployed a container some time ago and haven’t modified the application for a while, so the container has not been updated. Now, if one of those layers happens to have a vulnerability (e.g., in libssl), a scan of the VM will not necessarily uncover it. And even if it does, what’s the process to patch it? Are developers responsible for proactive patching of the containers after delivering an app? What about when a new vulnerability is discovered? Should the security team keep track and patch the physical servers and VMs? Who is responsible for tracking and patching those layers and rebuilding the containers?
Those are the risks when it is just about using containers. Sometimes you’ll find development teams have done their own deployment of a microservices architecture framework like Kubernetes on top of those large VMs they requested a year ago. They create microservices-oriented applications, and now those run in those VMs. A great feature of a microservices framework is the ability to protect a microservice by spinning up replicas of its containers automatically or on demand, and to protect those replicas from a VM or host failure.
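To make the scanning gap concrete, here is a small sketch that compares what a VM-level package check sees with what is inside the container images running on that VM. All of the data is invented for illustration (image names, layer names and version strings are assumptions), and a real environment would get this information from an image scanner rather than a hand-written dictionary.

```python
# Hypothetical inventory: packages on the VM itself vs. packages in image layers.
vm_packages = {"openssl": "3.0.13", "bash": "5.2"}

images = {
    "billing-api:1.4": [
        {"layer": "base-os", "packages": {"libssl": "1.0.1f", "bash": "4.3"}},
        {"layer": "runtime", "packages": {"python": "3.8"}},
        {"layer": "app", "packages": {}},
    ],
    "reports:2.0": [
        {"layer": "base-os", "packages": {"libssl": "3.0.13"}},
        {"layer": "app", "packages": {}},
    ],
}


def affected_layers(images, package, vulnerable_versions):
    """Return {image: [layer names]} for layers shipping a vulnerable version."""
    hits = {}
    for image, layers in images.items():
        for layer in layers:
            if layer["packages"].get(package) in vulnerable_versions:
                hits.setdefault(image, []).append(layer["layer"])
    return hits


vulnerable = {"1.0.1f"}
vm_is_flagged = vm_packages.get("libssl") in vulnerable
to_rebuild = affected_layers(images, "libssl", vulnerable)
print(vm_is_flagged, to_rebuild)
```

In this made-up inventory the VM check comes back clean while one image still carries the old library in its base layer. The output also names the layer, which is the starting point for the ownership questions above: someone has to rebuild that layer and every container built on it.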
What are the risks with these frameworks? Besides having the same sort of risks and challenges as containers, in a microservices framework the “application” is broken into multiple microservices distributed among those VMs. If they are running in VMs, you’re probably doing backups of those. Guess what? These frameworks, especially when using them from the upstream projects, are not designed to be backed up. Try to restore them, and it will probably fail. These highly distributed solutions have the concept of ephemeral and persistent storage. They assume the user protects the important data in persistent storage and everything else is assumed ephemeral. Now, if the developers used simple VMs without integration with the infrastructure, there is no persistent storage to read and recover the data from.
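The ephemeral versus persistent distinction can be sketched with a toy model. The class and function names below are hypothetical and the "volume" is just a dictionary; the only point is to show which data survives when the framework replaces a failed container.

```python
class Workload:
    """Toy model of a containerized service with two kinds of storage."""

    def __init__(self, persistent_volume):
        self.ephemeral = {}                  # lives and dies with the container
        self.persistent = persistent_volume  # provided from outside, outlives it

    def handle_order(self, order_id, payload):
        self.ephemeral[f"cache/{order_id}"] = payload
        self.persistent[f"orders/{order_id}"] = payload


def reschedule(workload):
    """Simulate a host failure: the replacement gets the same volume only."""
    return Workload(workload.persistent)


volume = {}  # stands in for storage integrated with the infrastructure
service = Workload(volume)
service.handle_order(42, "3 widgets")

replacement = reschedule(service)
print(replacement.ephemeral)    # empty: the cache is gone
print(replacement.persistent)   # the order survived
```

If the developers never attached anything like `volume`, every write behaves like the `ephemeral` dictionary, and neither the framework nor a VM-level backup gives a reliable place to recover it from.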
How can the infrastructure teams take control and support this environment?
The development team is our customer. We need to uncover the experience they expect from us. That is not an easy task. In many cases, the expected experience has many similarities to a cloud experience. Often, it is summarized as “give me the resources but stay out of the way.” So, let’s go with that one. How can we provide the resources but stay out of the way?
- Create service catalogs and expose all infrastructure resources as a self-service consumable item.
- Identify organizational policies that impact the tools and frameworks used by developers.
- Identify the enterprise products supporting those frameworks:
  - Map the organization’s policies to features in these products:
    - Does the framework support data-at-rest encryption? Is it needed by the organization?
    - Does the framework support a way to track vulnerabilities and remediate container images?
    - Does the framework integrate with my existing physical and virtual infrastructure?
    - Does the framework support multi-data center and multicloud deployments?
    - Which components do we protect or back up in these frameworks?
- Identify and define integration points between the infrastructure and the microservices frameworks.
- Enable infrastructure automation and orchestration. Make it easier for the microservices frameworks to consume the infrastructure.
- Set up a new set of requirements for any new infrastructure to provide APIs and capabilities that can easily integrate with the microservices frameworks.
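The policy-mapping step in the list above lends itself to a simple checklist kept as data. The sketch below is one possible shape for it; the policy names, product names and feature sets are all placeholders, and a real evaluation would fill them in from the organization’s own policies and each vendor’s documentation.

```python
# Hypothetical policy requirements: True means the organization needs it.
required = {
    "data_at_rest_encryption": True,
    "image_vulnerability_tracking": True,
    "multi_datacenter": False,
}

# Hypothetical candidate products and the features each one offers.
products = {
    "product-a": {"data_at_rest_encryption", "multi_datacenter"},
    "product-b": {"data_at_rest_encryption", "image_vulnerability_tracking"},
}


def policy_gaps(required, features):
    """List the required policies that a product's feature set does not cover."""
    return sorted(
        policy for policy, needed in required.items()
        if needed and policy not in features
    )


report = {name: policy_gaps(required, feats) for name, feats in products.items()}
print(report)
```

Keeping the mapping as data rather than in a slide deck means it can be re-run whenever a policy changes or a new product is considered.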
Scenario 3: The organization has a DevOps strategy but they went to the cloud.
This scenario is a combination of the previous two. From the perspective of the infrastructure teams, there is no DevOps, so they can start preparing the infrastructure for it. But at the same time, cloud-based solutions are already widespread throughout the organization, so the infrastructure teams are late to the game.
Where do we start here?
- Track the tools and frameworks. If they are using the upstream versions, or do not have tools for governance and lifecycle management, this scenario is a version of scenario #2. The enterprise products supporting microservices frameworks can run in the cloud and on-premises. These frameworks tend to have a strong hybrid cloud capability. Consider preparing your infrastructure to work in a hybrid cloud configuration with the microservices frameworks. A well-designed implementation of a mature microservices framework allows for the easy transition of workloads between on-premises and public cloud.
- Create a cloud experience for your organization. The infrastructure teams need to become a service broker for the organization. It is not about buying multiple clusters of an OEM solution to achieve redundancy and calling it cloud-ready. Delivering the cloud experience means designing for service availability even during the failure of components. Think about a storage service in a cloud environment. There are disk failures and even node failures happening behind the scenes, but the service is still there. We must provide that experience in our infrastructure. This is part of what we can achieve by using software-defined storage (SDS), software-defined networking (SDN), or even software-defined data center (SDDC) solutions.
- Adopt automation and orchestration for everything in the data center. We must provide the services while staying out of the way.
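The idea of a service that "is still there" while components fail behind the scenes can be illustrated with a toy replicated store. This is a deliberately naive sketch (full replication to every healthy node, no recovery or consistency handling), and the class and node names are made up; real software-defined storage is far more involved.

```python
class ReplicatedStore:
    """Toy storage service that keeps every object on all healthy nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.failed = set()

    def put(self, key, value):
        healthy = [n for n in self.nodes if n not in self.failed]
        if not healthy:
            raise RuntimeError("no healthy nodes")
        for name in healthy:
            self.nodes[name][key] = value

    def get(self, key):
        # Serve the read from the first healthy node that holds the key.
        for name, data in self.nodes.items():
            if name not in self.failed and key in data:
                return data[key]
        raise KeyError(key)

    def fail(self, name):
        """Simulate a disk or node failure."""
        self.failed.add(name)


store = ReplicatedStore(["node-a", "node-b", "node-c"])
store.put("invoice-7", "paid")
store.fail("node-a")
store.fail("node-b")
still_there = store.get("invoice-7")
print(still_there)
```

The consumer calls `get` the same way before and after the failures. That unchanged experience, not the number of clusters purchased, is what "cloud-ready" is meant to describe here.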
A bumpy ride
If you find yourself overwhelmed by a torrent of tools, frameworks and platforms, remember, there is no such thing as “buying DevOps.” There are many tools that support the agile methodologies and techniques used in DevOps cultures, but this is not about Docker, Jenkins, Kubernetes, OpenStack or any other buzzword. DevOps is an outcome-driven culture. Enjoy the journey. Do not focus on “what to buy,” but rather on enabling and delivering the experience to your internal customers: the developers and application teams.