A Systems Administrator's Introduction to Automation

As I thought about a name for this article, the idea of teaching an old dog new tricks kept coming to mind, followed by the realization that I'm the old dog. I started on the help desk and eventually became a systems administrator, with skills in data center servers, storage, virtualization and networking. The idea of a private cloud is well known in this space, but engineers who grew up in the traditional, monolithic data center may find it hard to branch out into newer areas such as data center DevOps, since those practices are more often associated with public and hybrid cloud. I recently had an opportunity to dip my toe into the world of enterprise automation, and I want to share my experience. This article is for engineers with a similar background who are looking for guidance on getting started with automation.

Enterprise automation for data center infrastructure

Automation formalizes some of the things I was already doing. I had done a bit of scripting over the years to help with administrative tasks, but it never went past that. I was creating tools for my own use, not integrating them into the data center as a shared, accessible and repeatable offering. Enterprise automation is exactly that: allowing a user or an event to launch reusable code that executes a complex list of tasks. Through automation we can deliver measurable value to the business. Rather than just performing manual tasks as part of "keeping the lights on," formal automation lets us record the efficiencies gained by our work. With the time saved by not performing manual tasks, we can look for other areas to automate and continue to deliver real business value, not just technical value.

Having recently started this journey on the ground floor, here are the key areas that will get you started:

Find an attainable first use case.
Assess your skills and the automation tools available for your use case.
Create and test the API calls for each task.
Pull it all together in Ansible.

The use case

The first step is to identify what you want to automate. Start with something easy and attainable for your first mission. If you don't have a good use case, create one yourself in a lab. VMware vSphere can be deployed with a trial license to provide an environment to automate against. My use case was exactly that: in a VMware vSphere environment, I needed to automate reverting a virtual machine to a snapshot and powering the VM back on.

Assess your skills

Now that you have a use case, assess which tools and interfaces are available for what you are trying to automate. In my case I was automating inside VMware vCenter, so I researched what VMware offers. Within my comfort level were the REST API and VMware PowerCLI for PowerShell. I could have accomplished my tasks with either of these, but I also looked at Ansible integrations, since Ansible would be the tool running my automation.

Ansible uses "collections" to import vendor or community-driven code. Collections speed up development by letting you install prebuilt modules into an Ansible project. Many collections exist for mainstream products, developed and supported either by the vendor or by the community. A quick web search revealed that there is a VMware collection for Ansible. I read through the documentation and found that I could accomplish my tasks with the collection.
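As a rough sketch of what installing a collection can look like, a project can list the collections it needs in a requirements file. The collection name community.vmware is an assumption here: it is the community-maintained VMware collection on Ansible Galaxy and it ships modules named vmware_guest and vmware_guest_snapshot, but I am not claiming it is the only option.

```yaml
# requirements.yml
# Assumption: the community-maintained VMware collection from Ansible Galaxy.
collections:
  - name: community.vmware
```

On a command-line Ansible host this file can be installed with "ansible-galaxy collection install -r requirements.yml". The VMware modules also rely on the pyVmomi Python library being present on the host that runs them.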

Create and test your code

While I chose to use the Ansible collection for VMware, which has most of the code already written, I wanted to take advantage of this learning opportunity to try different methods. REST APIs are very common, and I wanted to be comfortable accomplishing my tasks with REST. I began researching how to make REST calls to vCenter. Right within the vCenter web interface there is a developer's menu that walks you through everything exposed in the REST API.

I found that Postman is a useful tool for testing API calls. With a lot of research and trial and error, I was able to create API calls that authenticated a session and looked up a VM's ID from the virtual machine name. Making REST calls from Ansible is straightforward, since you are just sending web requests to a URL and reading the responses. I reviewed the documentation for making REST calls in Ansible so that I would be comfortable doing this if a future project required it.
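The two calls described above (create a session, then look up a VM by name) might look roughly like this when moved from Postman into Ansible's built-in uri module. This is a sketch under assumptions: the endpoint paths follow the older /rest style of the vSphere Automation API and may differ on your vCenter version, the VM name is hypothetical, and the connection details are read from environment variables so that nothing sensitive sits in the file.

```yaml
---
# Sketch only: verify the endpoint paths against your vCenter's API documentation.
- name: Look up a VM ID over REST
  hosts: localhost
  gather_facts: false
  vars:
    vcenter_host: '{{ lookup("env", "VMWARE_HOST") }}'
    vcenter_user: '{{ lookup("env", "VMWARE_USER") }}'
    vcenter_password: '{{ lookup("env", "VMWARE_PASSWORD") }}'
    vm_name: lab-vm-01            # hypothetical VM name
  tasks:
    - name: Create an API session with basic authentication
      ansible.builtin.uri:
        url: "https://{{ vcenter_host }}/rest/com/vmware/cis/session"
        method: POST
        url_username: "{{ vcenter_user }}"
        url_password: "{{ vcenter_password }}"
        force_basic_auth: true
        validate_certs: false     # acceptable in a lab, not in production
      register: session

    - name: Find the VM by name using the session token
      ansible.builtin.uri:
        url: "https://{{ vcenter_host }}/rest/vcenter/vm?filter.names={{ vm_name | urlencode }}"
        method: GET
        headers:
          vmware-api-session-id: "{{ session.json.value }}"
        validate_certs: false
      register: vm_lookup

    - name: Show the ID of every VM that matched the name
      ansible.builtin.debug:
        msg: "{{ vm_lookup.json.value | map(attribute='vm') | list }}"
```

The register keyword stores each response so that a later task can read it, which is how the session token from the first call reaches the second.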

Putting it in Ansible

I will admit this was the most intimidating and difficult part, as everything in this step was new to me. Ansible can be installed for free on any major Linux distribution, and it is the next step to getting your code running. Ansible works by reading files written in YAML (saved with a .yml or .yaml extension, and often simply called YML files). I looked up sample file after sample file to get comfortable with the syntax. As a good starting point, first create a simple YML file that completes one basic task using the REST API calls you previously tested in Postman, and get that playbook working before moving on to multiple tasks and the VMware collection.
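Before wiring in any REST call, an even smaller file can help with the syntax itself. The following is a minimal sketch of a one-task playbook; the file name and the message are made up for illustration.

```yaml
---
# first-playbook.yml (hypothetical file name)
- name: My first play            # a leading dash starts a list item; this one is a play
  hosts: localhost               # run on the Ansible host itself
  gather_facts: false            # skip collecting system facts; not needed here
  tasks:                         # keys belonging to the play are indented with spaces
    - name: Say hello
      ansible.builtin.debug:
        msg: "Hello from Ansible"
```

It can be run with "ansible-playbook first-playbook.yml". If the indentation is off, Ansible reports a syntax error before running anything, which makes this a cheap way to practice.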

The Ansible product at the center of enterprise automation is a paid product called Ansible Tower. There is also an open-source upstream project for Tower called AWX. If your organization does not have Tower, look into deploying AWX to get started. Tower and AWX present a web interface from which projects and playbooks can be built and executed. Tower also hooks into GitHub so that GitHub is the single source of truth for code and version control. I uploaded my YML file to GitHub and connected the repository to my Tower project. Now whenever Tower needs to run my code, it can check GitHub to make sure it is running the latest revision.

Tower and AWX also let you make further use of variables in your code. You can define and use variables within your playbook, but you can also use the built-in environment variables that Tower provides. This is especially useful for credentials. Rather than defining credentials in plain text in a playbook, I can define them through the web interface. You will notice in my code below that I never define credentials in the code. That is because I have defined a "VMware vCenter" credential on my Ansible project through the web interface. At runtime, Tower exposes the credentials as environment variables, which the playbook reads into its own variables.

The finished product

Ultimately there is a lot more to cover than can be put into a short article. So far, I have outlined the steps I took to arrive at a finished playbook. Now we will get into the technical parts of my playbook to learn how it works.

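---
- name: OME Lab Start
  hosts: localhost
  gather_facts: no
  vars:
    # vCenter connection details are read from environment variables
    # at runtime, so no credentials are stored in the playbook.
    vmware:
      host: '{{ lookup("env", "VMWARE_HOST") }}'
      username: '{{ lookup("env", "VMWARE_USER") }}'
      password: '{{ lookup("env", "VMWARE_PASSWORD") }}'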

The three dashes at the beginning mark the start of the YML document. YML uses indentation with spaces to show which lines belong together, and a leading dash to mark an item in a list. Spaces are very important in YML. A playbook can have one or several plays. For simplicity, my playbook has just one play, which reverts the VM snapshot and powers the VM on. In a more complex playbook, you may have one play that targets web servers and executes a series of tasks, and another play that targets the DB servers to execute different tasks.
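To illustrate the multi-play case, here is a minimal sketch of a playbook with two plays. The group names webservers and dbservers are hypothetical and would need to exist in your inventory; the ping module simply confirms that Ansible can reach and use each host.

```yaml
---
# Each top-level list item is one play with its own hosts and tasks.
- name: Tasks for the web servers
  hosts: webservers              # hypothetical inventory group
  tasks:
    - name: Check that Ansible can reach the web servers
      ansible.builtin.ping:

- name: Tasks for the database servers
  hosts: dbservers               # hypothetical inventory group
  tasks:
    - name: Check that Ansible can reach the database servers
      ansible.builtin.ping:
```

Plays run in the order they appear, so the second play starts only after the first has finished on all of its hosts.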

The hosts entry tells Ansible which hosts the play runs against. In my case, the local Ansible host can run the code. There may be a situation where you need a remote Linux host to run the code, in which case you can define that host in the playbook. The variables I am defining read the environment variables that Tower sets when I enter my vCenter credentials into Tower. Other variables can be defined in your Tower project and used later in tasks.
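When a playbook depends on environment variables like these, one optional safeguard is to check that they are set before doing any real work. The sketch below is an addition for illustration, not part of my playbook; it uses the built-in assert module and the same three variable names.

```yaml
---
- name: Check injected credentials
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Confirm each vCenter environment variable is set
      ansible.builtin.assert:
        that:
          # The env lookup returns an empty string when a variable is unset.
          - lookup("env", item) | length > 0
        fail_msg: "Environment variable {{ item }} is not set"
        quiet: true
      loop:
        - VMWARE_HOST
        - VMWARE_USER
        - VMWARE_PASSWORD
```

The same check is handy when testing outside Tower, where you would export the three variables in your shell before running the playbook.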

  tasks:
  - name: Power off VM
    vmware_guest:
      hostname: "{{ vmware.host }}"
      username: "{{ vmware.username }}"
      password: "{{ vmware.password }}"
      validate_certs: no
      name: "{{ vm }}"
      state: poweredoff
    delegate_to: localhost

After declaring my host and variables, we drop into tasks. From here on, each list item that starts with a dash under tasks is an individual task. This lets Ansible track which tasks succeed and which fail. It also allows us to pass output to the user or to another task. The module "vmware_guest" is from the VMware collection. I looked at the documentation for this module to determine the parameters it requires. I am passing the vCenter credentials and also the variable "vm" that I defined in my project on Tower. The variable contains the name of the VM I want to power off.
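As a sketch of passing a task's output along, the register keyword can capture the result of the power-off task so a following task can report on it. Two assumptions here: the fully qualified module name community.vmware.vmware_guest (the short name is used elsewhere in this article), and a hypothetical VM name.

```yaml
---
- name: Power off a VM and report the result
  hosts: localhost
  gather_facts: false
  vars:
    vm: lab-vm-01                # hypothetical VM name
  tasks:
    - name: Power off VM
      community.vmware.vmware_guest:   # assumed fully qualified name
        hostname: '{{ lookup("env", "VMWARE_HOST") }}'
        username: '{{ lookup("env", "VMWARE_USER") }}'
        password: '{{ lookup("env", "VMWARE_PASSWORD") }}'
        validate_certs: false    # lab only
        name: "{{ vm }}"
        state: poweredoff
      register: power_off_result

    - name: Report whether the task changed anything
      ansible.builtin.debug:
        msg: "Power off changed the VM state: {{ power_off_result.changed }}"
```

If the VM was already powered off, the module should report no change, so the message also doubles as a quick idempotency check.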

  - name: Restore snapshot
    vmware_guest_snapshot:
      hostname: "{{ vmware.host }}"
      username: "{{ vmware.username }}"
      password: "{{ vmware.password }}"
      validate_certs: no
      name: "{{ vm }}"
      datacenter: "{{ datacenter }}"
      folder: "{{ folder }}"
      state: revert
      snapshot_name: "{{ snapshot }}"
    delegate_to: localhost

This is the task that restores a VM snapshot. It uses the "vmware_guest_snapshot" module, and I again pass the vCenter credentials it needs. This module requires a bit more information to run. The name of the VM is the same as in the previous task. The module also needs to know the data center and VM folder names from vCenter to locate the VM. I pass these to my playbook using variables that I define in Tower. The last lines tell the module that I want to revert, and which snapshot to revert to.
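For reference, the variables these tasks expect could be supplied as a small block of YAML like the sketch below. Every value is hypothetical and would be replaced with names from your own vCenter inventory.

```yaml
# Hypothetical values for the variables used by the tasks.
vm: lab-vm-01
datacenter: Lab-Datacenter
folder: /Lab-Datacenter/vm/Labs   # path to the folder that holds the VM
snapshot: clean-baseline
```

In Tower this kind of block fits in the extra variables field of a job template; on the command line the same content can be saved to a file and passed with "-e @vars.yml".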

  - name: Power on VM
    vmware_guest:
      hostname: "{{ vmware.host }}"
      username: "{{ vmware.username }}"
      password: "{{ vmware.password }}"
      validate_certs: no
      name: "{{ vm }}"
      state: poweredon
    delegate_to: localhost

This is the last task of my playbook, which powers the VM on after reverting the snapshot. It uses the same module as my first task, and the steps are similar. The only difference is that I want to change the VM to the "poweredon" state. Now that the playbook is complete, I verify that the code is uploaded to the GitHub repository and instruct my Tower project to use that repository as its code source.

I integrated my playbook into Ansible Tower where it can be called by a user or event. In my case, my playbook runs every time a user finishes a lab and Tower emails me to let me know if it was successful or not. By automating this function, the lab is no longer dependent on me to perform a manual task each time the lab is used. I can now measure how much time is saved each time the playbook runs and show value for my efforts.

I hope you enjoyed this article as much as I enjoyed creating my first Ansible playbook. Now that I've had success, I am looking forward to automating my next use case. When you succeed in your first automation endeavor, I hope you also feel the need to "automate everything."