Examples of High-impact Automations Across IT Infrastructure
Automation across IT infrastructure plays an important role in helping WWT execute proofs of concept at speed and scale through our Advanced Technology Center (ATC).
In this ATC insight, we dive deep into some of the automations that we've implemented. These are automations that have yielded the highest return on investment when it comes to making our POC process more efficient. While our motivation for implementation comes from improving technology testing, these automations can be applied to any large IT shop looking to gain efficiency, cost savings, time savings and scalability.
A major value-add for our customers is our ability to collapse the time it takes to conduct proofs of concept (POCs) from months to weeks, sometimes even days. We execute hundreds of POCs per year, which means our ATC architects are constantly spinning up testing environments across major components of IT infrastructure: network, compute, storage and software.
As part of our ongoing effort to be more efficient, we ask them to identify common, repeatable infrastructure tasks. We then challenge them to automate those tasks away. This requires them to not only use their existing knowledge of automation but also acquire new skills.
The following are some of the most high-impact automations we've implemented. Anyone who has worked in a large IT shop will relate to the problems we solved through automation and get an answer to the most important question: How much time does a given automation actually save?
We hope that by having our architects impart their lessons learned, you can begin automating across different technology domains or accelerate efforts in flight.
Network: Adding/Deleting VLANs (Switches, vCenter & UCS Profiles)
Have you ever asked yourself if there was an easy way to add a VLAN to, or delete one from, your entire infrastructure stack by clicking a few buttons? This question came up again and again for me and my colleagues.
When I say the entire infrastructure stack, I'm referring to:
- Switches: Port Profiles and Uplink Ports
- UCS FIs: VLAN, vNIC templates and service profiles
- vCenter: DVS port groups
I used Ansible to bring together tasks that are traditionally separate, from both a technology and a team perspective. The screenshot below shows the overall template to be deployed. You'll notice there is one template for adding and one for deleting. For any automated process, it's good to have a matching automated process to clean up after it.
Once the job is deployed, it leads you to a simple form that asks you to fill in the VLAN and name of the VLAN.
After this is completed and "next" is clicked, we get into the weeds of what the Ansible playbook is actually doing.
Add the VLAN to our networking infrastructure (eight-plus switches in a spine-leaf architecture). The first part of this automation creates the VLAN on all the switches, adds the VLAN to the necessary uplinks to our core switches and finally adds the VLAN to the port profiles that we use for UCS further down the network stack. It also runs a copy run start to save the configuration on each network switch.
Add the VLAN to vCenter. This consists of creating the port group on the DVS that is assigned to all our hosts in vCenter.
Add the VLAN to UCS infrastructure. This consists of a few things and also assumes a few things. First and foremost, your vNIC templates and service profile templates need to be updating templates. If they aren't, you are truly just adding the VLAN to UCS and, as a result, you would have to go back and manually add the VLAN to the vNIC template and service profile template.
Deleting a VLAN works the exact same way, just in reverse order. While I could go through each step of deleting the VLAN, I think you get the gist.
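The ordering described above (switches, then vCenter, then UCS, and the reverse for a delete) can be sketched in a few lines of Python. This is only an illustration of the sequencing and of the two form inputs, not the Ansible playbook itself; the function name `vlan_plan`, the layer labels and the sample VLAN are hypothetical.

```python
from typing import List

# Layer order from the walkthrough: switches, then vCenter, then UCS.
# The labels are illustrative, not names used by the actual playbook.
ADD_ORDER = ["switches", "vcenter", "ucs"]


def vlan_plan(action: str, vlan_id: int, vlan_name: str) -> List[str]:
    """Return the ordered steps for adding or deleting a VLAN."""
    if action not in ("add", "delete"):
        raise ValueError(f"unknown action: {action!r}")
    # 802.1Q reserves IDs 0 and 4095, so usable VLAN IDs are 1 through 4094.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    if not vlan_name.strip():
        raise ValueError("VLAN name must not be empty")
    # A delete walks the same layers in reverse order.
    order = ADD_ORDER if action == "add" else list(reversed(ADD_ORDER))
    return [f"{action} VLAN {vlan_id} ({vlan_name}) on {layer}" for layer in order]


if __name__ == "__main__":
    for step in vlan_plan("add", 210, "poc-demo"):
        print(step)
    for step in vlan_plan("delete", 210, "poc-demo"):
        print(step)
```

Validating the two form fields before any device is touched keeps a typo from being pushed to every switch in the stack.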
How much time does this actually save?
The time we save by automating VLAN additions and deletions across our UCS infrastructure is one of the main reasons we can deliver POCs to our customers so quickly.
Below is an example of how long it takes Ansible to run this automated process.
One minute and twenty-six seconds is all it took!
If we had to run this all manually, we would easily be looking at about one hour per VLAN. And if we have more than one VLAN to add at separate times, well, you can do the math. This automation is a great time saver in the daily life of our POC process.
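For readers who do want to do the math, here is a small Python sketch using only the two figures quoted above: the 1 minute 26 second automated run and the rough one-hour manual estimate. The function names are hypothetical, and the manual figure is an estimate, so the results are approximate.

```python
AUTOMATED_SECONDS = 1 * 60 + 26  # automated run time quoted above
MANUAL_SECONDS = 60 * 60         # rough "one hour per VLAN" manual estimate


def time_saved_minutes(vlan_count: int) -> float:
    """Minutes saved when each VLAN is handled in a separate run."""
    return vlan_count * (MANUAL_SECONDS - AUTOMATED_SECONDS) / 60


def speedup() -> float:
    """How many times faster the automated run is than the manual estimate."""
    return MANUAL_SECONDS / AUTOMATED_SECONDS


if __name__ == "__main__":
    print(f"Speedup: about {speedup():.1f}x")
    print(f"Saved over 10 VLANs: about {time_saved_minutes(10) / 60:.1f} hours")
```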
Compute: Vdbench Automated Deployment
By Phil Canman
Deploying an up-to-date, properly sized and properly set up worker VM for HCI benchmarking by hand is a time sink, so why would you?
The manual process for setting up the HCI environment for Vdbench testing requires deploying a CentOS 7 virtual machine and then provisioning that VM with all the needed code bits. Then, the VM needs properly sized data VMDKs, along with networking, before Vdbench testing can start. Once we got the template properly set up, we would then have to clone it anywhere from 12 to 120 times, depending on the requirements of a POC. Cloning 120 VMs by hand via the GUI is rough, tedious and repeatable, which made this a perfect task to automate.
Ansible to the rescue! Using Ansible, I was able to automate the complete deployment process for Vdbench workers on HCI solutions. I created a playbook that accomplished every task needed from updating the template VM before cloning to changing the host names so each worker VM could be accessed by DNS. Then, I added the playbook to Ansible Tower so users could easily access the tool.
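One small piece of that workflow, giving every cloned worker a predictable host name so it can be reached by DNS, is easy to sketch in Python. The `vdbench-worker` prefix and the function name are assumptions made for illustration; the playbook's real naming scheme is not shown in this article.

```python
from typing import List


def worker_names(count: int, prefix: str = "vdbench-worker") -> List[str]:
    """Build zero-padded host names for `count` cloned worker VMs."""
    if count < 1:
        raise ValueError("need at least one worker")
    # Pad to a common width so the names sort in numeric order.
    width = max(2, len(str(count)))
    return [f"{prefix}-{i:0{width}d}" for i in range(1, count + 1)]


if __name__ == "__main__":
    names = worker_names(120)
    print(names[0], "...", names[-1])
```

Zero-padding is a deliberate choice: it keeps the workers in order in vCenter and in benchmark reports whether a POC needs 12 clones or 120.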
Below is an example screenshot of what is being called when we launch our Ansible template. As noted above, it takes in all the pieces for vCenter (disk size, etc.) and puts them in a nice wrapper that any team member can deploy.
How much time does this actually save?
Doing this by hand would take hours to complete. Doing this with automation took 11 minutes, 30 seconds in a recent POC deployment, as shown in the screenshot below.
The automation took a couple of days to set up and test, but that time was easily recouped within the first couple of deployments. Plus, this automation removes the human element from the worker deployment and standardizes the test process even further, as we now have a unified way to deploy workers between benchmarks.
Storage: Mapping RDMs to VMs via vCenter
Mapping RDMs to virtual machines takes too long through the vCenter GUI. Often during testing we create a large number of LUNs that we then present to VMs hosted in VMware. Selecting each VM, editing its settings and attaching each RDM through the GUI is a very long process. Plus, don't forget to put those RDMs on different SCSI controllers and buses for maximum performance.
PowerShell and PowerCLI to the rescue! I've never been a big fan of using a GUI, as it simply takes too long and is prone to user error.
I started out very small, asking: How do I use PowerShell to connect to vCenter? Once I could actually connect to a vCenter, I then learned how to query the ESXi host and scan for newly presented disks.
Next up was formatting. Now that I had a list, I needed to extract all the details required to make an informed choice. By bringing in Sort-Object (built into PowerShell), I could sort the disks by something relevant to me and to the script going forward, which is disk size.
This worked well for this situation as all the disks I wanted to mount were the same size. I then used this information to build out the syntax for the "New-HardDisk" command which ultimately adds the RDM disk to the VM. After I got this far, I was beyond excited because I no longer needed to open a GUI to add RDMs to a set of VMs!
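The sort-then-select step can be illustrated outside PowerShell as well. The Python sketch below is not the PowerCLI script; it only mirrors the logic of sorting a disk list by size and keeping the disks that match the size you intend to mount. The disk names, sizes and field names are made-up sample data.

```python
from typing import Dict, List

Disk = Dict[str, object]

# Hypothetical scan result: the names and sizes are sample data only.
SCANNED: List[Disk] = [
    {"name": "lun-c", "capacity_gb": 500},
    {"name": "lun-a", "capacity_gb": 100},
    {"name": "lun-d", "capacity_gb": 500},
    {"name": "lun-b", "capacity_gb": 40},
]


def sort_by_size(disks: List[Disk]) -> List[Disk]:
    """Order disks by capacity, then by name for a stable listing."""
    return sorted(disks, key=lambda d: (d["capacity_gb"], d["name"]))


def matching_size(disks: List[Disk], capacity_gb: int) -> List[Disk]:
    """Keep only the disks whose capacity equals the size being mounted."""
    return [d for d in sort_by_size(disks) if d["capacity_gb"] == capacity_gb]


if __name__ == "__main__":
    for disk in matching_size(SCANNED, 500):
        print(disk["name"], disk["capacity_gb"])
```

Selecting by size works only when, as in the case described above, the disks to mount share a size that no unrelated disk on the host has.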
With the framework built, the rest of the script evolved over the next couple of months, being tweaked to run with as little input as necessary.
The first screenshot below is a snippet of the code used. The second shows the code actually running via PowerCLI.
How much time does this actually save?
Before the automated script process, using the vCenter GUI took hours. Often my colleagues and I would split up the mounting of the RDMs. Now, we can easily mount 64 RDMs to 8 different VMs, all on separate SCSI controllers and buses, in about 8 minutes. See the screenshot below.
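The layout in that example, 64 RDMs spread over 8 VMs and across SCSI controllers, can be planned with a short round-robin routine. The Python below is a planning sketch under stated assumptions, not the PowerCLI script: it assumes controller 0 is left for the boot disk and that data disks go on controllers 1 through 3, and all names are hypothetical. It does rely on two VMware limits: a VM supports at most four SCSI controllers, and unit 7 on each controller is reserved for the controller itself.

```python
from typing import Dict, List, Tuple

Slot = Tuple[str, int, int]  # (lun name, controller number, unit number)


def assign_rdms(
    luns: List[str],
    vms: List[str],
    controllers: Tuple[int, ...] = (1, 2, 3),  # assumption: 0 holds the boot disk
) -> Dict[str, List[Slot]]:
    """Split LUNs evenly across VMs, then round-robin them over controllers."""
    if not vms or len(luns) % len(vms) != 0:
        raise ValueError("LUN count must divide evenly across the VMs")
    per_vm = len(luns) // len(vms)
    plan: Dict[str, List[Slot]] = {}
    for i, vm in enumerate(vms):
        chunk = luns[i * per_vm:(i + 1) * per_vm]
        next_unit = {c: 0 for c in controllers}
        slots: List[Slot] = []
        for j, lun in enumerate(chunk):
            ctrl = controllers[j % len(controllers)]
            unit = next_unit[ctrl]
            if unit == 7:  # unit 7 is reserved for the controller itself
                unit += 1
            if unit > 15:
                raise ValueError(f"controller {ctrl} on {vm} is full")
            next_unit[ctrl] = unit + 1
            slots.append((lun, ctrl, unit))
        plan[vm] = slots
    return plan


if __name__ == "__main__":
    luns = [f"rdm-{n:02d}" for n in range(64)]
    vms = [f"vm-{n}" for n in range(8)]
    for lun, ctrl, unit in assign_rdms(luns, vms)["vm-0"]:
        print(f"{lun} -> SCSI({ctrl}:{unit})")
```

Computing the plan first and attaching disks second makes the placement easy to review before anything changes on a VM.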
By taking the time to do this automation, my understanding of PowerShell grew immensely. I took that knowledge to immediately create a spin-off script that would unmount all the RDMs after we were done testing. Keep in mind, I am in no way a "developer." The solution was developed over months of hacking away, learning bit by bit.
Software/Application: Oracle Database Installation
By Chris Nugent
A request came in for multiple databases to be used as an endpoint for a product bake-off. The first problem was that I am not a database administrator. Secondly, needing more than one database would multiply the time required to deploy them manually and consistently. Plus, this was a bake-off between more than one vendor, which meant that each vendor needed the exact same setup to provide a fair test.
Oracle is its own ecosystem with a lot of moving parts that I am not familiar with. This would take me days of research to understand all the parts and what they are used for to walk through the installation.
Using Ansible and with help from the community at galaxy.ansible.com, we were able to get example playbooks and roles to help with the installation of the databases. In this case, the ask was for six instances, with two running one version and the rest running a different version. Even though these were standalone instances, the request was to use Oracle Grid for its ASM capabilities, with five of the databases 2.5 TB in size and the other 10 TB. Since the playbooks already had the installation working, I only needed to work with a DBA to provide the specific data relevant to our installation requirements. This was beneficial in that I didn't need knowledge of all of Oracle, just the specific pieces related to installation.
Since the build-out of the infrastructure was also automated, we get predictable environments on which to deploy the software. With that predictability, we are able to create the ASM disk groups for all the appropriate needs based on the disk layout that was created. We could also use these automation tasks to alter them after the fact as part of a day 2 or ops support task.
Below is a screenshot of the virtual machine disk configuration for one of the Oracle instances.
Below is the configuration file used in the Ansible playbook that created the disk groups for Oracle ASM.
Which version to install
Deploying the correct version of Oracle on each system was done simply by assigning a variable in the inventory host file. The Ansible playbook reads those variables and determines which installation is used for a given system.
Below is a section that determines which version will be installed. This is based on the group variable which is called when executing the playbook.
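As an illustration of the idea rather than the playbook's own code, the group-based lookup can be modeled in Python. The group names, host names and version labels below are placeholders (the article does not name the Oracle versions used); only the two-versus-four split of the six instances comes from the text.

```python
from typing import Dict, List

# Hypothetical inventory: two hosts on one version, four on another.
INVENTORY: Dict[str, List[str]] = {
    "oracle_version_a": ["db01", "db02"],
    "oracle_version_b": ["db03", "db04", "db05", "db06"],
}

# Hypothetical group variables; the version labels are placeholders.
GROUP_VARS: Dict[str, Dict[str, str]] = {
    "oracle_version_a": {"oracle_version": "version-a"},
    "oracle_version_b": {"oracle_version": "version-b"},
}


def version_for(host: str) -> str:
    """Resolve the version to install on a host from its inventory group."""
    groups = [g for g, hosts in INVENTORY.items() if host in hosts]
    if len(groups) != 1:
        raise ValueError(f"{host} must belong to exactly one version group")
    return GROUP_VARS[groups[0]]["oracle_version"]


if __name__ == "__main__":
    for host in sorted(h for hosts in INVENTORY.values() for h in hosts):
        print(host, version_for(host))
```

Keeping the version in the inventory rather than in the tasks means that moving a host to another version is a one-line inventory change.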
A normal installation would require answering a long list of questions to set up the software. In this case, Oracle has the ability to accept an answer file for the installation. Applying the configuration settings recommended by our DBA for each version made installation a breeze. If there were any issues after the deployment, a modification to the text file was all that was needed.
Below is a snippet of a Jinja2 template that is copied out to the system being installed that will be passed in as the response file to the installer for the silent installation. In here are variables which are filled out based on the playbook execution.
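To show the templating idea without reproducing the actual template, the sketch below uses Python's standard-library `string.Template` as a stand-in for Jinja2. The two keys are a tiny illustrative subset, not a complete Oracle response file, and the paths are placeholder values.

```python
from string import Template
from typing import Dict

# Illustrative fragment only: a real response file has many more entries.
RESPONSE_TEMPLATE = Template(
    "ORACLE_BASE=$oracle_base\n"
    "ORACLE_HOME=$oracle_home\n"
)


def render_response_file(settings: Dict[str, str]) -> str:
    """Fill the template; a missing variable raises KeyError."""
    return RESPONSE_TEMPLATE.substitute(settings)


if __name__ == "__main__":
    # Placeholder paths, not the values used in the POC.
    print(render_response_file({
        "oracle_base": "/u01/app/oracle",
        "oracle_home": "/u01/app/oracle/product/db_home",
    }))
```

Using `substitute` rather than `safe_substitute` is intentional: a missing variable stops the run instead of producing a response file with a blank setting.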
How much time does this actually save?
For this particular case, the amount of time saved is hard to measure, as the automation was either already completed or already started and just needed some fine-tuning. The best measure was our ability to turn around the installation between vendors in under a week, from teardown to ready for use. Other savings came from our ability to redeploy multiple times during that week, as errors in the underlying infrastructure required us to tear down and rebuild a few times. As you can see, automation provides consistent, reliable and predictable outcomes and frees up time to work on the next task and to evolve our solutions and products.
Investing in automation can be hard to justify when IT architects and engineers are under so much pressure to keep up with a large influx of new work. It may seem like there's simply not enough time for architects and engineers to pause and ask: Is there a more efficient way to do this?
The good news is, automation can come incrementally. Let architects and engineers identify where they are spending an inordinate amount of time on repeatable, manual tasks. When architects and engineers see precisely how an automation can improve their daily lives, they'll likely find the time to acquire the skills needed to implement that automation. We find that this bottom-up approach leads to automation evolving organically throughout an IT organization.
We hope that by documenting some of our most high-impact automations and being frank about the learning curve we had to overcome, you have more confidence when it comes to automating your own environments.
If you would like more information about anything we covered, you can contact:
- Network: Brian Saunders and Phil Canman
- Compute: Phil Canman
- Storage: Bryan Peroutka
- Software: Chris Nugent
And finally, remember, whatever is repeatable can be automated!
Vdbench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network connected storage. The software is known to run on several operating platforms. It is an open-source tool from Oracle. To learn more about Vdbench you can visit the wiki HERE.