Examples of High-impact Automations Across IT Infrastructure Volume 3
Volume 3 of our automation edition of ATC Insights walks through a few more automation efforts the Lab Services team uses to deliver POCs (Proofs of Concept) faster to our internal and external customers.
In This Insight
This ATC Insight focuses on two use cases: the first is around hardware inventory, and the second is around deploying jump boxes. Both are time-consuming and tedious for our architects and engineers on a daily basis. Our intention with these specific automations was to be more efficient and save time.
Welcome to Volume 3 of “Examples of High-Impact Automations Across IT Infrastructure.” In our previous Insights (Vol. 1 and Vol. 2) we shared automation that helped our Lab Services team deliver POCs (Proofs of Concept) faster. This ATC Insight continues down the same path. Hopefully, through the examples below, you will see how we save time on the deployment of virtual machines as well as across the overall build lifecycle of a POC.
Inventory Process - (aka Braunventory)
by: Phil Canman
While my name graces the inventory process, Phil Canman is the script creator and all-around automation expert who delivers the process in a seamless fashion. Let's start with the problem.
When delivering POCs, we almost always get asked: What is in the server? How much RAM is installed? Which PCIe cards are needed? 10/25 Gb networking? Local HDDs? CPU cores/speed?
To reserve gear, we have an in-house system called AMS (Asset Management System). It allows us to reserve different pieces of gear for POCs: we create a reservation, usually named after the POC number, and add each asset ID to it.
Reserving an asset this way ensures nobody else can use it while it is tied to the reservation. The problem is that while AMS clearly defines what a server is (for example, that you can reserve a C240 M5), it doesn't show any details about the hardware inside it. That makes it extremely difficult to reserve the right asset for the right use case on the first try; sometimes we have to call up a few assets before we find one that fits the customer's needs. For instance, if you have a server called up from the warehouse, there is no telling what RAM or CPU is installed. This can be extremely time-consuming: a warehouse engineer has to load the server on a truck and bring it up to the ATC. Once there, an engineer can do a physical inspection, which will certainly reveal the total memory, HDDs, and PCIe cards, but it will not reveal the CPU details unless you boot the server and either watch it POST or put an IP address on its CIMC, iDRAC, or iLO.
How did we solve this problem?
The problem was solved with a two-pronged approach. The first prong was finding the easiest way to get hardware details out of a server with a script that was not vendor-specific. We created a PXE environment that let us boot any server into a Linux shell and run standard Linux commands to collect the system's inventory. To tie the data back to an asset, the script read the system's serial number and used it to look up the matching record in our Asset Management System (AMS) via its API, then updated that asset with all the hardware details.
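As a rough sketch of that flow, here is how a PXE-booted Linux environment might parse `dmidecode -t system` output and shape it into an update payload keyed on the serial number. The AMS payload fields here are illustrative assumptions; the actual script and AMS API are internal.

```python
import re

def parse_dmidecode_system(output: str) -> dict:
    """Extract key fields from `dmidecode -t system` output."""
    wanted = ("Manufacturer", "Product Name", "Serial Number")
    fields = {}
    for line in output.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*(.+)", line)
        if match and match.group(1).strip() in wanted:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields

def build_ams_update(fields: dict) -> dict:
    """Shape parsed hardware data into a hypothetical AMS API payload.

    On the live system this payload would be POSTed to the AMS REST API,
    keyed on the serial number so the right asset record is updated.
    """
    return {
        "serial": fields["Serial Number"],
        "model": fields["Product Name"],
        "vendor": fields["Manufacturer"],
    }

# Sample text in the shape `dmidecode -t system` produces (values invented)
sample = """\
System Information
\tManufacturer: Cisco Systems Inc
\tProduct Name: UCSC-C240-M5SX
\tSerial Number: WZP12345678
"""
payload = build_ams_update(parse_dmidecode_system(sample))
```

Serial numbers make a good correlation key because `dmidecode` reports the same value AMS already tracks for each asset.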
While this solution worked really well across different brands of servers, there was still a lot of manual effort involved in PXE booting each server to pull its hardware details. It let our ATC DC Ops engineers check a server before loading it on a truck and wasting gas, but the process remained very manual. How could we save more time for everyone?
Prong two, the second approach, built upon the first automation. With inventory working for single assets, we started looking at the bigger picture: how could we inventory servers that are already online for other Proofs of Concept (POCs)? This is the scenario where we cannot shut a server down for the PXE process; we needed to inventory these systems online, with no downtime.
The solution was to use the OEM-branded management tools: Dell OpenManage Enterprise (OME), HPE OneView, and Cisco Intersight. By adding the systems to these tools, we created one spot to query for each hardware vendor's servers. Whereas the PXE boot approach could handle one server every seven minutes, querying the management tools via their APIs and passing that info over to our AMS system let us update the inventory of 100+ servers in just a few minutes.
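In spirit, the bulk path is: pull the device list from the management tool's REST API, then re-key it by serial number (service tag) so each record can be matched to its AMS asset in one pass. The field names below loosely mirror a Dell OME device listing; treat them, and the AMS shape, as illustrative assumptions rather than the exact internal integration.

```python
def index_by_serial(devices: list[dict]) -> dict[str, dict]:
    """Re-key a management-tool device listing by service tag / serial
    number, ready for a bulk update against AMS asset records."""
    return {
        device["DeviceServiceTag"]: {
            "model": device["Model"],
            "name": device["DeviceName"],
        }
        for device in devices
    }

# Two records shaped loosely like an OME device-listing response
devices = [
    {"DeviceServiceTag": "ABC1234", "Model": "PowerEdge R740", "DeviceName": "esx-01"},
    {"DeviceServiceTag": "DEF5678", "Model": "PowerEdge R640", "DeviceName": "esx-02"},
]
updates = index_by_serial(devices)
```

Because the management tool already polls every server it manages, one API call returns the whole fleet, which is what turns a 7-minutes-per-server task into a few minutes for 100+ servers.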
While we currently have this working for Dell (OME) and HPE (OneView), our plans include incorporating Cisco Intersight and UCS Central/UCS Manager.
Jump box deployment in Flash Lab
by: Phil Canman
We use hundreds of jump boxes in the ATC for our Proof of Concept and lab work. Deploying jump boxes in the flash lab, where a lot of our testing environments are built, is not a hard task; it is more of a time suck. First, you clone your Windows 10 template to your pre-made POC folder, then set a static IP or rename the OS with a DHCP IP address so you can use DNS. It is safe to say a single jump box can take around 10 minutes to fully set up correctly. Then there is the issue of naming conventions and folder structure being all over the place when the process is hand-jammed. This leads to a lot of questions about who owns the jump boxes and whether they can be deleted down the road.
Well, you guessed it… the answer is automation. We created a small Ansible playbook that makes the folder structure, deploys the jump boxes, names the VMs per the naming convention, and changes the Windows OS name to that same convention so we can use DNS to add the RDP sessions in the POC portal. All of this is driven by a short Ansible Tower survey.
Then you click next, sit back, and have a shot of your favorite hard alcohol because you won't have time for a beer. Below is a screenshot of the folder and VM naming convention being used. The folder name gives us all the information we need to track down the POC and the POC owner to verify if it is still in use. The VM name is the DNS name you would use to address the VM.
For example, you can now RDP to “9999-PC-1.wwtpoc.local” to access the OS. All you need to do now is add the jump box's DNS name to the POC portal page, and you should be up and running with customer-facing jump boxes in a matter of minutes.
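The naming convention itself is simple enough to sketch. Given a POC number and a jump box count, a helper like the one below yields each VM name and its matching DNS name (this is a hypothetical illustration of the convention; the real work happens inside the Ansible playbook):

```python
def jump_box_names(poc_number: str, count: int,
                   domain: str = "wwtpoc.local") -> list[tuple[str, str]]:
    """Generate (VM name, DNS name) pairs per the POC naming convention.

    Because the Windows OS name is set to the same string as the VM name,
    the DNS record resolves and RDP works without a static IP.
    """
    pairs = []
    for i in range(1, count + 1):
        vm_name = f"{poc_number}-PC-{i}"
        pairs.append((vm_name, f"{vm_name}.{domain}"))
    return pairs

pairs = jump_box_names("9999", 2)
# e.g. the first pair is ("9999-PC-1", "9999-PC-1.wwtpoc.local")
```

Deriving every name from the POC number is what makes ownership easy to trace later: the folder and VM names alone identify which POC (and therefore which owner) a jump box belongs to.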
As you can see from the screen snippet below the whole process took 84 seconds to complete.
This whole setup would have taken me around 30 to 45 minutes to complete from start to finish. Now, as you can see above, it was completed in less than two minutes. That frees up time in our engineers' and architects' days for other important POC-related tasks.
As we have stated in our other automation-focused ATC Insights, automation takes time and a willingness to think outside the box to save your entire organization time. It doesn't happen overnight. The overall goal of any automation effort is to save time for every person in the organization. And finally, remember: whatever is repeatable can be automated!