Converting Azure Machine Images and Setting Up Azure Site Recovery (ASR): Lessons Learned
This article describes lessons learned when converting Azure Machine Images (AMI) to run on vSphere (VMDK), as well as setting up Azure Site Recovery.
Part of my job here at World Wide Technology is educating our customers by building innovative labs in the WWT Advanced Technology Center (ATC). The latest lab we’ve been working on takes a 3-tier application (database, application server and web server) running in Azure, converts its servers to vSphere virtual machines and, finally, uses Azure as a disaster recovery resource for them.
Before we get into the issues we ran into, I think it’s only fair to let the reader know that this is not a walkthrough or a tutorial on setting up Azure Site Recovery (ASR). There are plenty of those already written, and I likely read most of them while working on this lab. This article will describe the issues we ran into to help technical folks avoid the same pitfalls.
Converting VHD to VMDK
The first step needed for the lab was to convert the VHD files to VMDK. The process was pretty straightforward: we downloaded the VHD files, then used QEMU to convert them to VMDK. We chose QEMU after researching several solutions; it seemed like a good fit because most of the team uses a Mac. There were a few other options, but they involved buying a license. The conversion went smoothly, and we booted up the VMs in vSphere.
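For readers who have not used QEMU for this, here is a minimal sketch of what such a conversion can look like with qemu-img. It is not the exact command we ran: the helper names and file names are hypothetical, and depending on your vSphere version the resulting disk may need extra options or an import step.

```shell
#!/bin/sh
# vmdk_name VHD_PATH: print the output file name, swapping a .vhd suffix for .vmdk.
vmdk_name() {
    printf '%s.vmdk\n' "${1%.vhd}"
}

# convert_vhd VHD_PATH: convert one VHD to a VMDK written next to the source file.
# "vpc" is qemu-img's name for the VHD format; -p shows a progress bar.
convert_vhd() {
    qemu-img convert -p -f vpc -O vmdk "$1" "$(vmdk_name "$1")"
}

# Example (file name is a placeholder):
#   convert_vhd web-server.vhd      # writes web-server.vmdk
```

Keeping the naming logic in its own function makes it easy to loop over all three servers' disks without overwriting the source files.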
The 3-tier app in this lab consists of a Windows server running SQL for the database and two Ubuntu servers acting as application and web servers. The Windows server booted up without issues, and we applied our KMS changes to the VM. We then turned our attention to the Ubuntu servers. We opened the console on the web server, and it wasn’t booting. The VM displayed no error messages, just a blinking cursor. We checked the application server and noted the same behavior. Based on previous experience, we remembered we had to remove the Linux OMS agent from Azure AMIs to allow them to boot on vSphere. Once we removed the agent by running sh onboard_agent.sh --purge, the VMs booted up just fine, and the login screen presented itself.
The console messages problem
We began to type our username and password and were interrupted by a console message. We hit enter a few times and started typing the username and password and once again got interrupted by a console message.
Upon further inspection, we found the VMs configured with console=ttyS0 in GRUB. Azure uses a serial port on its VMs, and since the VM running in vSphere did not have a serial port attached to receive console messages, it was displaying them on the console screen. We had two options to fix it: add a serial port to the VM or modify GRUB.
Since we don’t like adding unnecessary hardware to a VM, we decided to change GRUB. For anyone who needs to do the same, here is how we fixed it. We were able to type these commands while being interrupted by console messages, but it may be easier to mount the disk on a different VM.
- Edit the GRUB defaults file with vi /etc/default/grub, override GRUB_CMDLINE_LINUX_DEFAULT and remove any reference to console=ttyS0.
- Run update-grub to regenerate the GRUB configuration.
- Verify /boot/grub/grub.cfg has your changes.
*DISCLAIMER: Do not implement these commands on a production machine without properly testing them in your environment.
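If you would rather script the GRUB change than edit the file by hand, the sketch below shows one way to do it. It is an illustration, not what we ran: the function name is hypothetical, the paths are the Ubuntu defaults, and the function only prints a cleaned copy so you can review it before replacing anything. Some cloud images also set kernel arguments in files under /etc/default/grub.d, so check there if the setting comes back.

```shell
#!/bin/sh
# strip_serial_console FILE: print FILE with every console=ttyS0[,options]
# token removed. The input file itself is never modified.
strip_serial_console() {
    sed -E 's/[[:space:]]*console=ttyS0[^[:space:]"]*//g' "$1"
}

# Example usage (as root on the affected VM):
#   strip_serial_console /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new      # review the change first
#   cp /tmp/grub.new /etc/default/grub
#   update-grub
#   grep 'console=ttyS0' /boot/grub/grub.cfg  # should find no matches
```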
Setting up Azure Site Recovery
The setup of Azure Site Recovery was reasonably straightforward but, as any administrator knows, reading the documentation always reveals the unexpected (more on this below). We quickly had vCenter registered in the Azure portal and built the configuration server as described in the Microsoft documentation. We promptly had the Windows VM replicating to Azure, but the Linux servers required a bit more effort due to the agent installation.
An administrator can enter credentials in ASR so that it automates the agent installation. The catch is that the agent installation script will only try the root user. We were using Ubuntu, which by default disables logging in as root. We had to decide whether to enable root or install the agents manually. Due to the security risks, we chose not to enable root and to perform the installation manually. The Microsoft doc does an okay job of walking through the Linux agent install, but it is confusing in places, so here is a recap of what we did in case anyone is having issues.
- Sign in to the config server and open a command prompt as administrator.
- A passphrase needs to be created so run genpassphrase.exe -v > (filename.passphrase).
- Grab the installer files for the operating system you are working with located at %ProgramData%\ASR\home\svsystems\pushinstallsvc\repository.
- The actual filename will vary, but it should look similar to Microsoft-ASR_UA_188.8.131.52_UBUNTU-16.04-64_GA_22Oct2019_release.tar.gz.
- SCP the installer file and passphrase to the desired VM.
- After untarring the install file, run ./install -d /usr/local/ASR -r MS -v VmWare -q (the install location can be modified, but this is where we put it; also note the capitalization of VmWare in the -v switch).
- Finally, run /usr/local/ASR/Vx/bin/UnifiedAgentConfigurator.sh -I <IP address of the config server> -P /usr/local/ASR/yourpassphrasefile.passphrase
The Azure portal should now show the VMs under replicated items.
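The Linux-side steps above can be collected into a small script. This is only a sketch under a few assumptions: the function names, the DRY_RUN switch, the tarball name, and the IP address are all hypothetical, the tarball is assumed to unpack ./install into the current directory, and the flags are copied as-is from the steps above. Run it as root (or via sudo) on the VM after copying the installer and passphrase file over.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_asr_agent TARBALL PASSPHRASE_FILE CONFIG_SERVER_IP
# Unpacks the installer, installs the agent, then registers it with the
# configuration server. Stops at the first step that fails.
install_asr_agent() {
    tarball=$1
    passphrase=$2
    cs_ip=$3
    install_dir=/usr/local/ASR   # the location we chose; it can be changed

    run tar -xzf "$tarball" || return 1
    run ./install -d "$install_dir" -r MS -v VmWare -q || return 1
    run "$install_dir/Vx/bin/UnifiedAgentConfigurator.sh" -I "$cs_ip" -P "$passphrase"
}

# Example (all values are placeholders):
#   DRY_RUN=1; install_asr_agent agent.tar.gz /usr/local/ASR/cs.passphrase 10.0.0.5
```

Setting DRY_RUN=1 prints the three commands without running them, which is a cheap way to check the paths before touching the VM.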
Failover and reprotect
We were now back on track and had our VMs replicating to Azure, so we ran a test failover and then an actual failover. Both went flawlessly, but that was the easy part. A test failover spins up a copy of the VMs in Azure but does not power down the on-premises VMs. An actual failover powers down the on-premises VMs and commits the failover to Azure. We now needed to do a reprotect, which reverses replication so that it runs from Azure to on-premises. We found reprotecting more challenging.
We clicked the reprotect button and were met with several errors. We needed VPN connectivity back to our on-premises data center and a Linux config server to reprotect the Ubuntu VMs. What? The documentation didn’t mention any of this… or did it?
We pulled the documentation back up and read the requirements for failback. It does indeed require a VPN, and Linux VMs require a Linux config server. A reinforced lesson: always read all the documentation. The VMs were now stranded in Azure until these requirements were met.
We spoke to one of our cloud experts, who helped us get Azure ExpressRoute set up. We then set up the Linux config server by following the documentation provided by Microsoft. After all the failback requirements were met, we successfully reprotected the VMs and ultimately failed them back to their original production location.
A few of the lessons we learned during our first setup of ASR are as follows:
- Always read all the documentation before getting started.
- Remove any agents from Linux AMIs.
- Failing over to Azure is easy, but failback is a bit more complex as it requires VPN connectivity.
- Linux servers require a Linux config server for failback.
Feel free to leave a comment below — I’d love to hear your feedback or your own lessons learned.