NetApp BlueXP Disaster Recovery Technical Guide
NetApp BlueXP Disaster Recovery is currently in public preview. WWT has direct access to the preview and has used it to put together this technical guide to the new DR-as-a-service offering, which is built into the NetApp BlueXP unified management platform.
In this technical guide, we will cover the following items:
- Prerequisites required - VMware, NetApp ONTAP, and AWS FSx for ONTAP
- Deployment and setup within NetApp BlueXP
- How to set up and use BlueXP Disaster Recovery Replication Plans
- A walkthrough of the "Test failover" process, highlighting the steps required in a true DR scenario
The BlueXP Disaster Recovery product is readily available inside of any NetApp BlueXP account. If you do not have a BlueXP account, you will need to set one up. Luckily, it is easy to establish this directly from here. There is also a step-by-step guide to get started with BlueXP here.
Once you have set up your BlueXP account and established your authentication, you will need a BlueXP Connector. The Connector is a critical component, and we suggest deploying it in AWS, where it acts as the orchestration unit for all the BlueXP Disaster Recovery environments. The Connector is deployed in your AWS VPC and is expected to have network connectivity (over HTTPS) to your VMware cluster on-premises (OnPrem), your OnPrem ONTAP storage system, your VMware Cloud deployment in AWS, and the AWS FSx for ONTAP file system. The Connector then communicates directly with the BlueXP platform to provide a single control plane.
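The connectivity requirement above can be sanity-checked before you configure anything in BlueXP. The following is a minimal Python sketch, assuming all four endpoints listen on TCP port 443. The hostnames are hypothetical placeholders, and the helper names (`tcp_probe`, `check_endpoints`) are ours, not part of any NetApp tooling. It only confirms that a TCP connection can be opened from the machine it runs on, so it is most meaningful when run from a host in the same subnet as the Connector.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

# (label, host, port). All hostnames below are hypothetical examples.
Endpoint = Tuple[str, str, int]

ENDPOINTS = [
    ("OnPrem vCenter", "vcenter.onprem.example.com", 443),
    ("OnPrem ONTAP", "ontap.onprem.example.com", 443),
    ("VMC vCenter", "vcenter.vmc.example.com", 443),
    ("FSx for ONTAP", "fsx-mgmt.example.com", 443),
]


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_endpoints(
    endpoints: Iterable[Endpoint],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Probe every endpoint and map its label to a reachable flag."""
    return {label: probe(host, port) for label, host, port in endpoints}


def main() -> None:
    """Print one line per endpoint; call this from the Connector's network."""
    for label, ok in check_endpoints(ENDPOINTS).items():
        print(f"{label}: {'reachable' if ok else 'UNREACHABLE'}")
```

Calling `main()` after replacing the placeholder hostnames prints the status of each endpoint. A successful TCP connection does not prove that credentials or certificates are valid, only that the network path is open.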
With your BlueXP account established and the Connector deployed in AWS, you must now focus on your OnPrem production configuration (Note: this could also be a production VMware environment in the public cloud). BlueXP Disaster Recovery only works if your VMware cluster is backed by NetApp ONTAP storage on both ends (source and target). You must also be serving VMware datastores to your virtual machines (VMs) from NetApp volumes. Currently, we are focusing on NFS datastores inside of your VMware environment. If you are just starting out with an OnPrem VMware cluster backed by ONTAP storage, please utilize this guide to get your ONTAP volumes mounted inside VMware and attached to your VMs.
The current deployment of BlueXP Disaster Recovery utilizes the capabilities of VMware Cloud in AWS (VMC) and its ability to attach to AWS FSx for ONTAP volumes as native external NFS datastores. If you are new to VMC, you can utilize this guide to help get your VMC and software-defined data center (SDDC) set up.
Once you have VMC set up and the SDDC deployed, you can then utilize AWS FSx for ONTAP as an external datastore. The FSx for ONTAP file system will be deployed within your AWS account/subscription. The volumes inside the file system can be used for any multiprotocol file or block storage projects you have in AWS while specific volumes serve as NFS datastores inside of VMC. There is a very thorough guide to integrating your VMC environment with FSx for ONTAP; you can find it here.
The last piece of the prerequisite configuration needed is a replication relationship between the source ONTAP cluster and the target ONTAP environment. In our scenario for this technical guide, the source ONTAP cluster is an OnPrem cluster, and the target ONTAP environment is FSx for ONTAP.
There are several ways you can set up replication between ONTAP volumes. We will highlight the options below.
- The easiest method to set up replication between OnPrem ONTAP and FSx for ONTAP is directly through the BlueXP unified control plane. This utilizes the "Drag and Drop" functionality of BlueXP's storage canvas, where you drag the source cluster (OnPrem) on top of the target cluster (FSx for ONTAP). You can see a screenshot of this functionality below.
- If you encounter issues with the replication setup process above, you always have the option to set this up manually through the ONTAP CLI. The SnapMirror replication configuration process is well documented. Please remember that any ONTAP CLI replication configuration has to be initiated on the target cluster (FSx for ONTAP).
- A SnapMirror relationship is also created automatically during replication plan creation for volumes that do not have any prior SnapMirror relationship.
- Note: Cluster and SVM peering must be configured for this functionality, but this requirement will be removed in the upcoming release.
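For the manual CLI option, the following Python sketch assembles the usual sequence of ONTAP `snapmirror` commands for one volume pair. It assumes that cluster and SVM peering are already in place and that the destination volume already exists as a data protection (DP) volume. The SVM and volume names are hypothetical, the default policy is only an example, and the helper `snapmirror_commands` is ours. Treat the output as a starting point and confirm the exact options against the ONTAP documentation for your version before running anything.

```python
from typing import List


def snapmirror_commands(
    src_svm: str,
    src_vol: str,
    dst_svm: str,
    dst_vol: str,
    policy: str = "MirrorAllSnapshots",  # example policy; choose your own
) -> List[str]:
    """Build the commands to run, in order, on the TARGET (FSx for ONTAP) CLI."""
    src = f"{src_svm}:{src_vol}"
    dst = f"{dst_svm}:{dst_vol}"
    return [
        # 1. Define the relationship from source volume to destination volume.
        f"snapmirror create -source-path {src} -destination-path {dst} -policy {policy}",
        # 2. Start the baseline transfer.
        f"snapmirror initialize -destination-path {dst}",
        # 3. Check the relationship state and health.
        f"snapmirror show -destination-path {dst}",
    ]


# Hypothetical names, for illustration only.
for command in snapmirror_commands("svm_onprem", "ds01", "svm_fsx", "ds01_dr"):
    print(command)
```

Generating the commands rather than typing them by hand keeps the source and destination paths consistent across all three steps, which is where most manual mistakes occur.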
Note: If you have already deployed BlueXP Disaster Recovery, please skip to the Replication Plans and Resource Groups section of this technical guide.
Once you have completed all of the prerequisite tasks, you can move to the actual deployment of BlueXP Disaster Recovery. The main dashboard for Disaster Recovery can be found inside the "Protection → Disaster recovery" section of the left navigation bar inside BlueXP. Please reference below:
- To get started select the "Sites" option.
- Inside the Sites control window, you will have a single option to add a site. Click "Add" to begin the process.
Note: If you choose the "Dashboard" option you will be prompted to set up your sites first. The dashboard is useless until you have your sites and replication plans configured.
- Once you have your site configured you will have the option to "Add vCenter".
The BlueXP Connector is communicating directly with your vCenter environment at this point, via HTTPS.
- In order to add your vCenter environment, you must choose the Site that it is associated with, the Connector you want to connect from, the vCenter IP address or hostname, and the username and password used to log in to vCenter. Please use the screenshot below as a reference point.
Note: The "Use self-signed certificates" checkbox is selected by default. If you already have authoritative certificates for your vCenter environment, you should uncheck this checkbox.
- You will now want to continue the process above for all other sites and vCenter servers you would like to use with BlueXP Disaster Recovery.
This is the most basic deployment needed to continue with replication or migration inside of BlueXP Disaster Recovery. Please remember that your vCenter environment must be backed by ONTAP storage in order to continue with disaster recovery replication plans.
If you receive an error or a failure while attempting to add your vCenter server(s), you can use the "Job monitoring" tab to further investigate any failures.
Establishing replication plans and resource groups is the most important part of a working BlueXP Disaster Recovery environment.
The replication plan defines how your source VMware environment is protected and recovered to your target VMware environment. Please remember that these environments can be OnPrem or public cloud VMware services. To create a plan, use the "Replication plans" tab in the top navigation menu inside of BlueXP Disaster Recovery. Everything inside BlueXP Disaster Recovery is based on a wizard creation model. As you create your replication plan through the wizard, you can utilize the step-by-step guide published inside NetApp Docs to assist you with your creation. We will not spell out all the individual steps here, but we will call out some focus areas.
The most important parts of the replication plan wizard are the "Applications" and "Resource mapping" windows. The Applications window is where you group your resources together or choose individual VMs to migrate or replicate with BlueXP Disaster Recovery. When choosing VMs or resource groups, check that they are connected to a network inside of your vCenter environment and that they have access to a datastore backed by an ONTAP volume. Lastly, if you do not have resource groups established, you will be required to create a resource group for each replication plan you create. These resource groups will remain configured inside of BlueXP Disaster Recovery even if you delete the replication plan.
As you progress in the creation wizard, you must map all your resources from the source vCenter cluster to your target vCenter cluster. These mappings include compute resources, virtual networks, virtual machines, and datastores. You cannot proceed through the replication plan wizard without these mappings being accurate. Below are the aspects that need to be reviewed during "Resource mapping":
- Compute resources - where you define and confirm your source and target vCenter clusters
- Virtual networks - define which networks to deploy to from your source and target vCenter clusters
- Virtual machines - where you define your vCPU count, RAM quantity, and IP details for your VMs, plus provide username and password information as needed
- Datastores - define the recovery point objective (RPO) in minutes and the number of recovery points for your datastores
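The two datastore settings interact: the RPO sets how often a recovery point is taken, and the recovery point count sets how many are kept. As a rough planning aid, the Python sketch below estimates how far back the retained recovery points reach. It assumes one recovery point is created per RPO interval and that the oldest is discarded once the count is exceeded. That is our simplifying assumption, not documented product behavior, and the function name is hypothetical.

```python
from datetime import timedelta


def retention_window(rpo_minutes: int, recovery_points: int) -> timedelta:
    """Approximate time span covered by the retained recovery points.

    Assumes one recovery point per RPO interval (a simplifying assumption).
    """
    if rpo_minutes <= 0 or recovery_points <= 0:
        raise ValueError("rpo_minutes and recovery_points must be positive")
    return timedelta(minutes=rpo_minutes * recovery_points)


# Example: a 15-minute RPO with 48 recovery points covers about 12 hours.
print(retention_window(15, 48))
```

Under this assumption, halving the RPO without raising the recovery point count also halves how far back you can recover, so the two values are best chosen together.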
The mappings mentioned above must each show a "green" checkbox before you can continue in the creation wizard, and the drop-down for each resource section can be expanded to edit or modify your resources from source to target. Finally, we must highlight the distinct types of mappings: failover mappings and test mappings. By default, both use the same settings. However, if you uncheck the "Use same mappings for failover and test mappings" box, you can edit each mapping independently within this window. The benefit is that you can keep one set of mappings for production failover and a separate set that targets an environment for testing and development.
Once you complete all the steps in the replication plan wizard, replication begins immediately. The background steps that follow include taking a backup and running a compliance check against your replication plan to ensure you can fail over. Please reference a successful running replication plan below:
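To make the relationship between the four mapping sections and the two mapping types concrete, here is a small Python data-model sketch. It is purely illustrative: the class names, section keys, and the `ready` check are our own invention and do not reflect how BlueXP stores replication plans. The sketch models "use same mappings" as the absence of a separate test mapping set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four sections reviewed during "Resource mapping" (keys are hypothetical).
SECTIONS = ("compute", "networks", "virtual_machines", "datastores")


@dataclass
class Mappings:
    # section name -> {source resource name: target resource name}
    sections: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def incomplete_sections(self) -> List[str]:
        """Sections that are missing, empty, or have an unmapped target."""
        return [
            s for s in SECTIONS
            if not self.sections.get(s)
            or any(not target for target in self.sections[s].values())
        ]


@dataclass
class ReplicationPlanMappings:
    failover: Mappings
    test: Optional[Mappings] = None  # None models "use same mappings"

    def effective_test(self) -> Mappings:
        """Mappings used for a test failover."""
        return self.failover if self.test is None else self.test

    def ready(self) -> bool:
        """True when both mapping sets are complete (every section 'green')."""
        return (
            not self.failover.incomplete_sections()
            and not self.effective_test().incomplete_sections()
        )


# Hypothetical example: production mappings reused for testing.
prod = Mappings({s: {"source-resource": "target-resource"} for s in SECTIONS})
plan = ReplicationPlanMappings(failover=prod)
print(plan.ready())
```

Supplying a separate `test` mapping set with its own networks and compute targets is the equivalent of unchecking the box and pointing test failovers at an isolated environment.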
At any point, you can monitor all the jobs running inside BlueXP Disaster Recovery. The "Job monitoring" tab shows all work items for your environment. If specific replication jobs are failing or replication plans are failing to meet compliance, the job monitoring window provides additional information to help you troubleshoot. You can monitor all jobs regardless of status, including successful jobs, jobs in progress or queued, jobs that completed with warnings, and failed jobs.
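If you export or copy job information for review, a simple triage pass helps separate the jobs that need attention from the rest. The Python sketch below works on plain dictionaries. The record fields and status strings are hypothetical stand-ins for the statuses listed above, not the product's actual export format.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical status labels mirroring the categories described above.
NEEDS_ATTENTION = {"failed", "warning"}


def summarize_jobs(jobs: List[Dict[str, str]]) -> Counter:
    """Count jobs per status."""
    return Counter(job["status"] for job in jobs)


def jobs_needing_attention(jobs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return only the jobs that failed or completed with warnings."""
    return [job for job in jobs if job["status"] in NEEDS_ATTENTION]


# Hypothetical sample records, for illustration only.
sample = [
    {"name": "backup-plan-a", "status": "success"},
    {"name": "compliance-plan-a", "status": "warning"},
    {"name": "add-vcenter", "status": "failed"},
    {"name": "backup-plan-b", "status": "queued"},
]
for job in jobs_needing_attention(sample):
    print(job["name"], job["status"])
```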
Now that you have a working replication plan that is in compliance, you are ready to perform a test failover. The test failover mimics an actual failover, and the best part is that you can run your tests in the middle of the day without any disruption to your environment.
To perform a test failover, go to the replication plans window and select the ellipsis next to your replication plan; a set of options will appear.
NetApp has documented the detailed steps to run the test failover, and they are very simple to complete. Before the test starts, a window prompts you to choose which snapshot (latest or a previous snapshot) to fail over to. To begin the failover, you must type "Test failover" as instructed.
Once the failover job starts, you can either watch its progress live or let it run in the background and track it in the job monitoring window. The job performs the following steps:
- BlueXP DR verifies the SnapMirror source and destination have a healthy relationship.
- Next, a backup is initiated to ensure that the destination data is up to date with the source.
- At this point, the VM is replicated with the virtual machine mappings chosen in the replication plan.
- The VM is then created in the target cluster with the same name as the source.
- The datastore is attached from the destination ONTAP environment based on a clone of the data protection (DP) volume. This enables the datastore to be read and written to inside the target vCenter cluster.
- Lastly, the VM inside the target cluster is powered on and made accessible via the virtual network it is mapped to.
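The sequence above behaves like an ordered pipeline in which each step depends on the one before it. The Python sketch below models that ordering. It assumes the job stops at the first failing step, which is our assumption rather than documented behavior, and the step labels and the `run_steps` helper are hypothetical. The `action` callback stands in for whatever actually performs each step.

```python
from typing import Callable, List, Optional, Tuple

# Step labels paraphrase the list above; they are not product identifiers.
TEST_FAILOVER_STEPS = [
    "verify SnapMirror relationship health",
    "take backup to bring destination up to date",
    "apply virtual machine mappings",
    "create VM in target cluster",
    "clone DP volume and attach it as a datastore",
    "power on VM on its mapped network",
]


def run_steps(
    steps: List[str],
    action: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first one whose action returns False.

    Returns (completed step names, name of the failed step or None).
    """
    completed: List[str] = []
    for name in steps:
        if not action(name):
            return completed, name
        completed.append(name)
    return completed, None


# Simulate a run in which every step succeeds.
done, failed = run_steps(TEST_FAILOVER_STEPS, lambda name: True)
print(len(done), failed)
```

Reading a failed job in this way, by asking which step it stopped at, usually narrows troubleshooting to one component: the SnapMirror relationship, the mappings, the target cluster, or the FSx for ONTAP volume clone.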
Once this is completed, your replication plan inside of BlueXP Disaster Recovery will show a test failover state and remain compliant. Remember this is a test, so the actual production VM is still running and writing data to its datastore on the source vCenter cluster. Because the test works on clones of the VM and the datastore volume, your end users will never know you are performing a failover test.
Now that you have verified the failover test has been completed successfully, navigate back to the replication plans window. When you select the ellipsis next to your replication plan, the option to "Clean up failover test" will be available. As the failover clean-up runs in the background, the clones of the datastore and VM will be destroyed, and your replication plan will switch back to healthy and remain compliant.
Please keep in mind that the test failover does not disrupt your production environment. It does, however, verify that should a disaster happen, you will be able to fail over from the primary to the secondary vCenter cluster quickly and efficiently. If this were an actual disaster scenario and you had failed over, you would perform a failback process for your replication plan instead of a clean-up. The detailed steps for a failback scenario are available via the product documentation.
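The difference between the test path and the real disaster path can be summarized as a small state machine. The Python sketch below is a simplified model based only on the description in this guide. The state and action labels are our own shorthand, not the exact status strings shown in BlueXP.

```python
# (current state, action) -> next state. Labels are simplified shorthand.
TRANSITIONS = {
    ("healthy", "test failover"): "test failover",
    ("test failover", "clean up failover test"): "healthy",
    ("healthy", "fail over"): "failed over",
    ("failed over", "fail back"): "healthy",
}


def next_state(state: str, action: str) -> str:
    """Return the plan state after an action, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not available in state {state!r}"
        ) from None


# Test path: the plan returns to healthy after clean-up.
state = next_state("healthy", "test failover")
state = next_state(state, "clean up failover test")
print(state)
```

The model makes one point explicit: clean-up is only the exit from a test failover, while a real failover is exited through failback.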
Hopefully, this guide will provide you with enough detail to successfully use BlueXP Disaster Recovery once available to all NetApp BlueXP accounts. The prerequisite information above MUST be followed to ensure successful deployment and use of BlueXP Disaster Recovery. WWT was hand-selected as an early adopter of BlueXP Disaster Recovery and everything mentioned above has been tested and verified with the WWT Advanced Technology Center (ATC). Once the BlueXP Disaster Recovery product is available, we will transform our testing environment into a production lab for our WWT Platform users to experience BlueXP Disaster Recovery firsthand within our WWT-ATC BlueXP tenant.