Introduction

This white paper outlines the steps we took to fully automate the deployment of vCMP guests, set up a baseline configuration, capture brownfield configurations, and replay those configurations onto the new vCMP guests using Ansible automation. We'll also share insights into the thought process behind the design of our Ansible playbooks and roles. Finally, we'll cover our approach to minimizing disruption during the cut-over to the replacement guests and to identifying configuration drift in the brownfield deployment while the upgraded devices sit idle, waiting to be activated.

Preface

The time had come to consider an upgrade of our BigIP 12.x vCMP guests. The version was aging, and newer versions contained much-needed bug fixes and security patches that would benefit our environment. Beyond these obvious benefits, the upgrade presented an opportunity to impose some standardization across various aspects of our BigIP configuration. Without automation, our systems were plagued with inconsistencies in naming conventions across redundant pairs and between environments. Objects such as iRules, SSL client profiles, virtual servers, pools, and others were sometimes named slightly differently and had mismatched configurations. These discrepancies made building automation on top of this brownfield a difficult task. As part of the upgrade, we wanted to correct these inconsistencies, enforce a naming standard, and build an easily consumable, automated means of applying these corrections and newly devised standards. We also wanted a repeatable, automated way to stand up new vCMP guests and lay down a baseline configuration: establishing device connectivity for a standard HA pair, setting up device trust, device groups, traffic groups, self and floating IPs, routes, various system preferences, remote logging, authentication, and specific system db variables. This automation would remain in place to maintain these configurations across the fleet of guests as we add new networks and routes or deprecate backend services fronted by F5. Read on to see how these challenges were tackled and overcome, not only to perform an out-of-place upgrade but to continue to maintain the desired state.

Methods and materials

We began by exploring our in-place upgrade options. F5 provides documentation on performing in-place upgrades, and while thorough, it didn't meet all of our upgrade requirements. This method would have meant a step-upgrade process, as BigIP cannot be upgraded directly from v12.x to v15.x. While the downtime risk of that procedure is low, it offered less flexibility to revert quickly: a failed upgrade could leave us with a broken HA pair, working under the time constraint of a maintenance window to resolve it. Beyond these limitations, we'd still be left with the inconsistencies mentioned above. This tech debt would simply follow us post-upgrade, and outside of bug and security fixes, the upgrade itself would bring little value in easing the configuration management of our F5 devices. We decided instead to follow an out-of-place upgrade approach: build new vCMP guests in parallel, pre-load all the configurations in an automated fashion, and perform a cut-over from the legacy to the new environment. This approach gave us a great deal of flexibility to address our tech debt of inconsistencies, capture our brownfield configuration state in code, apply it to our greenfield, and evaluate and adjust on the fly as needed. To achieve this, we chose Ansible as the automation tool for this upgrade. We used the existing F5 Ansible modules and created custom modules to fill any gaps.

Getting started

Extracting information from legacy systems

This is the most important step in the upgrade procedure. To meet one of our core objectives, standardizing object names while ensuring the values set for each attribute within those objects are applied to the greenfield devices exactly as they exist on the brownfield devices, we had to retrieve that data programmatically. At the time, tools such as f5-journeys were not publicly available. Instead, we wrote our own means of gathering the configuration state of our devices using the F5 bigip_device_info Ansible module. This module can collect all, or a restricted subset, of the information on F5 BigIP devices. Data is returned in JSON, giving us the flexibility to query our legacy configurations, review them, and manipulate them on the fly to meet our renaming and data-integrity objectives. The massaged legacy data could then be replayed onto the greenfield devices, ensuring an exact, yet consistent, copy is migrated correctly.

For our upgrade project, we needed to gather a JSON representation of the configuration of the following objects (a collection sketch follows the list):

  • Virtual Servers
  • Pools
  • Monitors
  • Certificates
  • iRules
  • Profiles
    • fastl4 profiles
    • client ssl profiles
    • server ssl profiles
    • http profiles
    • tcp profiles
    • APM access profiles
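
A minimal collection sketch, assuming the provider variable is defined elsewhere (for example, in group_vars) and using subset names from the bigip_device_info documentation (verify them against your module version), might look like this:

- name: Collect brownfield configuration objects of interest
  bigip_device_info:
    gather_subset:
      - virtual-servers
      - ltm-pools
      - monitors            # or the specific *-monitors subsets you use
      - ssl-certs
      - irules
      - fastl4-profiles
      - client-ssl-profiles
      - server-ssl-profiles
      - http-profiles
      - tcp-profiles
      - apm-access-profiles
    provider: "{{ provider }}"
  delegate_to: localhost
  register: legacy_facts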

Now that we knew how to collect our brownfield configurations, we needed to think about how to store the collected data. Since the returned format is JSON, we contemplated populating the data into a document store like MongoDB, but that would have required us to write more code to integrate with a data store, and we weren't yet in a position to consider that a viable short-term option. Long term, this approach could offer a lot of flexibility: capturing configurations on a daily schedule, reporting the diff for individual objects, and allowing us to revert to a single point-in-time configuration without having to restore from the last UCS backup or sift through audit logs to determine what changed and manually remediate. We put this idea on the back burner as a future enhancement and decided that dumping the JSON to our local file system allowed us to move forward quickly in developing our automated migration. Using the vi editor, jq, or VS Code to look at the results suited our needs.

Let's get into some code. In the example below, we're gathering info on iRules. bigip_device_info returns a list of dictionaries containing key/value pairs for each iRule. We planned to grab the name of each iRule, dump the iRule code into a file named after the iRule, and store it in a directory named after the device from which the data was pulled. As part of our naming standard update, we settled on dash-separated iRule names instead of the mix of underscores and dashes used in our legacy environment.

- name: Collect BIG-IP facts
  bigip_device_info:
    gather_subset:
      - irules
    provider: "{{ provider }}"
  delegate_to: localhost
  register: device_facts

- name: Set facts for iRule file names and contents
  set_fact:
    file_name: "{{ device_facts | json_query('irules[*].name') | map('lower') |
      map('regex_replace', '_', '-') | list }}"
    file_content: "{{ device_facts | json_query('irules[*].definition') }}"

- name: Create path if it does not exist
  file:
    path: "{{ path }}/{{ inventory_hostname }}"
    state: directory
  delegate_to: localhost

- name: Write content to local file
  copy:
    content: "{{ item.1 }}"
    dest: "{{ path }}/{{ inventory_hostname }}/{{ item.0 }}.tcl"
  with_together:
    - "{{ file_name }}"
    - "{{ file_content }}"
  when: not item.0 is match('-sys')
  delegate_to: localhost

Note: Depending on how your Ansible inventory is set up, you may want to use the run_once keyword to target a single host from the group to query and download the configurations. If the peers are in sync, there's no reason to pull the data from each device, since they contain the same configurations; this is typical when using Ansible to automate against F5. We're not using that keyword here because the structure of our Ansible inventory only includes a single host per group.

As you'll notice in the set_fact task, we standardize the file name to lowercase and replace all underscores with dashes. We also exclude iRules whose names begin with _sys, since these out-of-the-box iRules ship with a default installation of BigIP. Dumping these configurations into a directory named after the device from which they were pulled allowed us to run a diff across redundant pairs to ensure iRules matched across data centers. In cases where inconsistencies were discovered, we would simply decide which iRule was authoritative and update the appropriate file. Using a tool like Beyond Compare made this effortless.

With a working prototype, we took a step back to think through how to structure our code to collect the remaining pieces of data we needed from the brownfield installation. We decided it was best to write a separate Ansible playbook to collect the data and dump the configs, rather than bundling it with our yet-to-be-developed provisioning code. This way, our f5-query module can be used to gather and query any part of our F5 configuration on demand, which has proven quite useful.

Figure 1 below illustrates the code repository layout, with separate roles created to target configs for specific BigIP objects; a sketch of the top-level gather_configs.yaml playbook follows the figure.

.
├── gather_configs.yaml
├── pools.yaml
├── irules.yaml
├── monitors.yaml
├── nodes.yaml
├── profiles.yaml
├── selfip.yaml
├── vips.yaml
├── vlans.yaml
├── group_vars
│   └── all
│       └── main.yaml
├── inventories
│   ├── all
│   │   └── hosts
│   ├── dev
│   │   └── hosts
│   ├── prd
│   │   └── hosts
│   ├── snd
│   │   └── hosts
│   └── tst
│       └── hosts
└── roles
    ├── irules
    │   └── tasks
    │       └── main.yaml
    ├── monitors
    │   └── tasks
    │       ├── main.yaml
    │       └── monitors.yaml
    ├── nodes
    │   └── tasks
    │       └── main.yaml
    ├── pools
    │   └── tasks
    │       ├── get_pools.yaml
    │       └── main.yaml
    ├── profiles
    │   └── tasks
    │       ├── apm_access.yaml
    │       ├── client_ssl.yaml
    │       ├── fastl4.yaml
    │       ├── http.yaml
    │       ├── main.yaml
    │       ├── server_ssl.yaml
    │       └── tcp.yaml
    ├── selfip
    │   └── tasks
    │       └── main.yaml
    ├── vips
    │   └── tasks
    │       ├── get_virtual_addresses.yaml
    │       ├── get_virtual_servers.yaml
    │       └── main.yaml
    └── vlans
        └── tasks
            └── main.yaml
Figure 1. Ansible Role layout to extract configurations from legacy BigIP devices
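
For reference, the top-level gather_configs.yaml in this layout might look something like the sketch below. The per-object playbook and role names match the figure; treat this as a sketch rather than the exact playbook used in the project.

# gather_configs.yaml (sketch): aggregate the per-object playbooks shown above
---
- import_playbook: irules.yaml
- import_playbook: monitors.yaml
- import_playbook: nodes.yaml
- import_playbook: pools.yaml
- import_playbook: profiles.yaml
- import_playbook: selfip.yaml
- import_playbook: vips.yaml
- import_playbook: vlans.yaml

# irules.yaml (sketch): each per-object playbook simply applies its role
---
- hosts: all
  connection: local
  gather_facts: false
  roles:
    - irules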

Upgrade procedure

With a method in hand to collect our legacy configurations, we could now focus on migrating those configurations to brand-new guests. We wanted to simplify the process as much as possible and targeted three steps:

  1. Deploy new vCMP guests
  2. Populate new vCMP guests from data extraction
  3. Perform a cut-over

Deploy new vCMP guests

Standing up a new guest was as simple as using the F5 Ansible module bigip_vcmp_guest:

---
- name: Deploy VCMP Guest
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create vCMP guest
      bigip_vcmp_guest:
        name: my_new_bigip_vcmp_guest
        initial_image: BIGIP-15.x.x.x.iso
        allowed_slots:
          - "1"
          - "2"
          - "3"
        cores_per_slot: 4
        vlans:
          - 123
          - 456
          - 789
        mgmt_network: bridged
        mgmt_address: 10.10.10.10/24
        mgmt_route: 10.10.10.1
        state: present
        provider:
          server: chassis1.wwt.com
          user: admin
          password: secret

While it's possible to automate the upload of an ISO to the Viprion chassis, we ran into a known bug and ended up manually uploading the ISO as a pre-staging requirement. Fortunately, this was a one-time step that didn't impede our full end-to-end automation, and we were able to continue testing with the playbook above and successfully deploy guests via Ansible!

With a working prototype, we identified the following pieces of critical information to consider:

  1. ISO Image to install on guest
  2. Management IP and route
  3. VLANs to assign to the guest
  4. Blade assignment(s)

These items could differ across guests, so we started to frame the structure of our Ansible playbook:

.
├── group_vars
│ ├── all
│ └── ltm_int_dc1
├── inventories
│ └── dev
│     ├── group_vars
│     │ └── ltm_int_dc1
│     └── hosts
├── roles
│ └── vcmp_guest
│     └── tasks
│         ├── deploy_guest.yaml
│         └── main.yaml
├── main.yaml
└── vcmp_create_guest.yaml

Values for the parameters accepted by this module can be fed to Ansible in various ways: common defaults can live in group_vars, while environment-specific settings live in the inventory.

For example, group_vars/all may contain data center specific management network values for all hosts:

---
datacenters:
  DC1:
    vcmp_management_network: 10.10.10.0/24
    vcmp_management_gateway: 10.10.10.1
    vcmp_ha_network: 10.2.3.0/24
  DC2:
    vcmp_management_network: 10.222.2.0/24
    vcmp_management_gateway: 10.222.2.1
    vcmp_ha_network: 10.220.3.0/24

and group_vars/ltm_int_dc1 may contain group-specific values that are global to the ltm_int_dc1 hosts:

---
datacenter: DC1
ha_vlan: ha_10_2_3_vlan

and the inventory variables in inventories/dev/group_vars/ltm_int_dc1 may contain values specific to that set of hosts within that inventory:

---
vlans:
  - ha_10_2_3_vlan
  - int_10-2-1_vlan
  - int_10-2-2_vlan
  - int_10-6_2_vlan
  - int_10-6-4_vlan
cores_per_slot: 4

In the spirit of full end-to-end automation, we did not want to manually assign static management IPs and hard-code those values as variables. To set a management IP at build time, we used Ansible to integrate with our IP Address Management solution to find the next available IP in the management subnet, set up a DNS entry for that IP, and feed the returned IP into guest_mgmt_ip. With a playbook structure in place that separates our data from our code, we refactored our initial prototype and created a playbook, vcmp_create_guest.yaml:

- hosts: ltm_int_dc1
  connection: local
  gather_facts: false
  roles:
    - role: vcmp_guest
      image: BIGIP-15.x.x.x.iso

Each role uses its main.yaml to import the tasks within that role, letting us give task files more meaningful names and simplifying the readability of our code. For example:

roles/vcmp_guest/tasks/main.yaml

---
- name: Deploy VCMP Guests
  import_tasks: deploy_guest.yaml

roles/vcmp_guest/tasks/deploy_guest.yaml

- name: Create vCMP guest
  bigip_vcmp_guest:
    name: "{{ inventory_hostname | regex_replace('.wwt.com') }}"
    initial_image: "{{ image }}"
    allowed_slots:
      - "1"
      - "2"
      - "3"
    cores_per_slot: "{{ cores_per_slot }}"
    vlans: "{{ vlans }}"
    mgmt_network: bridged
    mgmt_address: "{{ guest_mgmt_ip }}/24"
    mgmt_route: "{{ datacenters[datacenter]['vcmp_management_gateway'] }}"
    state: present
    provider: "{{ vcmp_host_provider }}"

A couple of things to note here. The provider is set to the credentials of the chassis to which the guest is to be deployed. That may go without saying, but it can easily be overlooked, considering these Ansible modules integrate with the REST API endpoints exposed by F5 and the code is typically executed from the Ansible host, not on the endpoint. You will need to get creative with the data structure of your variable declarations to align guests to chassis, VLANs to guests, and so on, all in the name of simplifying your Ansible code without relying on hard-coded conditional statements to control behavior. Hopefully the above provides a good framework for doing so; the sketch below illustrates one way to structure the provider variables.
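
As an illustration only (the vault_* variable names here are hypothetical, not taken from the project), the chassis and guest credentials could be defined once per group and referenced by the tasks:

# group_vars/ltm_int_dc1 (hypothetical sketch)
# Credentials for the vCMP host (chassis) this group of guests deploys to.
vcmp_host_provider:
  server: chassis1.wwt.com
  user: "{{ vault_chassis_user }}"
  password: "{{ vault_chassis_password }}"
  validate_certs: yes

# Credentials for the guests themselves, used once they are reachable.
vcmp_guest_provider:
  server: "{{ inventory_hostname }}"
  user: admin
  password: "{{ vault_guest_admin_password }}"
  validate_certs: yes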

From here, we iterated on our code, destroying our guests multiple times and automatically rebuilding them to test the integrity of our code. Throughout the process, unexpected behaviors arose. For example, we found it was better not to include the domain name in the name of the guest, because the module does not update the DSC device name accordingly, which in turn causes problems with the identity certificate retaining the default bigip1 hostname. More info on this issue can be found on the f5-ansible GitHub page. Throughout the project, we found other issues and were pleased to see F5 acknowledge and resolve them quickly. Refer to the f5-ansible GitHub issues for a complete list of the things we identified.

With guests deploying successfully to the desired chassis, and a playbook framework in place, it was time to consider applying baseline configurations for each device, including:

  1. Reset admin password
  2. Reset root password
  3. Setup TimeZone
  4. Setup NTP
  5. Generate selfip for HA VLAN
  6. Setup Network
    1. DNS
    2. Create selfip for HA VLAN
  7. Disable Setup Utility

An example Ansible playbook to create a vCMP guest may end up looking like this:

- hosts: "{{ hosts_list }}"
  connection: local
  gather_facts: false
  roles:
    - role: vcmp_guest
      image: BIGIP-15.x.x.x.iso
      reset_admin_password: "{{ reset_admin }}"
      reset_root_password: "{{ reset_root }}"

Resetting the admin and root user passwords is controlled by boolean flags, as the password reset should only occur at provisioning time, when the default credentials are known. At the same time, the same role can be used to rotate these passwords, so the boolean flags serve a dual purpose beyond initial provisioning.
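
A minimal sketch of what reset_admin_password.yaml might contain, using the bigip_user module, is shown below; the new_admin_password variable is hypothetical, and the provider must carry whichever credentials are currently valid on the guest:

# roles/vcmp_guest/tasks/reset_admin_password.yaml (sketch)
- name: Reset the admin password
  bigip_user:
    username_credential: admin
    password_credential: "{{ new_admin_password }}"  # hypothetical, e.g. pulled from a vault
    update_password: always
    state: present
    provider: "{{ vcmp_guest_provider }}"
  delegate_to: localhost
  when: reset_admin_password | bool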

We decided that the main.yaml task file for the vcmp_guest role imports the other tasks within that role to fulfill the standard setup of the guest:

---
- name: Deploy VCMP Guests
  import_tasks: deploy_guest.yaml

- name: Setup System ›› Users ›› Update admin credentials
  import_tasks: reset_admin_password.yaml

- name: Setup System ›› Users ›› Update root credentials
  import_tasks: reset_root_password.yaml

- name: Setup System ›› timezone
  import_tasks: timezone.yaml

- name: Setup System ›› ntp
  import_tasks: ntp.yaml

- name: Network ›› Self IPs
  import_tasks: generate_ha_selfip.yaml

- name: "Setup Utility  ››  Network"
  import_tasks: network_setup.yaml

- name: Disable the setup utility
  bigip_sys_global:
    gui_setup: no
    provider: "{{ vcmp_guest_provider }}"
  delegate_to: localhost
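
As an illustration of what one of these imported task files might contain, here is a minimal sketch of ntp.yaml using the bigip_device_ntp module; the NTP servers and timezone are placeholders:

# roles/vcmp_guest/tasks/ntp.yaml (sketch)
- name: Configure NTP servers and timezone
  bigip_device_ntp:
    ntp_servers:
      - 10.0.0.123   # placeholder NTP server
      - 10.0.0.124   # placeholder NTP server
    timezone: America/Chicago
    provider: "{{ vcmp_guest_provider }}"
  delegate_to: localhost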

The initial Ansible repository layout started to take shape:

.
├── group_vars
│ ├── all
│ ├── chassis
│ ├── ltm_int_dc1
│ └── ltm_int_dc2
├── inventories
│ ├── chassis
│ │ └── hosts
│ └── dev
│     ├── group_vars
│     │ ├── all
│     │ ├── ltm_int_dc1
│     │ └── ltm_int_dc2
│     └── hosts
├── roles
│ └── vcmp_guest
│     ├── defaults
│     │ └── main.yml
│     └── tasks
│         ├── deploy_guest.yaml
│         ├── generate_ha_selfip.yaml
│         ├── main.yaml
│         ├── network_setup.yaml
│         ├── ntp.yaml
│         ├── reset_admin_password.yaml
│         ├── reset_root_password.yaml
│         └── timezone.yaml
├── main.yaml
└── vcmp_create_guest.yaml

Where the main.yaml playbook is:

---
- name: Create VCMP Guests
  import_playbook: vcmp_create_guest.yaml

From here, you may want to set up device trust for your guests. We created a device_management role where the main.yaml task file imports the other tasks within that role to fulfill the setup of HA, device trust, and so on:

---
- name: Device Management  ››  Devices  ›› Change Device Name
  import_tasks: set_hostname.yaml

- name: Device Management ›› Configure HA
  import_tasks: configure_ha.yaml

- name: Device Management ›› Device Trust
  import_tasks: device_trust.yaml

- name: Device Management ›› Device Groups
  import_tasks: device_groups.yaml

- name: Device Management ›› Traffic Groups
  import_tasks: traffic_groups.yaml

Populate new vCMP guests from data extraction

Alright! We have new guests deployed running BigIP 15.x with a complete baseline configuration. We felt good about where our guest deployment code was, committed it to master, and created a new branch to begin coding the creation of VIPs, pools, iRules, profiles, and so on from our stored JSON. Here is a code snippet that creates iRules by reading the .tcl files obtained from the legacy systems via our f5-query module.

- name: Set fact for iRules directory
  set_fact:
    irule_files: "{{ '/tmp/f5-irules/' + inventory_hostname + '/*.tcl' }}"
  run_once: true

- name: Add the iRule contained in template irule.tcl to the LTM module
  bigip_irule:
    module: ltm
    name: "{{ item.split('/')[-1] | regex_replace('.tcl', '') }}"
    src: "{{ item }}"
    state: present
    provider: "{{ vcmp_guest_provider }}"
  loop: "{{ q('fileglob', irule_files | lower) }}"
  delegate_to: localhost
  notify:
    - Save the running configuration to disk
    - Sync configuration from device to group
  run_once: true
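
This task (and the virtual server task shown later) notifies two handlers that aren't listed in this paper. A minimal sketch of what they might look like, using the bigip_config and bigip_configsync_action modules with a placeholder device group name, is:

# handlers (sketch)
- name: Save the running configuration to disk
  bigip_config:
    save: yes
    provider: "{{ vcmp_guest_provider }}"
  delegate_to: localhost

- name: Sync configuration from device to group
  bigip_configsync_action:
    device_group: device-group-failover   # placeholder device group name
    sync_device_to_group: yes
    provider: "{{ vcmp_guest_provider }}"
  delegate_to: localhost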

A more advanced case is creating a virtual server object, where we read each JSON file and load it into a variable within the Ansible task so that the values from the file can be used. The basic code looks like this:

- name: Create bigip_virtual_server on new guest from legacy config
  bigip_virtual_server:
    state: present # corresponds to "{{ attr.enabled }}"
    name: "{{ attr.name | lower | regex_replace('_', '-') }}"
    description: "{{ attr.description | default(omit) }}"
    type: "{{ attr.type }}"
    source: "{{ attr.source_address }}"
    destination: "{{ attr.destination_address }}"
    port: "{{ attr.destination_port }}"
    ip_protocol: "{{ attr.protocol }}"
    profiles: "{{ _profiles }}"
    default_persistence_profile: "{{ attr.persistence_profile |
       default(omit) }}"
    irules: "{{ attr.irules | default(omit) }}"
    address_translation: "{{ attr.translate_address | bool }}"
    pool: "{{ '' if attr.default_pool is not defined else
      attr.default_pool | lower | regex_replace('_','-') |
      regex_replace('common', 'Common') }}"
    snat: "{{ attr.snat_type }}"
    provider: "{{ vcmp_guest_provider }}"
    vars:
        attr: "{{ lookup('file', vip) | from_json }}"
  delegate_to: localhost
  notify:
    - Save the running configuration to disk
    - Sync configuration from device to group

What is going on here?

  • We use the lookup plugin to access data from a file, in conjunction with the from_json filter, to load in already formatted data (the vars section at the bottom of the task).
  • The data is loaded into a variable attr, short for "attribute", such that we can pick off individual values from each key/value pair in the consumed json file.
  • Each key/value pair from the JSON is then accessed and assigned to the corresponding module parameter, effectively replaying the legacy VIP onto the new device and recreating that object in its new home with its new standardized (dash-separated) name.
  • The default filter is also used to omit that particular parameter in case it was not configured across all objects. For example, some VIPs have a description, while others do not.
  • profiles returned from bigip_device_info are not in a format accepted by the bigip_virtual_server module, so you may have to convert or massage certain returned values on the fly (see the sketch below).
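
A minimal sketch of that massaging step, assuming the returned profile entries expose a name field (verify the exact structure your module version returns), might look like this:

- name: Build the profile list in the shape bigip_virtual_server expects
  set_fact:
    _profiles: "{{ attr.profiles | map(attribute='name') | map('lower') |
      map('regex_replace', '_', '-') | list }}"
  vars:
    attr: "{{ lookup('file', vip) | from_json }}"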

Note: When replaying legacy configurations onto the new devices, you may not be able to immediately capture the state of that object. For example, if a VIP is marked as disabled on the legacy side and does not yet exist on the new device, the state of that object must first be set to "present" or "enabled" (i.e., the object must exist before it can be modified). To then reset the state of the object on the new side, a second task identical to the one above needs to be added, except state would become:

state: "{{ 'enabled' if attr.enabled == 'yes' else 'disabled'}}"

As the code base continues to grow, structure becomes really important. We decided it was best to match our Ansible directory structure to that of the F5 management UI. This made it very easy to track which pieces of code were responsible for each configuration aspect of the F5.

Also, to help control execution of our playbooks, we added tags for each category, such as VIPs, iRules, and monitors. You can think of this as running the playbook through the various configuration phases from start to finish; a tagging sketch follows.
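
As a sketch only (the task file names here are hypothetical, loosely mirroring the layout described earlier), tags can be applied to the imports for each configuration phase and selected at run time:

# Hypothetical excerpt of a main.yaml with tagged configuration phases
- name: Create iRules
  import_tasks: irules.yaml
  tags: [irules]

- name: Create monitors
  import_tasks: monitors.yaml
  tags: [monitors]

- name: Create virtual servers
  import_tasks: vips.yaml
  tags: [vips]

A single phase can then be replayed on demand, for example with ansible-playbook main.yaml --tags vips.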

A full end-to-end configuration of the guests, establishing device trust, traffic groups, self and floating IP addresses, and so on, may be structured as shown in the visual diagram below.

The above is the overall basis and approach we took to retrieve legacy F5 configurations and replay them onto the new devices. This approach eliminates the human error inherent in a stare-and-compare migration and brings confidence that each setting is captured verbatim and migrated. In addition, this method leaves your working legacy devices in pristine condition, leaving the door open to reintroduce them should an initial cut-over attempt not go according to plan.

To that end, our new devices were only connected to the network up to a certain point. When it came time to configure the VIPs, we removed the assignment of any active VLANs (except the MGMT and HA VLANs) from the new guests to avoid conflicts. We also flipped the ARP status of the Virtual Server Addresses to disabled. This allowed us to load all the configurations onto the upgraded systems ahead of time, saving considerable time that would otherwise be spent replaying those configurations onto the new devices during the cut-over.

Perform a cut-over

At this stage, our new devices are fully configured with an exact replica of the legacy configurations. To ensure no drift between the two environments, we imposed a change freeze and removed remote login capability for users, forcing them to work with us to introduce any updates. That way, we could ensure any changes made in the days leading up to the cut-over were reflected on both environments.

Another advantage of this approach is that cutting over to the new devices is very fast. Little to no preparation work on the new devices has to be performed in the maintenance window. Instead, engineers can focus on the task at hand and onboard the newly upgraded devices. Backing out the change is just as fast, which could even allow engineers to resolve any overlooked issues during the window.

Cutting over to the new devices was as simple as four steps, again using Ansible:

  1. Remove all VLANs from the legacy F5 vCMP guests (except HA).
  2. Add all required VLANs to the new systems.
  3. Enable ARP for the Virtual Server Addresses (this triggers a GARP, informing the network of the MAC and IP addresses of the new F5 devices); see the sketch after this list.
  4. Validate.
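
As a sketch of step 3, assuming the virtual addresses were captured during extraction and collected into a list variable (virtual_addresses is hypothetical), the bigip_virtual_address module can flip ARP back on:

- name: Enable ARP on the virtual addresses during cut-over
  bigip_virtual_address:
    address: "{{ item }}"
    arp: yes
    state: present
    provider: "{{ vcmp_guest_provider }}"
  loop: "{{ virtual_addresses }}"   # hypothetical list built from the extracted data
  delegate_to: localhost
  run_once: true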

We hope our method for upgrading F5 vCMP guests provides some insight into tackling this for your organization in a way that minimizes disruption as much as possible and offers a reliable way to maintain the integrity of your legacy environment as a back-out strategy, keeping risk to business continuity low.

The above principles can also be applied to lifecycle management (LCM) of F5 DNS. For this, we had to write a couple of custom Ansible modules to retrieve configurations not currently offered by bigip_device_info. Also, to achieve full end-to-end automated configuration of F5 DNS, we wrote Ansible modules that leverage the API to run gtm_add for setting up a synchronization group, and bigip_add to enable iQuery communication between BigIP DNS sync group members and remote BigIP LTM systems.

We hope this article has provided a viable approach to performing LCM of your F5 infrastructure.
