What Is "Model-Driven" Orchestration — And Why Would I Use It?
Duck season… wabbit season…
You might have to be my age to get that reference. There was an old Bugs Bunny cartoon where Bugs and Daffy were arguing over the current hunting season. Was it "duck season… or wabbit season"? I always think of that when I hear technical purists argue over which orchestration method is superior: model-driven or file-driven.
This is especially true when you start comparing tools like Cisco's NSO to other systems like Red Hat's Ansible. Two orchestration tools, both with broad commercial adoption. Advocates for each will claim that their tool "does it all" — or at least nearly so.
The upshot is that it's not a "duck season or wabbit season" question. There are model-driven tools in the market because they cater to certain applications. Likewise, trying to put the very square peg of model-driven orchestration into every round hole is a losing proposition.
So, what is "model-driven" orchestration? Why and when would I use it? Here is a primer to help you answer that question.
To begin, let's understand the non-model driven kind of orchestration, file-driven. If you're using Ansible or Terraform, you're used to building a runbook file. This runbook (or playbook, in the case of Ansible) is a text file containing all the commands, loops, pushes and pulls you wish the orchestrator to execute. You kick off the orchestrator (typically, from a Linux command line) and point it to the runbook file.
The orchestrator does its thing: it crunches through every command and gives you a readout of the success or failure of each step, and whether it was able to complete the runbook.
After the play executes, the orchestrator does not keep a persistent record of the actions and configuration it generated. So there is no record of "this instance of this play, for these devices, with these input parameters." It's, essentially, a fire and forget method — which isn't a bad thing. My wife hands me a shopping list, I bumble about the grocery store and all that matters is that I get everything on the list.
This is just fine for many infrastructure automation use cases. Go provision this equipment with these base settings. Go spin up this server with these applications. Go build this firewall with these ACLs. You don't need a persistent record of the transaction because all you want is a finished product — config delivered, lights turn green, done and done… drink coffee.
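To make the "fire and forget" idea concrete, here is a minimal Python sketch of a file-driven run. It is not how Ansible or Terraform is implemented; `Step` and `run_playbook` are names I made up for illustration. The point is only that the runner reports per-step status and then keeps nothing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One runbook entry (hypothetical structure, for illustration only)."""
    name: str
    action: Callable[[], None]  # raises an exception on failure


def run_playbook(steps: List[Step]) -> Tuple[List[Tuple[str, str]], bool]:
    """Run steps in order, report each result, and keep no state afterwards."""
    results = []
    for step in steps:
        try:
            step.action()
            results.append((step.name, "ok"))
        except Exception as exc:
            results.append((step.name, f"failed: {exc}"))
            break  # assumption: stop at the first failed step
    completed = len(results) == len(steps) and all(s == "ok" for _, s in results)
    # Nothing is stored: once this returns, the run itself is forgotten.
    return results, completed
```

The readout is the only artifact. If you want to know later which devices a given run touched, you have to go look at the devices.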
So let's change the use case. Let's say you want to deploy an overlay VPN connection across a set of routers (of various vendors). The VPN is a service for a customer, so you'll need to provision service-specific ACLs. You'll also need to add the customer to certain databases at the time of turn-up.
Additionally, you'll want to deploy software monitoring probes to track the customer's service quality. Oh yeah, and you need to keep this whole service (and all of its associated configuration items) in a persistent record, such that, if the customer adds a new site, it's part of the same service. And for good measure, if the customer cancels service, I want the orchestrator to back out everything it configured, automatically, without having my techs fish around for the service details.
That's a different proposition.
Could I do this with a file-driven orchestrator? Maybe if I write a bunch of code and add a database and integrate a workflow engine. But that's a lot of work and custom code to support.
A model-driven orchestrator was made for this kind of application, so how does it work?
To begin, a model-driven orchestrator maintains a synchronized copy of the config in each attached device. This is a big difference from file-driven systems. This off-line copy of the config is treated as the source of truth. Any time the orchestrator makes a change to a device, it applies a meta-marker to that change in its database — keeping track of every change for every service on every device, in the order in which it was made.
This enables the model-driven orchestrator to "roll back" a change to a specific version, or "dry-run" a possible change against the DB. It also means the orchestrator doesn't have to care about every config line in the device, just the ones it provisioned.
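Here is a rough Python sketch of the bookkeeping described above: an off-line copy of the config, an ordered journal of changes tagged by service, a dry-run and a rollback. This is not NSO's data model or API. `ConfigStore`, its methods and the config paths are all hypothetical, and it assumes config values are never `None`.

```python
class ConfigStore:
    """Toy off-line config copy with an ordered change journal (illustrative only)."""

    def __init__(self):
        self.config = {}   # device name -> {config path: value}
        self.journal = []  # change records, in the order they were made

    def dry_run(self, device, changes):
        """Return {path: (old, new)} for what would change; apply nothing."""
        current = self.config.get(device, {})
        return {p: (current.get(p), v)
                for p, v in changes.items() if current.get(p) != v}

    def commit(self, service, device, changes):
        """Apply changes and journal them under the owning service."""
        diff = self.dry_run(device, changes)
        self.journal.append({"service": service, "device": device, "diff": diff})
        self.config.setdefault(device, {}).update(changes)
        return len(self.journal)  # change id = position in the journal

    def rollback_to(self, change_id):
        """Undo every change made after change_id, newest first."""
        while len(self.journal) > change_id:
            record = self.journal.pop()
            device_cfg = self.config[record["device"]]
            for path, (old, _new) in record["diff"].items():
                if old is None:
                    device_cfg.pop(path, None)  # path did not exist before
                else:
                    device_cfg[path] = old
```

Because each journal entry stores only the lines that changed, the store never has to reason about config it did not provision.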
Instead of a list of commands, the orchestrator uses a service model. The service model is a collection of files which capture all the intelligence needed to configure the service. At run time, the service model directs the orchestrator to collect needed input, then it plugs that input into the model. The model generates the service config for each device in the service chain. It then, in a single transaction, pushes that config to all devices. If, for some reason, any one change fails, the entire transaction rolls back. This is key, because a model-driven orchestrator won't leave a service "half configured." The transaction either works, or it rolls back.
This ensures clean config management across many devices for many services. Additionally, when you delete the service, all service config in every affected device is removed. This helps the network operator manage resources, freeing previously occupied ports and channels.
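The all-or-nothing push and the clean delete can be sketched in a few lines of Python. This is a simplification under my own assumptions: `FakeDevice`, `PushError`, `deploy_service` and `delete_service` are invented names, devices are in-memory sets of config lines, and a real orchestrator coordinates the transaction very differently.

```python
class PushError(Exception):
    """Raised by a device that rejects a config push (hypothetical)."""


class FakeDevice:
    """In-memory stand-in for a network device."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.lines = set()
        self.fail_on = fail_on  # a config line this device will reject

    def apply(self, lines):
        if self.fail_on in lines:
            raise PushError(f"{self.name} rejected '{self.fail_on}'")
        added = set(lines) - self.lines  # only what is genuinely new
        self.lines |= added
        return added

    def remove(self, lines):
        self.lines -= set(lines)


def deploy_service(devices, per_device_config):
    """Push to every device; on any failure, undo what was already applied."""
    record = {}  # device name -> lines this service added
    try:
        for name, lines in per_device_config.items():
            record[name] = devices[name].apply(lines)
    except PushError:
        for name, added in record.items():
            devices[name].remove(added)
        return None  # nothing is left half configured
    return record


def delete_service(devices, record):
    """Remove exactly the lines the service added, on every device."""
    for name, added in record.items():
        devices[name].remove(added)
```

The returned record is what makes the delete possible: the orchestrator knows which lines belong to the service, so nobody has to go fishing for them.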
Service models can be quite elaborate. You can include pushes and pulls from external data sources, live commands for status and even re-triggers ("kickers" in the case of NSO) which restart the service model based on inputs like traps. You can also include Python or Java code for specific programmatic actions. So building a service model can be more complex than building a runbook file.
But as you can see, the model-driven orchestrator can do things a file-driven orchestrator cannot. Likewise, there are certain automation tasks where you get the job done faster and easier with a file-driven orchestrator. So, as I said, it's not a "duck season, wabbit season" question — there are applications for both file-driven and model-driven orchestrators.
So, what makes an application a good candidate for model-driven orchestration? Here are some factors to look for:
- It's a service. The application makes connections or settings for a given user/customer within existing infrastructure.
- There's a lifecycle. The config for this customer, even though spread across multiple devices, must be kept as a single record, expanding, contracting or removing it over the service lifecycle.
- Inventory is involved. The service consumes inventory (ports, channels, IP address, route targets, etc.). You wish for the system to automatically consume or release this inventory in step with the service lifecycle.
- It's multi-vendor/multi-domain. The service may be more than just a connection across a network. The service may involve setting up probes, opening firewall ports, processing syslog messages, etc. — all done across various devices of like type or of different domains.
- There is a multi-tier system. Most file-driven orchestrators are single tier; they have a built-in person/machine interface (like a CLI or native GUI). Model-driven orchestrators are typically part of a larger solution. One system may process an order, one may control a fulfillment workflow and then the orchestrator deploys the service. The operator never sees the orchestrator, only its results. APIs carry all the system-to-system communications.
- MANO. Management and Orchestration (MANO) is the ETSI framework for deploying Virtual Network Function (VNF) based services. VNF-based services require that virtual infrastructure (v-routers, v-switches, v-firewalls) be deployed as part of the service. Model-driven orchestrators are very well-suited for this task.
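The inventory factor in the list above is easy to picture in code. Below is a minimal Python sketch of a resource pool whose allocations are tied to a service, so that removing the service frees everything it consumed. `Pool` and its methods are hypothetical names, and the route-target values are made-up examples.

```python
class Pool:
    """Toy inventory pool: each allocated item is owned by a service."""

    def __init__(self, items):
        self.free = list(items)
        self.owner = {}  # item -> service id

    def allocate(self, service_id):
        """Hand the next free item to a service and remember who owns it."""
        if not self.free:
            raise RuntimeError("pool exhausted")
        item = self.free.pop(0)
        self.owner[item] = service_id
        return item

    def release_all(self, service_id):
        """Return every item owned by the service to the free list."""
        released = [i for i, s in self.owner.items() if s == service_id]
        for item in released:
            del self.owner[item]
            self.free.append(item)
        return released
```

In a real deployment, `release_all` would be called as part of the service delete, so inventory and config stay in step without anyone tracking them by hand.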
Let's look at a couple of examples.
Here is one of the classic model-driven application examples, an L3VPN with Selectable Quality of Service (QoS). The orchestrator (Cisco NSO, shown here) has a service model which represents:
- The PE connections on the network.
- The CE connections on the network.
- The service endpoints (on the CEs).
- The QoS profiles across the network.
This is done across a multi-vendor plane (Cisco, Juniper and Nokia). The service can cater to any number of multi-point endpoints and any combination of QoS settings. At run time, the user enters data on a portal. The orchestrator's API collects the endpoint routers, ports and QoS setting for the service. The orchestrator then generates all the config necessary for the PE and CE routers in the service chain, regardless of the vendor type. It then pushes the entire service config to all devices in a single transaction.
The service is kept in the orchestrator's database and treated as a unified record. If the user changes the service (adds a new drop or changes the QoS setting), the orchestrator generates just the needed config and keeps the new settings as part of the service record. The orchestrator also provides general configuration management values. An example of this would be a change to the QoS profile.
Let's say the network operator upgrades settings in the Gold profile. That change can be applied across the network, including previously deployed instances. Likewise, when the user removes the service, all configuration for that service in every device is removed, freeing all the previously occupied ports.
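The "generate just the needed config" behavior comes down to rendering the service twice and diffing the results. Here is a small Python sketch under my own assumptions: `render` and `diff` are hypothetical functions, the service dictionary is an invented input shape, and the config lines are a made-up vendor-neutral syntax, not real router CLI.

```python
def render(service):
    """Turn service inputs into a {device: set of config lines} mapping."""
    config = {}
    for endpoint in service["endpoints"]:
        lines = config.setdefault(endpoint["device"], set())
        lines.add(f"vpn {service['name']} interface {endpoint['port']}")
        lines.add(f"vpn {service['name']} qos {service['qos']}")
    return config


def diff(old, new):
    """Return, per device, only the lines to add and the lines to remove."""
    delta = {}
    for device in old.keys() | new.keys():
        add = new.get(device, set()) - old.get(device, set())
        remove = old.get(device, set()) - new.get(device, set())
        if add or remove:
            delta[device] = {"add": add, "remove": remove}
    return delta
```

Adding a site produces a delta that touches only the new device. Changing the QoS setting produces one remove and one add per device. Deleting the service is just a diff against an empty config.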
Another application where a model-driven orchestrator is superior is in device migration. This is where a network operator is replacing one vendor with another; in this case, Cisco firewalls are being replaced with Check Point. The technical problem is that, while the firewalls have the same functions, the configuration artifacts are very different between the two vendors. Additionally, this must happen in a production environment where a mistake can trigger a costly SLA penalty.
The model-driven orchestrator (Cisco NSO, in this case) uses a technology called a Network Element Driver (NED). For NSO, the NED holds all the CLI translation to a given device. So, the orchestrator "syncs-from" the source device (Cisco) and then does a "sync-to" on the target device, translating the config from Cisco to Checkpoint. However it's not quite that simple, and a well-crafted service model must aid in the translation.
The rollback and lifecycle nature of the orchestrator also comes into play, giving the network operator a clean and quick way to put the customer back on the original firewall, should an error occur.
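The translation step in this migration can be pictured as "parse into a neutral model, then render for the target." The Python sketch below shows only that shape. Both config syntaxes are invented for illustration (they are not Cisco or Check Point syntax), and `Rule`, `parse_vendor_a`, `render_vendor_b` and `migrate` are hypothetical stand-ins, not how a NED is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Vendor-neutral firewall rule (hypothetical model)."""
    action: str    # "permit" or "deny"
    protocol: str
    port: int


def parse_vendor_a(line):
    """Parse an invented source format: 'rule permit tcp 443'."""
    _keyword, action, protocol, port = line.split()
    return Rule(action, protocol, int(port))


def render_vendor_b(rule):
    """Render an invented target format with different vocabulary."""
    action = {"permit": "accept", "deny": "drop"}[rule.action]
    return f"add-rule proto={rule.protocol} port={rule.port} action={action}"


def migrate(lines):
    """Translate every source line through the neutral model."""
    return [render_vendor_b(parse_vendor_a(line)) for line in lines]
```

Real firewall configs rarely map one-to-one like this, which is why a well-crafted service model still has to help with the translation.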
Model-driven orchestrators have found a particular niche in the NFV market, supplying the NFVO (NFV Orchestrator) portion of the ETSI MANO (Management and Orchestration) architecture. Again, the orchestrator is part of a multi-tier system, in this case with a VNFM (VNF Manager) which controls the virtual machine lifecycle of the deployed software networking products.
The orchestrator provides the master control, signaling the VNFM (in this case, Cisco's ESC product) to spin up three VNFs (a router, a firewall and a load balancer). Once the virtual infrastructure is in place, the orchestrator is triggered to re-execute its service model and deploy an overlay service on top of the newly created virtual infrastructure.
The entire deployment is treated as a single service. So if the service is removed, all the virtual infrastructure is removed as well, freeing the consumed compute, storage and network.
To choose the right orchestration method, you must understand each method's pros and cons. Model-driven systems cater to complex applications and scale to carrier grade networks (pro). But they also have a cost and may require a third party to deploy (possible con).
File-driven orchestrators are typically open source, and you can get started with them very quickly (pro). But they may not have an API, which makes them a poor choice for a multi-tier system (con). Likewise, they may not scale to the desired network (another con). Below is a quick-hit chart to help you compare the two kinds of systems.
| |File-driven|Model-driven|
|---|---|---|
|CapEx|Zero to medium cost|Medium to high|
|OpEx|Zero to medium cost|Medium to high|
|Skillset|Easily attainable for current SMEs|Attainable / SME or dev|
|Maintenance|Public community, self-support|Vendor, community|
|Examples|Ansible, Terraform, Cloudify, HEAT templates|Cisco NSO, Ciena Blue Planet|
|Upside|Very accessible, low cost of entry, ever-expanding functionality, dev-ops friendly, multi-vendor device libraries, quick start.|Lifecycle management, service friendly, carrier class, great as part of a larger solution, multi-vendor device libraries.|
|Downside|No lifecycle capability (fire and forget). Enterprise grade, but not carrier grade. Libraries subject to community support. Limited or no API.|Can be complex and require a commercial deployment. Not good for some use cases. Models take a long time to develop.|