I was fortunate to work on the early field trial (EFT) of one of the best integrated day 2 operations suites, which provides unparalleled visibility into a data center network. Look for this type of visibility across the multiple domains of WAN, LAN, DC and cloud as Cisco rolls out these fabric analytics and flow engines to manage and report on misconfigurations, bugs and traffic drops from a single platform: the NEXUS Dashboard.
An introduction to the new Cisco NEXUS Dashboard and day 2 operations suites
Cisco has embarked on a unification journey to combine Network Insights Resources (NIR), which performed packet flow analysis, and Network Insights Advisor (NIA), which performed bug analysis and reported PSIRTs, EOL/EOS and engineering notices, all pulled daily from Cisco and compared against the fabric. These functions have been combined in the new 5.1 release of NEXUS Insights (yes, they changed the name). The next version, NEXUS Insights 6.0, will bring NAE under the same platform as well.
Previously, the day 2 operations suite of NAE, NIR and NIA ran as separate applications, either as an .ova on vSphere compute or as an application on the APIC. That architecture never allowed the apps to share data or to correlate errors with telemetry views of packet loss. With the new roadmap, the applications draw from a shared data lake and can correlate application errors, policy changes and deep flow telemetry, all visualized as epochs.
To let all the applications run and share databases, Cisco created the NEXUS Dashboard (ND). The ND platform allows all the Cisco day 2 apps, as well as third-party applications, to run on a single appliance. The ND also has to be an expandable, CPU- and storage-intensive platform; today, it scales to 3 master nodes and 4 worker nodes, with the apps and their data residing on the ND cluster. As ND matures, more ND servers can join the cluster, and they can be separated regionally, provided they stay within the round-trip time (RTT) requirements, to distribute applications and provide DR strategies.
Another critical point about the NEXUS Dashboard is that it is the hub for multidomain policy and telemetry using the Kafka bus. The NEXUS Dashboard provides a giant data lake for ACI, DCNM or NX-OS policy and telemetry. Through the multidomain connector and the Kafka bus, it adds policy and telemetry from DNAC, SD-WAN and public cloud workloads, as well as third-party integrations such as AppDynamics, Splunk and ServiceNow. The end goal is an end-to-end view of policy and telemetry across multiple domains and third-party applications.
The NEXUS Dashboard offers RBAC controls and an admin view, where sites are onboarded and configured and day 2 applications are added and configured. It also offers an operator view, where the operator can open the day 2 applications and monitor the network using the sites and applications the admin has set up. It is a standard dashboard for day 2 ops that is easy to use, scale and maintain.
The NEXUS Dashboard has a virtual version for small fabrics or lab and demo use, as well as a cloud version for public cloud visibility. Using NAE and NEXUS Insights requires an upgrade to Premier licensing.
The NEXUS Dashboard allows us to host applications and to consume and export data, forming one large, correlated data lake. The advantage of correlation is that we can now see business errors and outages, such as a shopping cart not working in AppDynamics, and ingest these events into the NEXUS Dashboard. With the correlated database, we can use NAE and NEXUS Insights to determine whether a recent change caused the error or whether the app's network is experiencing a fault. The NEXUS Dashboard tools then give real-time suggestions for fixing the errors using NAE and NI, drastically reducing the MTTR of business apps.
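To make the correlation idea concrete, here is a minimal sketch of the simplest possible version: given a business error time stamp (say, from an APM tool such as AppDynamics), list the fabric changes made shortly before it. The `ChangeEvent` and `AppError` classes, the `candidate_changes` function and the 30-minute window are all hypothetical illustrations; this is not how NAE or NEXUS Insights actually correlate events, just a toy model of the question they answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    """A configuration change recorded for the fabric (hypothetical model)."""
    timestamp: datetime
    description: str


@dataclass
class AppError:
    """A business-level error ingested from an APM tool (hypothetical model)."""
    timestamp: datetime
    app: str
    message: str


def candidate_changes(error, changes, window=timedelta(minutes=30)):
    """Return changes made within `window` before the error, newest first."""
    start = error.timestamp - window
    hits = [c for c in changes if start <= c.timestamp <= error.timestamp]
    return sorted(hits, key=lambda c: c.timestamp, reverse=True)


if __name__ == "__main__":
    t0 = datetime(2021, 3, 1, 10, 0)
    changes = [
        ChangeEvent(t0 - timedelta(hours=2), "Added bridge domain web-bd"),
        ChangeEvent(t0 - timedelta(minutes=10), "Modified filter on contract web-to-db"),
        ChangeEvent(t0 + timedelta(minutes=5), "Added EPG test-epg"),
    ]
    err = AppError(t0, "shopping-cart", "checkout timeout")
    # Only the contract change falls inside the 30-minute window before the error.
    for change in candidate_changes(err, changes):
        print(change.timestamp, change.description)
```

A fixed time window is the crudest form of correlation; a real platform can also check whether the changed objects actually sit on the failing application's network path.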
The NEXUS Dashboard platform comes as a hardware-based cluster form factor consisting of a minimum of 3 nodes, expandable to 7 nodes. There is also a software-based NEXUS Dashboard that runs as an .ova on ESXi hosts. A cloud-based NEXUS Dashboard is planned for a future version.
Both form factors are deployed by connecting the nodes to the inband management network, in one of two modes. The first is EPG/BD mode, which connects the NEXUS Dashboard directly to the fabric to gather telemetry from the APICs and flows from the switches.
The second is L3Out mode, which connects the NEXUS Dashboard's data interfaces to inband management through an L3Out.
Once the NEXUS Dashboard is configured, day 2 applications such as MSO, NAE and NI are added to the NEXUS Dashboard (ND). We shall take a brief look at the applications that can run on the ND.
The first application to discuss is the Multi-Site Orchestrator or MSO. The MSO is used to create VXLAN connectivity policies between ACI on-prem and cloud sites, as well as tenant templates that can extend L2 and L3 connectivity seamlessly with a single policy between sites and public cloud.
Next is Network Assurance Engine, or NAE. To understand the role of NAE in ACI day 2 operations, we must look at what intent-based networking is. Let’s look at the standard intent-based networking model. We combine business intent (I need users moving to new finance building 5 to have access to their resources) with IT intent (users in building 5 can only access the internet and resources in the Finance Tenant). A policy is created to fulfill this intent and applied via automation to the infrastructure. What has been missing is an assurance engine to verify the policy applied will give you the correct intent. This is where NAE comes in.
NAE works by collecting data from the APIC and then comparing the intent, policy and state of the fabric. It uses mathematical modeling to check the configuration against the intent and validate that the policy is correct.
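As a rough illustration of what "verifying policy against intent" means, the sketch below reuses the building 5 example: the intent says those users may reach only the internet and the Finance tenant, and we check whether a set of permit rules delivers exactly that. The `Rule` type, `check_intent` function and group names are hypothetical, and a plain set comparison is a drastic simplification of NAE's formal modeling; it only shows the two failure modes an assurance engine looks for.

```python
from __future__ import annotations

from typing import NamedTuple


class Rule(NamedTuple):
    """A permit rule from a source group to a destination (hypothetical model)."""
    src: str
    dst: str


def check_intent(src: str, intended: set[str], rules: list[Rule]) -> dict[str, set[str]]:
    """Compare what the policy permits for `src` with what the intent allows."""
    permitted = {r.dst for r in rules if r.src == src}
    return {
        "missing": intended - permitted,  # intent not fulfilled by the policy
        "excess": permitted - intended,   # policy grants access the intent forbids
    }


if __name__ == "__main__":
    intent = {"internet", "finance-tenant"}
    policy = [
        Rule("building5-users", "internet"),
        Rule("building5-users", "hr-tenant"),
        Rule("lobby-guests", "internet"),
    ]
    # Finance access is missing, and HR access was granted by mistake.
    print(check_intent("building5-users", intent, policy))
```

Both result sets must be empty for the policy to match the intent: "missing" means users cannot do their jobs, while "excess" is a security gap.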
NAE can provide endpoint connectivity analysis using policy explorer and natural language search.
NAE has an epoch timeline analysis that shows when an error occurred and what change was made.
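The epoch idea can be sketched as a sequence of fabric snapshots: find the first epoch where errors appear and diff its configuration against the previous, error-free epoch. The `Epoch` class, `diff_configs`, `explain_first_error` and the flat "object name to settings" snapshot format are hypothetical simplifications, not NAE's data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Epoch:
    """One fabric snapshot (hypothetical model, not NAE's data format)."""
    label: str
    config: dict[str, str]  # object name -> serialized settings
    errors: int             # number of anomalies raised in this epoch


def diff_configs(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List objects added, removed, or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


def explain_first_error(epochs: list[Epoch]):
    """Find the first clean-to-erroring transition and report what changed."""
    for prev, cur in zip(epochs, epochs[1:]):
        if prev.errors == 0 and cur.errors > 0:
            return cur.label, diff_configs(prev.config, cur.config)
    return None  # no transition from clean to erroring


if __name__ == "__main__":
    timeline = [
        Epoch("epoch-1", {"bd-web": "unicast-routing=on"}, errors=0),
        Epoch("epoch-2", {"bd-web": "unicast-routing=on", "epg-db": "bd=bd-web"}, errors=0),
        Epoch("epoch-3", {"bd-web": "unicast-routing=off", "epg-db": "bd=bd-web"}, errors=4),
    ]
    # The errors in epoch-3 line up with the modified bd-web settings.
    print(explain_first_error(timeline))
```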
NAE can also be used for pre-change validation of ACI policy and security compliance.
Finally, Network Insights 5.1 (NI 5.1) ingests telemetry from many data sources, such as Syslog, RIB and FIB tables, and streaming telemetry. It extracts the metadata from these datasets and correlates it against a database updated from Cisco. From this telemetry and metadata correlation, NI derives insights and suggests remediation actions for root cause analysis and failure prediction.
Some of the use cases for NI are reducing MTTR, OPEX savings, availability and uptime, preventative measures, and bug and PSIRT notices for the software and hardware versions in use. These are all derived from a database that Cisco updates daily. For high-security or air-gapped networks, proxies and other methods are available to keep the database up to date.
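The bug and PSIRT use case boils down to joining the fabric's hardware and software inventory against an advisory feed. Below is a minimal sketch of that join. The `Advisory` and `Device` schemas, the `match_advisories` function, and the model names, versions and advisory IDs are all hypothetical placeholders, not real Cisco data or the NI database format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """A bug, PSIRT, or EoL/EoS notice (hypothetical schema)."""
    advisory_id: str
    kind: str                          # e.g. "bug", "PSIRT", "EoL"
    model: str
    affected_versions: frozenset[str]  # exact version strings, for simplicity


@dataclass(frozen=True)
class Device:
    name: str
    model: str
    version: str


def match_advisories(devices: list[Device], advisories: list[Advisory]) -> dict[str, list[str]]:
    """Map each affected device name to the advisory IDs that apply to it."""
    findings: dict[str, list[str]] = {}
    for dev in devices:
        hits = [
            a.advisory_id
            for a in advisories
            if a.model == dev.model and dev.version in a.affected_versions
        ]
        if hits:
            findings[dev.name] = hits
    return findings


if __name__ == "__main__":
    inventory = [
        Device("leaf-101", "model-A", "1.0"),
        Device("leaf-102", "model-A", "1.1"),
        Device("spine-201", "model-B", "1.0"),
    ]
    feed = [
        Advisory("ADV-0001", "bug", "model-A", frozenset({"1.0"})),
        Advisory("ADV-0002", "PSIRT", "model-B", frozenset({"1.0", "1.1"})),
    ]
    print(match_advisories(inventory, feed))
```

Exact version matching keeps the sketch short; real advisories usually express affected releases as ranges, which would need a proper version comparison.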
In the future, ND 3.1 will offer a third form factor, providing the NEXUS Dashboard in the public cloud. Also, in version 6.0, all of the day 2 operations (NAE and NI) will be combined into one application, giving a single-pane-of-glass view never before available in any OEM's fabric.
Make sure to follow our Data Center Networking topic to be informed about upcoming labs and demos of the NEXUS Dashboard and day 2 operations in action.