Is Hop-by-Hop the Future of Optical Transport Networking?
There have been numerous attempts to converge layer 1 (optical transport) with layer 3 (routing), but the concept was never widely adopted. Network operators found it difficult to cost-justify: large DWDM modules took up valuable router chassis slots and came with high power consumption and cooling requirements.
Over the past ten years, there have been tremendous advancements in DWDM optics and routing technologies. We have gone from large, power-hungry 40G and 100G line card-based modules to today's very power efficient, small form factor 400G-ZR/ZR+ pluggables. By pairing 400G-ZR+ optics with high density, low power routing platforms, have the optical and routing stars finally aligned? We believe they have, but more importantly, companies like Cisco have no doubt.
Moving the DWDM transponder into the router makes sense, as discussed in our previous article 400G-ZR & ZR+: The Latest in Pluggable Coherent DWDM. That said, some vendors are considering taking it a step further by simplifying the DWDM line system and leveraging routing technology to replace its functionality.
It's this shift in functionality from layer 1 to layer 3 that this article will focus on. Since we will be taking a more in-depth look into routing technology, I teamed up with my colleague Mike DiVincenzo, a CCIE, industry veteran and routing mentor, to write this article.
A DWDM network gives us the ability to carry multiple bidirectional, high-speed data paths, or wavelengths, between locations using one pair of fibers. Generally speaking, 96 individual wavelengths are available in a DWDM system, all of which are multiplexed onto one fiber optic cable.
Think of each unique wavelength as an individual circuit. There are two primary components of a DWDM network, the transponder and the line system. With 400G-ZR optics acting as our transponder, we will focus on the line system.
The most basic DWDM line system is a "point to point" configuration (see figure 1). Point to point networks utilize a multiplexer and a demultiplexer at each site. The multiplexer takes in (adds) the individual transponders' wavelengths, combines them and transmits them to the other site over one fiber. On a second fiber, the demultiplexer receives wavelengths from the other site and separates (drops) them back out.
Working together, they provide up to 96 individual bidirectional connections, or circuits, between the two sites using only two fibers. That's the equivalent of 192 individual fibers. This application is widely used for data center to data center connectivity.
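As a quick sanity check on that fiber math, here is a small sketch; the constants simply restate the figures from the text (96 wavelengths, one fiber per direction):

```python
# Back-of-the-envelope fiber count, with vs. without DWDM.
WAVELENGTHS = 96        # wavelengths available in the DWDM system
FIBERS_PER_CIRCUIT = 2  # one fiber per direction if each circuit ran on dark fiber

fibers_without_dwdm = WAVELENGTHS * FIBERS_PER_CIRCUIT  # every circuit needs its own pair
fibers_with_dwdm = 2                                    # one mux/demux fiber per direction

print(fibers_without_dwdm)  # 192
print(fibers_with_dwdm)     # 2
```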
For more complex networks with multiple sites and varying levels of traffic demand, point to point DWDM line systems are typically not practical. In these environments, we deploy a more sophisticated multiplexer/demultiplexer known as a ROADM, or reconfigurable optical add/drop multiplexer. ROADM nodes add and drop wavelengths and allow them to optically pass through a site in a multi-site topology.
Optical pass-through enables a wavelength with one or more sites between its source and destination to traverse those intermediate sites untouched. Without it, we would have to drop each pass-through wavelength to a transponder and re-transmit it (known as regeneration) at every intermediate site. Regenerating one wavelength at one site requires two additional transponders, significantly increasing the system's cost.
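The transponder cost of skipping optical pass-through can be sketched with a simple count. The function name and cost model are illustrative, following the article's figure of two extra transponders per regenerated wavelength per intermediate site:

```python
# Sketch: transponder count with vs. without optical pass-through.
def transponders_needed(wavelengths: int, intermediate_sites: int,
                        pass_through: bool) -> int:
    """Count transponders for circuits crossing `intermediate_sites` sites."""
    endpoints = wavelengths * 2  # one transponder at each end of every circuit
    if pass_through:
        return endpoints         # intermediate sites are traversed optically
    # Without pass-through, each wavelength is regenerated at every
    # intermediate site, costing two additional transponders per site.
    return endpoints + wavelengths * 2 * intermediate_sites

# One wavelength crossing one intermediate site:
print(transponders_needed(1, 1, pass_through=True))   # 2
print(transponders_needed(1, 1, pass_through=False))  # 4
```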
In the example below, Site 1 needs to transport data to Site 2 and Site 3, but Site 3 requires twice the bandwidth. As you can see, we have three wavelengths (circuits), represented by the pink, green and purple dashed lines.
In both cases, the purple circuit provides connectivity directly between Sites 1 and 2. In the non-ROADM case, the pink and green circuits provide the required capacity between Site 1 and Site 3, but to reach Site 3, they need to be regenerated at Site 2. In the ROADM case, the green and pink wavelengths are simply passed through optically.
Utilizing ROADM technology, the line system also provides path protection for ring or mesh configurations. A DWDM optical path switch is guaranteed to complete in under 50ms, so the routers are virtually unaffected by a path failure and switchover. Figure 3 illustrates a basic ring topology.
There are two DWDM wavelengths provisioned: green and pink. Green is the primary path, and pink is the protection path. As you can see, if there is a failure on the green path, the pink path takes over within 50ms.
ROADM line systems have been available for almost 20 years and are deployed extensively in large and small networks throughout the world. The challenge is that ROADM devices are exceedingly expensive for vendors to develop and produce, with little to no return on investment.
On the other hand, the transponder is substantially more profitable, and you need exponentially more transponders than ROADM devices to build a DWDM transport network. It's not uncommon for a vendor to lose money on the sale of the ROADM line system in hopes of recouping the loss and generating profit from selling the transponders.
That said, nothing prevents network operators from deploying vendor A's line system with vendor B's transponders, further jeopardizing vendor A's ability to recoup and profit from the sale of a ROADM DWDM network. To be fair, it's not only the vendors' bottom line driving the shift away from ROADM architectures; network operators have been pushing for it for a long time. Converging optical and routing simplifies their network operations and reduces overall equipment and operational costs.
So, what's a vendor to do? For Cisco, at least, they're exploring the idea of phasing out ROADM architectures by leveraging high capacity ZR+ DWDM links between sites and utilizing Segment Routing for the control plane along with MPLS or IPv6 for the data plane. These two combinations are often referred to as SR-MPLS and SRv6, respectively.
Where traffic would optically pass through a site, now it's switched through a site. Rather than optical path protection, SR TI-LFA will be the path redundancy mechanism, and all circuits, or services, will be managed as segment routed paths. Let's take a look at this paradigm shift and the routing technology behind it.
Hop-by-hop architecture is the concept of removing the ROADM layer from the network. The relatively simple point-to-point DWDM line system mentioned above becomes the layer 1 architecture. Point-to-point DWDM systems are placed in-between each site, providing direct connections for adjacent routers.
With every site becoming a layer 1/3 termination site, every packet received is processed by the router. In other words, every wavelength is regenerated at every site, as illustrated in Figure 2a above. In this case, the regeneration occurs in the router, so we can now route every packet at every site.
The theory is that with high capacity ZR+ optics combined with relatively inexpensive, port dense core routers powered by Segment Routing technology, we can manage all traffic paths within the router — including pass-through traffic, where packets will simply come into the router and go right back out towards their destination site.
Figure 4 illustrates a 4-site ring configuration; each site is connected to its adjacent sites with 2x 400G DWDM wavelengths. The pink and green dashed lines represent the two wavelengths, and each wavelength consumes one port in the router. This example provides 800Gbps of bandwidth between any two routers in the network.
Considering that point-to-point line systems can support up to 64x 400G wavelengths between sites, combined with the port density of routers like Cisco's 8201, a fixed form factor router that supports 24x 400G ZR+ ports in one RU (1.75"), the hardware side is well equipped to support this architecture. As for software, the relative simplicity of Segment Routing, MPLS' successor, will tie the whole package together.
Layer 3 network routers behave much differently than optical systems but can provide the same functionality as a ROADM line system. The most notable ROADM functions the router would assume are pass-through traffic, path protection and circuit provisioning.
With a ROADM system, traffic would optically pass through a site; now, traffic passes through as packets. When a path failure occurred, a ROADM system would physically switch to a standby optical path; now, TI-LFA (topology independent loop-free alternate fast reroute) reroutes the traffic.
Finally, instead of creating optical circuits, we will use SR-TE (Segment Routing Traffic Engineering) to create end-to-end services. In this section, we will discuss these three functions and the routing technology behind them.
What is Segment Routing? Segment Routing is the successor to MPLS (Multiprotocol Label Switching). It accomplishes the same thing as MPLS but is far less complicated, easier to implement and much more robust, driving network scalability and intelligence while improving capacity utilization and reducing costs.
Segment Routing is based on a source-routing architecture: it provides a simple, stateless mechanism to program the path a packet takes through the network. Because the application has complete control over the forwarding path and steers the packet through the network by encoding an ordered list of segments in the packet header, there is no need for path signaling.
Therefore, Segment Routing creates no per-flow state in the network and scales far beyond signaled, per-flow approaches. It provides a scalable, streamlined method for provisioning services and managing the paths they use.
Without optically passing traffic through a site, a router must process every packet that traverses it. This would not have been realistic with previous generations of routers, considering the high cost of router ports, processing power and latency. This is no longer the case with today's core routers, as the price per port has dropped tremendously and the processing power has increased exponentially.
Regarding latency, it was undoubtedly a concern in traditional software-based routers. However, today's modern ASIC-based routers perform like Layer 3 switches, forwarding packets within 4-6 microseconds. This figure includes the packet's serialization delay, which decreases as link speeds increase.
For example, serializing a 1500-byte packet on a 10Ge link takes approximately 1.2 microseconds, while 100Ge takes only a tenth of that, or 0.12 microseconds. With the introduction of 400Ge, and 800Ge in the not-so-distant future, packet transit latency becomes even less of a concern.
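The serialization numbers fall straight out of bits divided by line rate. A minimal sketch, assuming an MTU-sized 1500-byte packet (the packet size is our assumption; the article quotes only the resulting delays):

```python
# Serialization delay = packet size in bits / link speed in bits per second.
PACKET_BITS = 1500 * 8  # assumed MTU-sized packet

for name, bps in [("10Ge", 10e9), ("100Ge", 100e9), ("400Ge", 400e9)]:
    delay_us = PACKET_BITS / bps * 1e6  # convert seconds to microseconds
    print(f"{name}: {delay_us:.2f} us")
# 10Ge: 1.20 us, 100Ge: 0.12 us, 400Ge: 0.03 us
```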
One thing to keep in mind, especially for layer 3 focused engineers, is the capacity planning implications. Figure 5 shows three sites with 64 wavelengths available between each site. Three of the 64 available waves carry 400G each between each pair of adjacent routers, providing 1.2Tbps of capacity per span. When packets need to go from Router A to Router C, they have to traverse Site B's router, consuming bandwidth on Router B's interfaces. For the sake of numbers, let's say we need 400Gbps between Router A and Router C, which would consume a full 400G wave, a third of the span capacity, and two physical 400G ports in Router B.
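The accounting in that example can be sketched in a few lines. The variable names and values are illustrative, mirroring the Figure 5 scenario (three 400G waves per span, a 400Gbps A-to-C demand transiting Router B):

```python
# Port and bandwidth accounting for pass-through traffic at an intermediate router.
WAVE_GBPS = 400
WAVES_PER_SPAN = 3

span_capacity = WAVE_GBPS * WAVES_PER_SPAN  # Gbps between adjacent routers

demand_a_to_c = 400  # Gbps of A-to-C traffic that must transit Router B
waves_used = demand_a_to_c // WAVE_GBPS
ports_consumed_on_b = 2 * waves_used        # one port in from A, one out to C
remaining_on_span = span_capacity - demand_a_to_c

print(span_capacity)        # 1200
print(ports_consumed_on_b)  # 2
print(remaining_on_span)    # 800
```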
Now consider a real-world network with many more routers and various bandwidth demands, and the planning process becomes quite complicated. However, with the ability to deploy up to 64x 400G DWDM waves between adjacent routers, 800G DWDM on the roadmap, and careful planning, large-scale hop-by-hop network architectures seem realistic.
Analyzing real-world networks with tens and hundreds of sites requires powerful modeling tools and is beyond this article's scope. Thankfully, Cisco has been crunching the numbers and presented some excellent examples at Cisco Live 2020. These examples include a hybrid approach where hop-by-hop is the primary transport architecture with optical pass-through used in select parts of the network.
We like the concept of selective optical pass-through and are glad it is being considered. We highly recommend checking out the presentation Converging IP and Optical Networks, Cisco Live 2020.
From SONET to DWDM, optical networks are the gold standard for providing highly reliable, sub-50 millisecond protection switching that most applications require. The routing methods available to reroute from a failed path to an alternate path either took too long or were too complicated to manage.
Before IP Fast Reroute (FRR), the IP layer would have to reconverge if the primary link was lost. Re-convergence is slow because the routing protocol does not calculate the new path until a failure is detected; the routers then have to converge on, or learn, the new path information. Depending on the network's size, this process can take a second or longer to complete. MPLS-TE FRR is another option, but it has not been widely adopted due to excessive operational complexity.
Today we have TI-LFA. With TI-LFA, <50 millisecond failover times are easily achieved because TI-LFA has already determined the "new path," or protection path, in advance. If the primary link fails, the packets know where to go almost instantaneously, and the routers know what to do. With TI-LFA, the network is protected against failures at L1, L2 and L3, whether it be a fiber cut, interface failure or a change in routing protocol advertisements. As the name implies, TI-LFA is topology independent, easily enabled, incrementally deployed and supported on any network running SR-MPLS or SRv6.
The TI-LFA protection path always uses the post-convergence path, which is the least cost path. The path's bandwidth capacity determines cost; the higher the capacity, the lower the cost. Once the post-convergence path is calculated, it is stored in the router's forwarding information base (FIB).
In Figure 6, the primary path for Site A to Site Z is B → C. The post-convergence path saved in the FIB is B → E → F → C, based on a cost of 30, the sum of the three link costs. The other available path is B → D → C, but with a cost of 200, it is not the lowest cost path.
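The post-convergence computation is a standard shortest-path-first (SPF) run with the failed link pruned from the topology. A minimal Dijkstra sketch of the Figure 6 scenario; the individual link costs are our assumptions, chosen so the path totals match the article's 30 and 200:

```python
import heapq

# Illustrative topology: B-E, E-F and F-C cost 10 each (total 30);
# B-D and D-C cost 100 each (total 200); the primary link B-C costs 10.
graph = {
    "B": {"C": 10, "E": 10, "D": 100},
    "C": {"B": 10, "F": 10, "D": 100},
    "D": {"B": 100, "C": 100},
    "E": {"B": 10, "F": 10},
    "F": {"E": 10, "C": 10},
}

def shortest_path(g, src, dst, failed_link=None):
    """Dijkstra's SPF; prune one link to model the post-convergence view."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in g[u].items():
            if failed_link and {u, v} == set(failed_link):
                continue  # skip the failed link in both directions
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:  # walk predecessors back to the source
        node = prev[node]
        path.append(node)
    return path[::-1], dist[dst]

print(shortest_path(graph, "B", "C"))                          # (['B', 'C'], 10)
print(shortest_path(graph, "B", "C", failed_link=("B", "C")))  # (['B', 'E', 'F', 'C'], 30)
```

TI-LFA precomputes the second result before any failure, which is why the sub-50ms switchover is possible: the backup path is already in the FIB when the primary link goes down.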
Instead of creating optical circuits, we will use SR-TE to create end-to-end services. RSVP-TE has existed for many years but is cumbersome and complicated at best. SR-TE takes the simplicity of Segment Routing to a new level by incorporating both distributed and centralized control and optimization.
We can program the end-to-end path of the packet directly into the packet header at the source, either via MPLS labels with SR-MPLS or directly into the IPv6 header when using SRv6. The application has complete control over the forwarding path and steers the packet through the network by encoding an ordered list of segments in the packet header.
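To make the "ordered list of segments" idea concrete, here is a tiny sketch of an SR-MPLS headend building a label stack for an explicit path. The SID values and helper function are purely hypothetical, not real SRGB allocations:

```python
# Hypothetical node SIDs (segment identifiers) for routers in the domain.
NODE_SID = {"B": 16002, "C": 16003, "E": 16005, "F": 16006}

def build_label_stack(explicit_path):
    """Encode an explicit path as an SR-MPLS label stack, top of stack first.

    Each transit router pops the top label and forwards toward the next
    segment; no per-flow state is signaled to any router along the way.
    """
    return [NODE_SID[hop] for hop in explicit_path]

# Steer a packet over E -> F -> C instead of the IGP shortest path:
stack = build_label_stack(["E", "F", "C"])
print(stack)  # [16005, 16006, 16003]
```

With SRv6, the same ordered list would be carried as IPv6 addresses in a segment routing header rather than as MPLS labels, but the source-programmed principle is identical.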
SR-TE has a wide range of options, including the default (shortest) path, a low delay path, or a high bandwidth path with a bounded delay, in conjunction with any user-defined constraints. SR-TE can create policies that forward traffic over these non-shortest paths, instantiated based on user-defined criteria. It can also be used to engineer Internet exit points, enabling better bandwidth utilization by eliminating unnecessary backhaul traffic.
All the optical and routing convergence puzzle pieces are here: small, power-efficient DWDM optics, port- and processor-dense routers, and simple, programmable software. It all adds up. The benefits are clear, including less space and power, common hardware, simplified operations and high scalability. We're looking forward to the industry embracing this architectural evolution and are thankful we can be part of its growth.
We hope this article gave you a high-level understanding of what a converged optical and routing architecture looks like and the optical and routing mechanisms powering it. Feel free to contact us today with any questions or to discuss your specific use case.