27 results found
Using PFC and ECN queuing methods to create lossless fabrics for AI/ML
Widely available GPU-accelerated servers, combined with improved hardware and popular programming languages like Python and C/C++, along with frameworks such as PyTorch, TensorFlow and JAX, simplify the development of GPU-accelerated ML applications. These applications serve diverse purposes, from medical research to self-driving vehicles, relying on large datasets and GPU clusters for training deep neural networks. Inference frameworks then apply knowledge from trained models to new data, using clusters optimized for performance.
The learning cycles involved in AI workloads can take days or weeks, and high-latency communication between server clusters can significantly lengthen completion times or cause outright failure. AI workloads demand low-latency, lossless networks, requiring appropriate hardware, software features, and configurations. This article explains the advanced queueing solutions, ECN and PFC, implemented in the Network Operating Systems (NOS) of all the major OEMs.
Article
•Jun 25, 2024
6 Steps to Understanding Cisco ACI
Once understood, these six concepts will help anyone new to ACI follow a more detailed technical discussion.
Article
•Jun 28, 2023
Understanding Data Center Quantized Congestion Notification (DCQCN)
RoCEv2 is a solution for achieving swift data throughput and minimal delay in modern data centers. It incorporates features like Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) to establish a lossless network environment. PFC manages data flow at the interface level, while ECN detects and mitigates congestion before PFC activation becomes necessary. The combination of ECN and PFC, known as Data Center Quantized Congestion Notification (DCQCN), optimizes congestion management in RDMA networks. Careful tuning of queue thresholds is crucial to prevent hot spots and ensure low Job Completion Times (JCTs). The use of ECN and PFC is necessary for maintaining a lossless fabric in GPU-to-GPU communication during AI/ML training runs.
Article
•Jun 16, 2024
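The DCQCN summary above describes an ordering of mechanisms: ECN marks packets to slow senders before queue depth ever reaches the point where PFC must pause the link. A minimal sketch of that ordering follows; the threshold names and values are illustrative assumptions for this example, not vendor defaults or a real NOS API.

```python
# Hypothetical illustration of DCQCN threshold ordering: ECN marking
# engages at lower queue depths than the PFC pause (XOFF) backstop.
# All threshold values below are assumptions chosen for the example.

ECN_MIN_KB = 150     # begin probabilistic CE marking above this depth
ECN_MAX_KB = 3000    # mark every packet above this depth
PFC_XOFF_KB = 4000   # last resort: send a pause frame upstream

def queue_action(depth_kb: int) -> str:
    """Return the congestion action taken at a given egress queue depth."""
    if depth_kb >= PFC_XOFF_KB:
        return "pfc-pause"        # lossless backstop: halt the sender
    if depth_kb >= ECN_MAX_KB:
        return "ecn-mark-all"     # every packet marked; sender backs off hard
    if depth_kb >= ECN_MIN_KB:
        return "ecn-mark-some"    # probabilistic marking; gentle slowdown
    return "forward"              # no congestion action

for depth in (100, 1000, 3500, 4500):
    print(depth, queue_action(depth))
```

The point the article makes about tuning follows directly from this structure: if the ECN thresholds sit too close to the PFC XOFF level, senders are paused before they ever receive marking feedback, creating the hot spots and inflated Job Completion Times the blurb warns about.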
Introduction to Arista's AI/ML GPU Networking Solution
AI workloads require significant data and computational power, with billions of parameters and complex matrix operations. Inter-network communication accounts for a significant portion of job completion time. Traditional network architectures are insufficient for large-scale AI training, necessitating investments in new network designs. Arista Networks offers high-bandwidth, low-latency and scalable connectivity for GPU servers, with features like Data Center Quantized Congestion Notification and intelligent load balancing. Arista's AI Leafs and Spines provide high-density, high-performance switches for AI networking. Different network designs are recommended based on the size of the AI application, and a dedicated storage network is recommended to handle the large datasets used in AI training. Arista's CloudVision Portal and AI Analyzer tools provide automated provisioning and deep flow analysis. Arista's IP/Ethernet switches are well-suited for AI/ML workloads, offering energy-efficient interconnects and simplified network management.
Article
•Jun 25, 2024
Introduction to NVIDIA's AI/ML GPU networking solutions
This article discusses the importance of deploying AI applications and training models using distributed computing and the need for significant computational resources. It highlights the role of network efficiency and scalability in large-scale AI deployments.
Article
•Aug 6, 2024
eBook: Infrastructure Built for Tomorrow
An in-depth guide for IT decision makers navigating the complex IT landscape.
eBook
•Oct 27, 2023
Use the Nexus Dashboard Free Trial to Proactively Monitor Your ACI Fabric
Learn how to create a 90-day POC to verify fabric performance, troubleshoot issues and validate the usefulness of the Nexus Dashboard and Day 2 operations suite.
White Paper
•Apr 9, 2025
The Risk of End of Support (EoS) Infrastructure in Your Data Center
This article examines what End of Support/End of Life means, how it can affect your business and the steps for building a plan to refresh your data center.
Article
•May 2, 2023
MP-BGP EVPN VXLAN for the beginner
The article below covers VXLAN encapsulation and how MP-BGP is used to learn and forward Layer 2 and Layer 3 traffic across it. To get the most from the content, we recommend that readers have prior knowledge of BGP and the MP-BGP routing protocol. For those unfamiliar with these concepts, we suggest first reading the "MP-BGP for the beginner" article in our foundational learning path. MP-BGP EVPN VXLAN may seem intimidating at first, but this beginner's guide will help you clearly understand how it works. It's a technology that has gained popularity over the last few years, with many companies adopting it.
Article
•Apr 23, 2025
Segmenting Complex Environments Using Cisco ACI
ACI is a powerful technology offering rich SDN features, including application-centric security segmentation, automation and orchestration in the data center.
White Paper
•Jun 3, 2023
The Future of Intent-based Networking and Multi-domain Architectures: Part II
The second in our series, this article explores what intent-based networking (IBN) is and how organizations can leverage it to build multi-domain architectures.
Article
•Apr 9, 2025
Optical Data Center Interconnect: Connecting Your Data Centers With Private DWDM Technology
Optical Data Center Interconnect (DCI) provides a cost-saving, high-density, flexible alternative to leased circuits for connecting geographically separated data centers.
Article
•Aug 28, 2023