In network infrastructure, few technologies have proven as transformative as Segment Routing over IPv6 (SRv6). What started as a means to simplify service provider networks and support 5G rollouts has now become important for handling today's most demanding artificial intelligence (AI) workloads. This evolution, from overcoming traditional networking challenges to driving AI networks, shows both the flexibility of SRv6 and its role in reshaping network architecture. SRv6 is now enabling innovations that will shape the way we design AI infrastructures.
The genesis of SRv6: A quest for network simplification
Since 2012, Cisco has helped pioneer Segment Routing, paving the way for SRv6, which began to take shape around 2016. By then, the industry had recognized the urgent need for a more agile and programmable network infrastructure capable of accommodating the demands of emerging technologies such as 5G, Internet of Things (IoT), and cloud services. The SRv6 network programming model was first introduced at the Internet Engineering Task Force (IETF) in March 2017, starting an ecosystem that has since expanded rapidly across various industries.
A key driver behind SRv6 was the aspiration to simplify network operations by harnessing the inherent capabilities of IPv6. In contrast to its predecessor, Segment Routing Multiprotocol Label Switching (SR-MPLS), which still depended on the MPLS data plane, SRv6 sought to operate exclusively within the IPv6 framework, thereby eliminating the complexities associated with multiprotocol environments.
Cisco played a key role in the early development of SRv6 by promoting its standardization at the IETF. This effort resulted in important standards such as RFC 8402 (Segment Routing Architecture), RFC 8754 (Segment Routing Header), and RFC 8986 (SRv6 Network Programming), which established the foundation for the technology. In 2019, Cisco introduced the concept of SRv6 uSID (microsegment), enabling large-scale deployments while ensuring compatibility with older equipment.
SRv6 and the 5G revolution
The initial driver for SRv6 adoption was clear: The telecommunications industry needed a solution that could meet the stringent requirements of 5G networks. Traditional mobility management executed through GPRS Tunneling Protocol (GTP) created complex overlay tunneling architectures that didn't scale to 5G requirements such as increased numbers of connected devices, ultra-low latency demands, network slicing capabilities, and mobile edge computing. The 3rd Generation Partnership Project (3GPP) officially initiated a study item titled "Study on User Plane Protocol in 5GC" to seek possible candidates for the next user-plane protocol, with SRv6 emerging as a compelling alternative.
What made SRv6 particularly attractive for 5G was its ability to simplify the network stack while enhancing capabilities. By leveraging IPv6's address space to provide network programmability, SRv6 enabled operators to compose data paths in the end-to-end IPv6 layer, integrating traffic engineering, VPNs, and service chaining features without the complexity of maintaining per-session tunnel states. Network resources, even wavelengths in dense wavelength division multiplexing (DWDM) systems, could be represented as IPv6 addresses, allowing control planes to program data paths that met specific application requirements.
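To make the idea of "network resources as IPv6 addresses" concrete, here is a minimal sketch of how a segment identifier (SID) can be composed from a locator prefix and a function value, following the LOC:FUNCT:ARG layout described in RFC 8986. The `build_sid` helper, the documentation prefix `2001:db8:0:1::/64`, the 16-bit function length, and the function value `0x0100` are all assumptions chosen for illustration, not values taken from any real deployment.

```python
import ipaddress


def build_sid(locator: str, function: int, func_bits: int = 16) -> ipaddress.IPv6Address:
    """Compose a SID as LOC:FUNCT, leaving the remaining (argument) bits at zero.

    Hypothetical helper: the 16-bit function length is an assumption.
    """
    net = ipaddress.IPv6Network(locator)
    if net.prefixlen + func_bits > 128:
        raise ValueError("locator and function do not fit in 128 bits")
    if not 0 <= function < (1 << func_bits):
        raise ValueError("function value out of range")
    # Place the function bits immediately after the locator bits.
    shift = 128 - net.prefixlen - func_bits
    return ipaddress.IPv6Address(int(net.network_address) | (function << shift))


# Illustrative values only (2001:db8::/32 is the IPv6 documentation range).
sid = build_sid("2001:db8:0:1::/64", 0x0100)
print(sid)  # 2001:db8:0:1:100::
```

Because the SID sits inside the node's locator prefix, any router with a plain IPv6 route to that prefix can forward the packet toward the node that implements the function.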
Rapid adoption across service provider networks
Major communications service providers (CSPs) have embraced SRv6, and many more are considering doing so.
Figure 1: Across the globe, hundreds of SRv6 projects have been deployed or are in the testing or planning phases
These deployments demonstrate the flexibility of SRv6 across various applications:
- Simplified VPN services: SRv6 makes it easier to deploy and manage network services like L3VPNs, even across different networks. Only the entry and exit routers need to support SRv6, while the main routers can just forward standard IPv6 traffic. This streamlines network operations and lowers overhead.
- Service function chaining (SFC): SRv6 allows network functions, like firewalls and load balancers, to be included directly in routing paths. This means you can manage traffic without complicated additional protocols.
- Traffic engineering (TE) and fast reroute (FRR): SRv6 gives network operators fine control over traffic routes, helping to meet performance goals like low latency or bandwidth guarantees.
- Operational simplicity and cost reduction: By using only the IPv6 framework, SRv6 minimizes the reliance on various overlay protocols, resulting in a simpler network. This leads to easier troubleshooting and lower operational costs.
- Enhanced scalability and aggregation: SRv6 uses the scalability of IPv6, making it possible to manage large networks with fewer prefixes, which simplifies routing and boosts efficiency.
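The aggregation point in the last bullet can be sketched with Python's standard `ipaddress` module. The example below assumes four hypothetical edge routers, each with its own /64 locator from the documentation range; the prefixes and the sample SID are illustrative only.

```python
import ipaddress

# Hypothetical per-node SRv6 locators, one /64 per edge router.
locators = [ipaddress.IPv6Network(f"2001:db8:0:{n:x}::/64") for n in range(4)]

# Adjacent locators collapse into a single covering prefix, so routers
# elsewhere in the network can carry one route instead of four.
summary = list(ipaddress.collapse_addresses(locators))
print(summary)  # [IPv6Network('2001:db8::/62')]

# A SID that belongs to the third node is still reachable via the summary.
sample_sid = ipaddress.IPv6Address("2001:db8:0:2:100::")
print(sample_sid in summary[0])  # True
```

This is ordinary IPv6 prefix summarization; nothing in it is specific to SRv6, which is the point the bullet makes.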
The AI infrastructure challenge: A new frontier
As SRv6 technology advanced in service provider networks, a significant transformation was also taking place in data centers. The rapid growth of AI, and especially the rise of large-scale model training, created networking demands that are fundamentally different from traditional workloads. AI training workloads scale to incredible levels, involving thousands or even tens of thousands of graphics processing units (GPUs) operating simultaneously. Unlike traditional data center traffic patterns, which consist of diverse and independent transactions, AI training workloads intensify the long-standing "elephant flow" challenge. While elephant flows have existed in big data shuffles, IP storage, and high-performance computing (HPC), AI training creates demanding patterns: thousands of tightly synchronized GPUs executing collective communication operations (all-reduce, all-gather) at every training step, generating massive, simultaneous data transfers where any straggler delays the entire cluster.
This synchronized behavior creates critical challenges that traditional networking approaches struggle to address:
- Bursty traffic and congestion spikes: When thousands of GPUs simultaneously push data along the same paths, sudden, intense congestion spikes can occur. While Explicit Congestion Notification (ECN) remains important for managing congestion reactively, without proactive traffic placement these mechanisms can be overwhelmed, potentially causing head-of-line blocking that spreads congestion across the network.
- The "slowest packet" problem: AI network performance is dictated by the slowest packet, not averages. When thousands of GPUs wait for a single straggler packet, even slight latency increases can significantly impact job completion time (JCT). Every microsecond and every dropped packet matters.
- Scale-across complexity: As AI infrastructure extends beyond individual data centers, organizations face network domain fragmentation, state scalability challenges at geographic scale, dynamic WAN conditions, and operational complexity spanning multiple protocol domains.
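The "slowest packet" point can be illustrated with a few lines of arithmetic. In the sketch below, a synchronized training step is modeled as finishing only when the slowest flow finishes. The flow count and the latency figures (10 and 50 microseconds) are made-up numbers for illustration, not measurements.

```python
from statistics import mean


def step_time_us(flow_latencies_us):
    """A synchronized step completes only when the slowest flow completes."""
    return max(flow_latencies_us)


# Hypothetical numbers: 1024 flows at 10 us each, versus one straggler at 50 us.
healthy = [10.0] * 1024
one_straggler = [10.0] * 1023 + [50.0]

print(mean(healthy), step_time_us(healthy))              # 10.0 10.0
print(mean(one_straggler), step_time_us(one_straggler))  # 10.0390625 50.0
```

A single slow flow moves the average by less than half a percent but makes the step five times longer, which is why averages are a poor guide to job completion time in this setting.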
SRv6 in AI: The natural evolution
The networking community recognized that the same principles that made SRv6 successful in 5G networks (stateless operation, source-driven path control, and unified IPv6-based architecture) could address AI infrastructure challenges.
Backend GPU fabric optimization employs various congestion management strategies. Adaptive routing and flowlet load balancing are actively deployed at hyperscalers and neoclouds, providing dynamic traffic distribution based on real-time network conditions. SRv6's uSID offers an alternative approach through deterministic path placement for remote direct memory access (RDMA) traffic. With deep integration between AI workloads and SRv6, network interface controllers (NICs) can use source routing to perform stateless, predictable path placement, explicitly distributing traffic from different sources across available paths. This deterministic approach complements reactive techniques such as ECN by enabling proactive traffic placement that can reduce the frequency and severity of congestion events. Additionally, SRv6's explicit path encoding simplifies failure recovery: When congestion or failures arise, new paths can be encoded at the source without relying on distributed routing convergence, allowing for rapid traffic flow adjustments.
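The difference between hash-based and source-driven placement can be sketched as follows. This is a simplified model, not how any particular NIC or switch works: flows are identified by made-up string labels, the hash-based case uses CRC32 as a stand-in for a switch's hashing of packet headers, and the deterministic case simply assigns paths round-robin at the source. All function names are hypothetical.

```python
import zlib
from collections import Counter


def hashed_placement(flows, n_paths):
    """Hash-based placement: the path depends on a hash of the flow identifier."""
    return {f: zlib.crc32(f.encode()) % n_paths for f in flows}


def deterministic_placement(flows, n_paths):
    """Source-routed placement: the sender spreads flows round-robin."""
    return {f: i % n_paths for i, f in enumerate(flows)}


def max_load(placement):
    """Number of flows sharing the busiest path."""
    return max(Counter(placement.values()).values())


# Hypothetical workload: 8 large flows over 8 equal-cost paths.
flows = [f"gpu{i}->gpu{i + 8}" for i in range(8)]
print("hashed max load:       ", max_load(hashed_placement(flows, 8)))
print("deterministic max load:", max_load(deterministic_placement(flows, 8)))
```

With round-robin placement and as many paths as flows, each path carries exactly one flow. A hash gives no such guarantee, and with only a handful of large flows it can easily send several of them down the same path.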
SRv6 also helps unify frontend networks. AI frontend networks must handle a variety of traffic types, including large checkpoint writes to distributed storage, telemetry streams, control plane messages, and user access. Each of these traffic types has unique performance requirements. SRv6 offers a unified framework for implementing quality of service (QoS), security policies, and traffic steering across both backend and frontend domains. This removes the complexity of managing different policy frameworks and makes network management more efficient.
Additionally, SRv6 enables scale-across architectures by removing the traditional fragmentation between data center and WAN domains, creating a unified IPv6-based data plane. Organizations can apply consistent policies for managing AI traffic, whether it traverses local fabrics, frontend networks, or spans vast distances between data centers. With SRv6, a single segment list can encode paths from source GPUs through the complete infrastructure to destination GPUs located in remote data centers. Unlike Resource Reservation Protocol Traffic Engineering (RSVP-TE) or Multiprotocol Label Switching Traffic Engineering (MPLS-TE), which depend on maintaining per-flow state on network devices, SRv6 incorporates all routing instructions directly within packet headers. This approach eliminates state explosion, making it particularly beneficial for scale-across scenarios.
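As a rough illustration of how a path can travel inside the packet rather than as state on each device, the sketch below packs several uSIDs into one 128-bit IPv6 address. It assumes a 32-bit uSID block followed by 16-bit uSIDs, which leaves room for six uSIDs per address. The block value `fcbb:bb00::/32`, the uSID values, and the `pack_usids` helper are illustrative assumptions, not a reference implementation.

```python
import ipaddress

BLOCK_BITS = 32  # assumed uSID block length
USID_BITS = 16   # assumed uSID length


def pack_usids(block: str, usids: list) -> ipaddress.IPv6Address:
    """Pack an ordered list of uSIDs into a single IPv6 address (hypothetical helper)."""
    net = ipaddress.IPv6Network(block)
    if net.prefixlen != BLOCK_BITS:
        raise ValueError(f"block must be a /{BLOCK_BITS}")
    capacity = (128 - BLOCK_BITS) // USID_BITS  # 6 with the lengths above
    if len(usids) > capacity:
        raise ValueError(f"at most {capacity} uSIDs fit in one address")
    value = int(net.network_address)
    for position, usid in enumerate(usids):
        if not 0 < usid < (1 << USID_BITS):
            raise ValueError("uSID out of range")
        # The first uSID sits right after the block, the next one after it, and so on.
        shift = 128 - BLOCK_BITS - (position + 1) * USID_BITS
        value |= usid << shift
    return ipaddress.IPv6Address(value)


# Hypothetical three-hop path: leaf 0x0001, spine 0x0002, remote leaf 0x0003.
carrier = pack_usids("fcbb:bb00::/32", [0x0001, 0x0002, 0x0003])
print(carrier)  # fcbb:bb00:1:2:3::
```

The whole path fits in the destination address of the packet, so transit devices in this sketch need no per-path state; positions left at zero simply mean no further uSIDs are present.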
A number of hyperscalers have begun using SRv6 in their AI backend networks to provide fine-grained network path control, maximize network utilization, and improve fabric resiliency. At Open Source Summit Europe 2025, Cisco and Microsoft showcased how SRv6 in SONiC enables a wide range of data center use cases, including AI backend networks.
The path forward
The journey of SRv6, from its origins in service provider networks to its promising role in AI infrastructure, illustrates a fundamental truth: Strong architectural principles transcend specific use cases. The stateless operations, source-driven control, and unified IPv6 framework that simplified 5G networks are the same principles that enable deterministic performance in AI fabrics and seamless connectivity across geographic boundaries.
As AI continues to expand, from single-cluster deployments to large-scale architectures spanning continents, the networking challenges will only grow. Training sessions that involve hundreds of thousands of GPUs distributed across multiple data centers will demand network infrastructure capable of maintaining microsecond-level precision on a global scale.
SRv6's inherent flexibility and extensibility allow it to adapt to these changing needs. Its programmability enables the introduction of new network functions and traffic engineering capabilities without requiring fundamental architectural changes. As new AI communication patterns emerge, SRv6 provides a robust networking foundation to support them.
The technology that simplified 5G mobile networks, enabled network slicing, and streamlined service provider operations is now the same technology helping AI infrastructure scale. Since its first demonstrations in 2017, SRv6 has proven itself not just as a networking protocol but as a fundamental building block for the future of digital infrastructure. As organizations develop the next generation of AI systems, SRv6 will serve as a powerful yet unobtrusive engine, helping ensure that the network remains an enabler of innovation rather than a bottleneck. The journey from 5G to AI is just the beginning; the architecture is well positioned for whatever comes next.
