Arista Networks introduces Etherlink AI Platforms for optimised AI workloads

June 6, 2024
Arista Networks has announced the Arista Etherlink AI platforms, designed to deliver optimal network performance for demanding AI workloads, including training and inferencing.

Powered by new AI-optimised Arista EOS features, the Arista Etherlink AI portfolio supports AI cluster sizes ranging from thousands to 100,000s of XPUs. Its highly efficient single-tier and 2-tier network topologies deliver better application performance than more complex multi-tier networks, while offering monitoring capabilities including flow-level visibility.

“The network is core to successful job completion outcomes in AI clusters,” said Alan Weckel, the founder and technology analyst for 650 Group. “The Arista Etherlink AI platforms offer customers the ability to have a single 800G end-to-end technology platform across front-end, training, inference and storage networks. Customers benefit from using the same well-proven Ethernet tooling, security and expertise they have relied on for decades while easily scaling up for any AI application.”

Arista’s Etherlink AI Platforms

  • The 7060X6 AI Leaf switch family employs Broadcom Tomahawk 5 silicon, with a capacity of 51.2 Tbps and support for 64 800G or 128 400G Ethernet ports.
  • The 7800R4 AI Spine is the 4th generation of Arista’s flagship 7800 modular systems. It implements the latest Broadcom Jericho3-AI processors with an AI-optimised packet pipeline and offers non-blocking throughput with the proven virtual output queuing architecture. The 7800R4-AI supports up to 460 Tbps in a single chassis, which corresponds to 576 800G or 1152 400G Ethernet ports.
  • The 7700R4 AI Distributed Etherlink Switch (DES) supports the largest AI clusters, offering customers parallel distributed scheduling and congestion-free traffic spraying based on the Jericho3-AI architecture. The 7700 is the first in a new series of ultra-scalable, intelligent distributed systems that can deliver consistent throughput for very large AI clusters.
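
The capacity figures quoted above follow from multiplying port count by port speed. As a quick sanity check, here is a minimal Python sketch; the helper name aggregate_tbps is hypothetical, and the only inputs are the port counts and speeds stated in the list.

```python
def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Aggregate bandwidth in Tbps for a set of identical ports (hypothetical helper)."""
    return ports * gbps_per_port / 1000


# Port counts and speeds as quoted in the article.
configs = {
    "7060X6, 64 x 800G": (64, 800),
    "7060X6, 128 x 400G": (128, 400),
    "7800R4, 576 x 800G": (576, 800),
    "7800R4, 1152 x 400G": (1152, 400),
}

for name, (ports, speed) in configs.items():
    print(f"{name}: {aggregate_tbps(ports, speed):.1f} Tbps")
```

Both 7060X6 port configurations come to 51.2 Tbps, and both 7800R4 configurations come to 460.8 Tbps, which is consistent with the roughly 460 Tbps quoted for a single chassis.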

A single-tier network topology with Etherlink platforms can support over 10,000 XPUs. With a 2-tier network, Etherlink can support more than 100,000 XPUs. Minimising the number of network tiers is essential for optimising AI application performance, reducing the number of optical transceivers, lowering cost and improving reliability.

All Etherlink switches support the emerging Ultra Ethernet Consortium (UEC) standards, which are expected to provide additional performance benefits when UEC NICs become available in the near future.

“Broadcom is a firm believer in the versatility, performance, and robustness of Ethernet, which makes it the technology of choice for AI workloads,” said Ram Velaga, the senior vice president and general manager of Core Switching Group at Broadcom. “By using industry-leading Ethernet chips such as Tomahawk 5 and Jericho3-AI, Arista provides the ideal accelerator-agnostic solution for AI clusters of any shape or size, outperforming proprietary technologies and providing flexible options for fixed, modular and distributed switching platforms.”

Arista EOS Smart AI Suite

The rich features of Arista EOS and CloudVision complement these new networking-for-AI platforms. The software suite's AI-for-networking, security, segmentation, visibility and telemetry features bring AI-grade robustness and protection to high-value AI clusters and workloads. For example, Arista EOS’s Smart AI suite of enhancements now integrates with SmartNIC providers to deliver advanced RDMA-aware load balancing and QoS. Arista AI Analyzer, powered by Arista AVA, automates configuration and improves visibility and intelligent performance analysis of AI workloads.

“Arista’s competitive advantage consistently comes down to our rich operating system and broad product portfolio to address AI networks of all sizes,” said Hugh Holbrook, the chief development officer at Arista Networks. “Innovative AI-optimised EOS features enable faster deployment, reduce configuration issues and deliver flow-level performance analysis, and improve AI job completion times for any size AI cluster.”
