Scheduled Ethernet Fabric for Large-Scale AI Training Clusters

256 views

Open Compute Project

18 days ago

Pengfei Huo, Sr. Network Architect - ByteDance
S. Kamran Naqvi, Chief Network Architect - Broadcom
Large-scale AI training clusters, hosting tens of thousands of GPUs, are designed to deliver massive computational power for a variety of AI workloads. To fully unleash that power, a highly efficient network fabric connecting these GPUs is essential.
The fabric should support extensive GPU scale-out while sustaining high performance, handle diverse parallel workloads with efficient multi-tenancy and job segregation, be resilient against link failures and topology changes so that less checkpoint-related intervention is needed, and be grounded in an open ecosystem for innovation and adaptability.
In this presentation, we will explain how the scheduled fabric addresses these requirements. We will also discuss how ByteDance has benchmarked the fabric in its AI clusters, covering its measured performance, the deployment plan, and thoughts on broader collaboration in the community.
