Bill Dally | Directions in Deep Learning Hardware

  6,702 views

Georgia Tech ECE

1 month ago

Bill Dally, Chief Scientist and Senior Vice President of Research at NVIDIA, gives an ECE Distinguished Lecture on April 10, 2024 at Georgia Tech.
Abstract:
“Directions in Deep Learning Hardware”
The current resurgence of artificial intelligence, including generative AI like ChatGPT, is due to advances in deep learning. Systems based on deep learning now exceed human capability in speech recognition, object classification, and playing games like Go. Deep learning has been enabled by powerful, efficient computing hardware. The algorithms used have been around since the 1980s, but it has only been in the last decade - when powerful GPUs became available to train networks - that the technology has become practical.
Advances in DL are now gated by hardware performance. In the last decade, the efficiency of DL inference on GPUs has improved by 1000x. Much of this gain was due to improvements in data representation, starting with FP32 in the Kepler generation of GPUs and scaling to Int8 and FP8 in the Hopper generation.
This talk will review this history and discuss further improvements in number representation including logarithmic representation, optimal clipping, and per-vector quantization.
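To make the per-vector quantization idea concrete, here is a minimal NumPy sketch of int8 quantization with one scale factor per short vector of elements. This is an illustrative toy, not NVIDIA's exact scheme; the vector length of 64 and the symmetric max-based scaling are assumptions for the example.

```python
import numpy as np

def quantize_per_vector(x, vec_len=64):
    """Quantize a 1-D float array to int8, using one scale per vector
    of vec_len consecutive elements (length must divide evenly)."""
    x = x.reshape(-1, vec_len)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0  # one scale per vector
    scale = np.where(scale == 0, 1.0, scale)              # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=65536).astype(np.float32)
q, s = quantize_per_vector(x)
err = np.abs(dequantize(q, s).reshape(-1) - x).max()
```

Because the scale tracks each small vector's own dynamic range rather than the whole tensor's, the worst-case rounding error stays proportional to the local magnitude, which is the intuition behind per-vector schemes.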
BIOGRAPHY:
Bill Dally joined NVIDIA in January 2009 as chief scientist, after spending 12 years at Stanford University, where he was chairman of the computer science department. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing and synchronization technology that is found in most large parallel computers today.
Dally was previously at the Massachusetts Institute of Technology from 1986 to 1997, where he and his team built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very low overhead synchronization and communication mechanisms. From 1983 to 1986, he was at California Institute of Technology (CalTech), where he designed the MOSSIM Simulation Engine and the Torus Routing chip, which pioneered “wormhole” routing and virtual-channel flow control.
He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, a Fellow of the IEEE and the ACM, and has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes award. He has published over 250 papers, holds over 120 issued patents, and is an author of four textbooks.
Dally received a bachelor's degree in Electrical Engineering from Virginia Tech, a master’s in Electrical Engineering from Stanford University and a Ph.D. in Computer Science from CalTech. He was a cofounder of Velio Communications and Stream Processors.

Comments: 6
@fabianwenger7133 · 1 month ago
“Pick some domain that is ripe for acceleration and do the hardware-software co-optimization.” A job well done, with detailed insight delivered in a down-to-earth manner.
@Wobbothe3rd · 1 month ago
This man deserves a congressional medal of freedom award.
@radicalrodriguez5912 · 29 days ago
great hosting, talk and questions. thanks for uploading it
@gesitsinggih · 24 days ago
A lot of useful information, but he is focusing on inference compute density while the actual bottleneck is DRAM bandwidth. You will hardly get 10% inference compute utilization on the best hardware, even when maxing out the practical batch size. Headline FLOPS numbers are eye-catching, but they have to be more honest about real usage.
@BlockDesignz · 6 days ago
Wrong. He's talking about a serving setting, where you'll have N users querying your service at any one time. If N is large enough (on the order of 10^3), the problem becomes compute-bound again!
@gesitsinggih · 6 days ago
@BlockDesignz True, but in practice no one has a batch size large enough to be compute-bound. My critique is that they grew compute far more than they grew memory bandwidth.
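The disagreement in this thread comes down to arithmetic intensity: how many FLOPs a kernel performs per byte it moves from memory, and whether that ratio clears the hardware's compute/bandwidth "ridge point". A rough back-of-the-envelope sketch follows; the peak-FLOPS and bandwidth figures are round illustrative numbers, not any specific GPU's spec sheet.

```python
# Roofline-style estimate: at what batch size does y = W @ x, with an
# (n x n) weight matrix, stop being limited by weight traffic?
# Hardware figures below are illustrative assumptions.

PEAK_FLOPS = 1000e12   # 1000 TFLOP/s low-precision peak (assumed)
PEAK_BW    = 3e12      # 3 TB/s memory bandwidth (assumed)
BYTES_PER_WEIGHT = 1   # Int8/FP8 weights

def arithmetic_intensity(batch):
    """FLOPs per byte when weight traffic dominates:
    2*n*n*batch FLOPs over n*n*BYTES_PER_WEIGHT bytes."""
    return 2 * batch / BYTES_PER_WEIGHT

# FLOPs/byte needed to saturate the compute units:
ridge = PEAK_FLOPS / PEAK_BW

# Smallest batch at which the GEMM is compute-bound under these assumptions:
min_batch = ridge * BYTES_PER_WEIGHT / 2
```

Under these assumed numbers the crossover is a batch of roughly 170, which is consistent with both comments: single-user inference (batch 1) is deep in memory-bound territory, while a serving fleet aggregating hundreds of concurrent requests can cross the ridge and become compute-bound.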