OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code

35,195 views

Aleksa Gordić - The AI Epiphany


❤️ Become The AI Epiphany Patreon ❤️
/ theaiepiphany
👨‍👩‍👧‍👦 Join our Discord community 👨‍👩‍👧‍👦
/ discord
In this video I cover Whisper, an ASR system from OpenAI's "Robust Speech Recognition via Large-Scale Weak Supervision" paper.
Trained on a huge multilingual, multitask, weakly supervised dataset, it achieves very high effective robustness and accuracy, closing the gap with the human baseline while using only an off-the-shelf transformer.
I walk you through both the paper and the actual code. Let me know whether the code part helped!
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper: cdn.openai.com/papers/whisper...
✅ Code: github.com/openai/whisper
✅ Nice explanation of mel spectrograms: • Mel Spectrograms Expla...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00:00 Intro
00:02:05 Paper overview
00:07:30 Collecting a large scale weakly supervised dataset
00:13:55 Evaluation metric issues (WER)
00:16:05 Effective robustness
00:18:40 Scaling laws in progress
00:26:30 Decoding is hacky
00:28:30 Code walk-through
00:30:25 Model architecture (diagram vs code)
00:33:30 Transcription task
00:34:10 Loading the audio, mel spectrograms
00:37:50 Language detection
00:45:00 Transcription task continued
00:47:35 Suppressing token logits
00:52:00 Voice activity detection
00:53:35 Decoding and heuristics
01:01:56 Outro
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️
If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!
The AI Epiphany - / theaiepiphany
One-time donation - www.paypal.com/paypalme/theai...
Huge thank you to these AI Epiphany patreons:
Eli Mahler
Petar Veličković
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💼 LinkedIn - / aleksagordic
🐦 Twitter - / gordic_aleksa
👨‍👩‍👧‍👦 Discord - / discord
📺 YouTube - / theaiepiphany
📚 Medium - / gordicaleksa
💻 GitHub - github.com/gordicaleksa
📢 AI Newsletter - aiepiphany.substack.com/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#whisper #openai #asr
