Reinforcement Learning: ChatGPT and RLHF

9,833 views

Graphics in 5 Minutes

Days ago

Comments: 16
@EternityUnknown 2 months ago
I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.
@Véry_9 6 days ago
PLEASE COME BACK!! You are an amazing teacher!
@Coder.tahsin 2 months ago
All of your videos are amazing, please upload more
@tuulymusic3856 4 months ago
Please come back, your videos are great!
@HoverAround 3 months ago
Joel, excellent explanation and talk! Thank you!
@user-cm5es5kk7j 3 months ago
Helped me a lot, can't wait to see more
@ireoluwaTH 1 year ago
Welcome back! Hope to see more of these videos.
@pegasusbupt 10 months ago
Amazing content! Please keep them coming!
@jasonpmorrison 10 months ago
Super helpful - thank you for this series!
@0xeb- 1 year ago
Good teaching.
@RaulMartinezRME 1 year ago
Great content!!
@stayhappy-forever 4 months ago
come back :(
@vamsinadh100 10 months ago
You are the Best
@neo4242002 2 months ago
Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?
@onhazrat 1 year ago
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies, reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) challenges data acquisition.
03:35 🤔 RLHF theory: Language model might already know jokes' boundary.
04:18 🏆 Training a reward network predicts human ratings for model's output.
04:47 🔄 Reward network is a modified language model for predicting ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
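The 05:57 takeaway, converting human comparisons into ratings, can be illustrated with a small sketch. The summary does not say which method is used, so the Bradley-Terry model below is an assumption chosen because it is a common way to turn "A was preferred over B" judgments into one score per item. The function name `fit_scores`, the learning rate, the step count, and the toy comparison data are all hypothetical. The resulting scores could then serve as rating targets for a reward network.

```python
import math


def fit_scores(num_items, comparisons, lr=0.1, steps=2000):
    """Fit one scalar score per item from (winner, loser) index pairs.

    Assumes a Bradley-Terry model (an illustrative choice, not taken from
    the video): P(winner beats loser) = sigmoid(score_w - score_l).
    Scores are fitted by gradient ascent on the log-likelihood.
    """
    scores = [0.0] * num_items
    for _ in range(steps):
        grads = [0.0] * num_items
        for winner, loser in comparisons:
            diff = scores[winner] - scores[loser]
            p_win = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of log P(winner beats loser) w.r.t. each score.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
        # Only score differences matter, so re-center to mean zero.
        mean = sum(scores) / num_items
        scores = [s - mean for s in scores]
    return scores


# Hypothetical data: three jokes, each pair (winner, loser) is one human judgment.
comparisons = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (0, 2), (2, 0),
]
scores = fit_scores(3, comparisons)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print("scores:", [round(s, 3) for s in scores])
print("ranking (best first):", ranking)
```

Each item in this toy data both wins and loses at least once, which keeps the fitted scores finite. An item that never loses would have its score grow without bound as the step count increases.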
@0xeb- 1 year ago
How long does it take to train a reward network? And how reliable would it be?