Reinforcement Learning: ChatGPT and RLHF



9,833 views

Graphics in 5 Minutes

Days ago

Comments: 16

@EternityUnknown 2 months ago

I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.

@Véry_9 6 days ago

PLEASE COME BACK!! You are an amazing teacher!

@Coder.tahsin 2 months ago

All of your videos are amazing; please upload more.

@tuulymusic3856 4 months ago

Please come back, your videos are great!

@HoverAround 3 months ago

Joel, excellent explanation and talk! Thank you!

@user-cm5es5kk7j 3 months ago

This helped me a lot; can't wait to see more.

@ireoluwaTH 1 year ago

Welcome back! Hope to see more of these videos.

@pegasusbupt 10 months ago

Amazing content! Please keep them coming!

@jasonpmorrison 10 months ago

Super helpful - thank you for this series!

@0xeb- 1 year ago

Good teaching.

@RaulMartinezRME 1 year ago

Great content!!

@stayhappy-forever 4 months ago

come back :(

@vamsinadh100 10 months ago

You are the best!

@neo4242002 2 months ago

Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?

@onhazrat 1 year ago

🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues with bias, errors, and output quality.
01:11 📊 Training data quality affects results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies; reinforcement learning trains them with policy gradients.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) faces a data-acquisition challenge.
03:35 🤔 RLHF theory: the language model might already know the boundary between good and bad jokes.
04:18 🏆 A reward network is trained to predict human ratings of the model's output.
04:47 🔄 The reward network is a modified language model that predicts ratings.
05:14 📝 Approach: humans write text, a reward network is trained, and the model is refined with RL.
05:57 ⚖️ Systems convert comparisons into ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including their humor.
Made with HARPA AI
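A minimal toy sketch of the pipeline summarized in these takeaways, under heavy simplifying assumptions: the "language model" is reduced to a softmax policy over four hypothetical canned jokes, the "reward network" is reduced to one learned score per joke, and the human feedback is a made-up list of pairwise comparisons. Comparisons are turned into scores with a Bradley-Terry model (one common choice; the video may use a different method), and the policy is then tuned with REINFORCE, a basic policy-gradient algorithm. All names (`JOKES`, `COMPARISONS`, `train_reward_model`, `reinforce`) are illustrative, not from the video.

```python
import math
import random

# Hypothetical toy "outputs" the policy can choose between.
JOKES = ["pun", "dad joke", "non sequitur", "offensive joke"]

# Hypothetical human feedback: (preferred_index, rejected_index) pairs.
COMPARISONS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)] * 20


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reward_model(n: int, comparisons, lr: float = 0.05, epochs: int = 200) -> list[float]:
    """Fit one scalar reward per output with a Bradley-Terry model:
    P(winner preferred over loser) = sigmoid(r[winner] - r[loser])."""
    r = [0.0] * n
    for _ in range(epochs):
        for win, lose in comparisons:
            p = sigmoid(r[win] - r[lose])
            # Gradient ascent on log p: d/dr[win] = 1 - p, d/dr[lose] = -(1 - p).
            r[win] += lr * (1.0 - p)
            r[lose] -= lr * (1.0 - p)
    return r


def reinforce(rewards: list[float], steps: int = 2000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE on a softmax policy, using the learned rewards as the reward signal."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average of rewards, used to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(probs)), weights=probs)[0]  # sample an output
        advantage = rewards[a] - baseline
        baseline += 0.05 * (rewards[a] - baseline)
        # Gradient of log pi(a) w.r.t. the logits is onehot(a) - probs.
        for i in range(len(logits)):
            indicator = 1.0 if i == a else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
    return softmax(logits)


if __name__ == "__main__":
    rewards = train_reward_model(len(JOKES), COMPARISONS)
    policy = reinforce(rewards)
    for joke, r, p in zip(JOKES, rewards, policy):
        print(f"{joke:15s} reward={r:+.2f} prob={p:.3f}")
```

In a real system the reward network would be a language model with a scalar output head, and the policy update would use a more stable algorithm than plain REINFORCE; the table-and-softmax version only keeps the two stages visible.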

@0xeb- 1 year ago

How long does it take to train a reward network? And how reliable would it be?