MIT 6.S191: Reinforcement Learning

Views: 25,133

A day ago

MIT Introduction to Deep Learning 6.S191: Lecture 5
Deep Reinforcement Learning
Lecturer: Alexander Amini
2024 Edition
For all lectures, slides, and lab materials: introtodeeplearning.com
Lecture Outline:
0:00 - Introduction
2:20 - Classes of learning problems
6:33 - Definitions
12:30 - The Q function
17:29 - Deeper into the Q function
23:12 - Deep Q Networks
30:36 - Atari results and limitations
34:24 - Policy learning algorithms
39:31 - Discrete vs continuous actions
43:21 - Training policy gradients
49:10 - RL in real life
51:33 - VISTA simulator
53:24 - AlphaGo and AlphaZero and MuZero
58:58 - Summary
Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!

Comments: 21

@izharulhaq2436 29 days ago

One of the best intros to RL. I recommend this amazing lecture to every student interested in this field. I just finished it at 1:40 AM... Now waiting for Actor-Critic Type RL Agent to be released soon... Thanks and good night.

@visheshphutela 29 days ago

Babe wake up new 6.S191 lecture just dropped

@BheezHandle 29 days ago

Lol...

@VisatoVino 28 days ago

@BheezHandle Feel the vibessssss

@Asif-fp8gy 23 days ago

Awesome job. Just curious: can someone explain how the target part of the loss function was computed at 26:40?

@artukikemty 29 days ago

Amazing intro to the subject. Since it is closely related to control theory, it is mandatory to have a good background in control theory, such as state-space models and optimal control.

@artukikemty 29 days ago

Transformers can be used as a direct replacement for DRL, since they can process sequences as well. There is an article on Medium about this alternative.

@gamalieliissacnyambacha3029 29 days ago

I'm curious to listen to this lecture. I need more concepts to apply in my thesis. I'm looking forward to seeing this happen soon.

@anoopitiss 26 days ago

Following for 3 years.

@hrishabhg 28 days ago

Lovely lecture. ❤ A self-driving car operates in a dynamic environment, compared to a gaming environment. That could be mentioned.

@melvinkuriakose2708 5 days ago

At 10:30, the equation for total reward should be the summation of rewards from t=0 to t=t, right? But in the equation it's from t to infinity... why?

@foregroundtreble05 29 days ago

Needed u

@Crashrapescrypto 20 days ago

Can you advise my startup? We applied to YC. We want to set up an Indian team and use RLHF as well as SIMPO to agentify the hospital system and remove the inefficiencies in current hospital systems. I'm an Aussie coming to America. We have hardware as well: I've been in Guangzhou for the last 6 weeks finding the best containers and cameras, and tried to train for gauging container volume to measure remaining stock.

@TheNewton 20 days ago

Please repeat questions; the question askers' audio is blown out or unintelligible. Some of the questions make it into the captions, but not all. The professor's mic is perfect, however, with a great mix. This is one of the few series where you don't have to be at max volume all the time.