Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)

  69,079 views

Stanford Online

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/2Zv1JpK
Topics: Reinforcement learning, Monte Carlo, SARSA, Q-learning, Exploration/exploitation, function approximation
Percy Liang, Associate Professor & Dorsa Sadigh, Assistant Professor - Stanford University
onlinehub.stanford.edu/
Associate Professor Percy Liang
Associate Professor of Computer Science and Statistics (courtesy)
profiles.stanford.edu/percy-l...
Assistant Professor Dorsa Sadigh
Assistant Professor in the Computer Science Department & Electrical Engineering Department
profiles.stanford.edu/dorsa-s...
To follow along with the course schedule and syllabus, visit:
stanford-cs221.github.io/autu...

Comments: 9
@henkjekel4081 · a year ago
Yeah, you really need to have an episode to play this game.
@aojing · 3 months ago
A question left over from the last lecture (MDP-1) is still hovering around in part 2: what is the transition function for this class? Is it a function of the action?
@inventwithdean · a month ago
It is a function of both State and Action.
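A minimal sketch of that point (not the lecture's code, and loosely modeled on the dice game from the previous lecture with illustrative numbers): the transition function T(s, a, s') takes both the current state and the chosen action as inputs.

```python
# Sketch only: T(s, a, s') depends on both the state s and the action a.
# States "in"/"end" and actions "stay"/"quit" are illustrative, not the slide's exact setup.

def transition_prob(state, action, next_state):
    """Return T(s, a, s'): the probability of landing in next_state."""
    T = {
        ("in", "stay"): {"in": 2 / 3, "end": 1 / 3},  # illustrative probabilities
        ("in", "quit"): {"end": 1.0},
    }
    return T.get((state, action), {}).get(next_state, 0.0)

print(transition_prob("in", "stay", "in"))   # 0.666...
print(transition_prob("in", "quit", "end"))  # 1.0
```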
@black-sci · 4 months ago
Somehow the lecture left me confused in the end. Maybe I should rewatch it.
@JumbyG · a year ago
I think there may be a typo at 28:27: it states that Q_pi is (4+8+16)/3, but I believe it should be (4+8+12)/3. Please correct me if I am wrong.
@seaotterlabs1685 · a year ago
I think it should be (4+8+16)/3, as I believe their last run has four rewards of 4.
@endoumamoru3835 · 6 months ago
He is calculating the sum of all the rewards you can get. The first time the sum was 4, since only one reward was present; the next was 8, with 2 rewards; and then the next was 16, since there were 4 rewards.
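A small sketch of the model-free Monte Carlo estimate being discussed (assuming no discounting, so each episode's utility is just its summed rewards): Q_pi(s, a) is estimated by averaging the utilities observed over episodes that take action a in state s, which is where (4+8+16)/3 comes from.

```python
# Sketch of the Monte Carlo estimate from the 28:27 discussion.
# Each entry is the utility (sum of rewards) of one episode that took (s, a).

def monte_carlo_q_estimate(utilities):
    """Estimate Q_pi(s, a) as the average of the observed episode utilities."""
    return sum(utilities) / len(utilities)

episode_utilities = [4, 8, 16]                    # one utility per episode
print(monte_carlo_q_estimate(episode_utilities))  # (4 + 8 + 16) / 3 ≈ 9.33
```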
@albert2266 · 2 months ago
Just to clarify a concept: I think 7:29 is not quite right, because the value function shouldn't simply be equal to the Q-value. The value function is the expected utility over "all possible actions" at a given state. Therefore it should be an expectation over Q_pi rather than just equal to Q_pi, since Q_pi is the expected utility for "a given action" at a given state. Please correct me if I'm wrong.
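A small sketch of the standard relationship behind this debate (illustrative numbers, not the lecture's): if the slide defines the value under a fixed deterministic policy, then V_pi(s) = Q_pi(s, pi(s)); the expectation over actions the comment describes is what you get when the policy is stochastic, V_pi(s) = sum_a pi(a|s) * Q_pi(s, a).

```python
# Hypothetical Q_pi(s, a) values at some state s, for illustration only.
q = {"a1": 3.0, "a2": 7.0}

# Deterministic policy pi that picks a2 at s: V_pi(s) = Q_pi(s, pi(s)).
v_deterministic = q["a2"]                    # 7.0

# Stochastic policy over the same actions: V_pi(s) = E_a[Q_pi(s, a)].
pi = {"a1": 0.25, "a2": 0.75}
v_stochastic = sum(pi[a] * q[a] for a in q)  # 0.25*3 + 0.75*7 = 6.0

print(v_deterministic, v_stochastic)
```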
@Moriadin · a month ago
Not as good as the previous lecture; harder to follow.