Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)

432,957 views

Stanford Online


For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/3pUNqG7
Topics: MDP1, Search review, Project
Percy Liang, Associate Professor & Dorsa Sadigh, Assistant Professor - Stanford University
onlinehub.stanford.edu/
Associate Professor Percy Liang
Associate Professor of Computer Science and Statistics (courtesy)
profiles.stanford.edu/percy-l...
Assistant Professor Dorsa Sadigh
Assistant Professor in the Computer Science Department & Electrical Engineering Department
profiles.stanford.edu/dorsa-s...
To follow along with the course schedule and syllabus, visit:
stanford-cs221.github.io/autu...
Chapters:
0:00 Intro
2:12 Course Plan
3:45 Applications
10:48 Rewards
18:46 Markov Decision Process
19:33 Transitions
20:45 Transportation Example
29:28 What is a Solution?
30:58 Roadmap
36:36 Evaluating a policy: volcano crossing
37:38 Discounting
53:21 Policy evaluation computation
55:23 Complexity
57:10 Summary so far
#artificialintelligencecourse

Comments: 131
@nsubugakasozi7101 2 months ago
This lecturer is world class... and this is also the most confident live coding I have seen in a while... she is really, really good. Universities are made by the lecturers, not so much the name.
@pirouzaan 10 months ago
This was by far the most impressive lecture with live coding that I have seen! I am leaving this virtual lecture room with awe and respect...
@iiilllii140 1 year ago
Thank you for this lecture and the course order. The past lectures about search problems really help you to better understand MDPs.
@vishalsunkapaka7247 2 years ago
The professor is so talented that I can't say anything; I'm just in awe of her.
@kazimsyed7367 2 years ago
I want to give this lecture the appreciation it deserves; it's good. I had a difficult time and a mental block with this topic. Thanks for all your efforts.
@foufayyy 2 years ago
Thank you for posting this. MDPs were really confusing, and this lecture really helped me understand them clearly.
@-isotope_k 2 years ago
Yes, this is a very, very confusing topic.
@user-bn3zw9sd1p 1 year ago
This was my n-th iteration of MDPs, where n > 10, but in MDP terminology, my knowledge finally started converging in the proper direction. Thank you for the lecture 🙂
@muheedmir7385 1 year ago
Amazing lecture, loved every bit of it.
@meharjeetsingh5256 8 months ago
This teacher is really, really good. I wish you were at my uni so that I could enjoy machine learning.
@chanliang5725 8 months ago
I was lost on MDPs. Glad I found this awesome lecture that clears up all the concepts in MDPs! Very helpful!
@joshuat6124 3 months ago
Thank you, professor! I learnt so much from this, especially the live coding bits.
@yesodabhargava8776 2 years ago
This is an awesome lecture! Thank you so much.
@quannmtt3110 1 year ago
Thanks for the awesome lecture. Very good job on the explanations by the lecturer.
@sukhjinderkumar2723 2 years ago
Great lecture, thank you Professor :)
@adityanjsg99 1 year ago
A thorough lecture!!
@snsacharya1737 1 year ago
At 29:36, a policy is defined as a one-to-one mapping from the state space to the action space; for example, the policy when we are in station-4 is to walk. This definition is different compared to the one made in the classic RL book by Sutton and Barto; they define a policy as "a mapping from states to probabilities of selecting each possible action." For example, the policy when we are in station-4 is a 40% chance of walking and a 60% chance of taking the train. The policy evaluation algorithm presented in this lecture also ends up being slightly different by not looping over the possible actions. It is nice of the instructor to highlight that point at 55:45.
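For readers comparing the two conventions: with a deterministic policy, policy evaluation just plugs in pi(s) instead of summing over actions. Below is a minimal Python sketch of iterative policy evaluation under that convention; the toy MDP (states, probabilities, rewards) is made up for illustration and is not the lecture's example.

    def policy_evaluation(states, end_states, pi, transitions, gamma=1.0, eps=1e-6):
        """transitions[(s, a)] -> list of (next_state, prob, reward)."""
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                if s in end_states:
                    continue
                a = pi[s]  # deterministic policy: no sum over actions
                v_new = sum(p * (r + gamma * V[s2])
                            for s2, p, r in transitions[(s, a)])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < eps:
                return V

    # Hypothetical 3-state chain: action 'go' advances with prob 0.9.
    states = [0, 1, 2]
    end_states = {2}
    pi = {0: 'go', 1: 'go'}
    transitions = {
        (0, 'go'): [(1, 0.9, -1.0), (0, 0.1, -1.0)],
        (1, 'go'): [(2, 0.9, -1.0), (1, 0.1, -1.0)],
    }
    print(policy_evaluation(states, end_states, pi, transitions))
    # -> roughly {0: -2.22, 1: -1.11, 2: 0.0}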
@aojing 4 months ago
In this class, the action is determined from the beginning, independent of the state... This will mislead beginners into confusing Q and V, as by this definition @47:20. In RL, we take actions according to a policy, which is random and can be learned/optimized by iterating through episodes, i.e., parallel worlds.
@marzmohammadi8739 2 years ago
I enjoyed it, Ms. Sadigh. I had a great time... Thank you so much!
@alphatensor 8 months ago
Thanks for the good lecture.
@ammaraboklam2487 2 years ago
Thank you very much. This is a really great lecture, and it's really helpful.
@stanfordonline 2 years ago
Hi Ammar, glad it was helpful! Thanks for your feedback.
@vimukthirandika872 2 years ago
Thanks for the amazing lecture!
@HarshvardhanKanthode 2 years ago
Where are all the comments?
@carlosloria-saenz6760 7 months ago
Great videos, thanks! At time 47:20 there is a small typo on the board; I guess it should be: V_{\pi}(s) = Q_{\pi}(s, \pi(s)) if s is not the end state.
@farzanzeinali7398 1 year ago
The transportation example has a problem. The states are discrete. If you take the tram from the starting state 1, then with state*2 you will never end up in state 3. Assume the first action was successful, so the next state is 2. If the second action is successful too, you end up in state 4; you will never end up in state 3.
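For context: in the lecture's example, walking moves s to s+1 and the magic tram moves s to 2s (when it succeeds), so the comment's observation holds for tram-only rides; mixing in walk steps makes every state reachable. A quick sketch assuming those deterministic dynamics:

    # Which states in 1..n are reachable from state 1 under a set of moves?
    def reachable(n, moves):
        seen, frontier = {1}, [1]
        while frontier:
            s = frontier.pop()
            for move in moves:
                s2 = move(s)
                if s2 <= n and s2 not in seen:
                    seen.add(s2)
                    frontier.append(s2)
        return sorted(seen)

    print(reachable(10, [lambda s: 2 * s]))                   # tram only: [1, 2, 4, 8]
    print(reachable(10, [lambda s: s + 1, lambda s: 2 * s]))  # walk + tram: [1, 2, ..., 10]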
@RojinaPanta1 11 months ago
Wouldn't removing the constraint increase the search space, making it computationally inefficient?
@alemayehutesfaye463 1 year ago
Thank you for your interesting lecture; it really helped me understand the topic well.
@stanfordonline 1 year ago
Hi Alemayehu, thanks for your comment! Nice to hear you enjoyed this lecture.
@alemayehutesfaye463 1 year ago
@@stanfordonline Thanks for your reply. I am following you from Ethiopia and have an interest in the subject area. Would you mind suggesting the best texts and supporting videos for gaining in-depth knowledge of Markov processes and decision making, especially as related to manufacturing industries?
@seaotterlabs1685 1 year ago
Amazing lecture! I was having trouble finding my footing on this topic, and now I feel I have a good starting point with the concepts and notation! I hope Professor Sadigh teaches many more AI topics!
@stanfordonline 1 year ago
Excellent, thanks for your feedback!
@thalaivarda 2 years ago
I will be conducting a test for those watching the video.
@karimdarwich1913 20 days ago
How can I choose the "right" gamma for my problem? How can I know whether the gamma I chose is good or not?
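One common rule of thumb (not from this lecture): gamma sets an effective planning horizon of roughly 1/(1 - gamma), since a reward k steps away is weighted by gamma^k. Pick gamma so that horizon covers how far ahead decisions in your problem actually matter, then sanity-check that the resulting policy is stable under small changes to gamma. A tiny illustration:

    # Effective horizon ~ 1/(1 - gamma); weight of a reward that far out.
    for gamma in [0.5, 0.9, 0.99]:
        horizon = 1.0 / (1.0 - gamma)
        print(f"gamma={gamma}: horizon ~ {horizon:.0f} steps, "
              f"weight at that step ~ {gamma ** horizon:.2f}")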
@camerashysd7165 2 months ago
Wow, this account is crazy 😮
@eigenfeynman9890 2 years ago
FYI I'm a theoretical physics major, and I have no business in CS whatsoever.
@vikasshukla831 2 years ago
In the dice game, if I choose to stay in step 1 and then quit in the second stage, will I get 10 dollars when I quit at stage 2? If I am lucky enough to reach the second stage (i.e., the dice doesn't roll 1 or 2), then I am in the "in" state, and by the diagram I have the option to quit, which might give me 10 dollars, but that requires success in stage 1. Then the best strategy might change. Let me know your thoughts.
@fahimullahkhan775 1 year ago
You are right according to the figure and the flow of the states, but from the scenario one gets the impression that one has the chance either to quit at the start or to stay in the game.
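For what it's worth, a quick expected-value check of the strategies in this thread, assuming the dice game as presented in the lecture (quit pays $10 and ends the game; stay pays $4, then the die ends the game with probability 1/3):

    # Expected utilities of three policies in the dice game.
    quit_now = 10.0                        # quit in stage 1
    stay_once_then_quit = 4 + (2/3) * 10   # survive stage 1, then quit: ~10.67
    always_stay = 4 / (1 - 2/3)            # V = 4 + (2/3) * V  =>  V = 12
    print(quit_now, stay_once_then_quit, always_stay)

So the $10 from quitting in stage 2 is only collected with probability 2/3, and always staying still comes out best in expectation.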
@msfallah 1 year ago
I think the given definition of the action-value function Q(s, action) is not correct. In fact, the value function is the summation of action-value functions over all actions.
@aojing 4 months ago
@47:20 the definition of the Q function is not right and gets conflated with the value function. Specifically, the immediate reward R should be taken out of the summation, because the Q function estimates the value of a specific action starting from the current state.
@aojing
@aojing 4 ай бұрын
or we may say the Value function here is not properly defined without considering policy, i.e., by taking action independent of states.
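For reference, here are the standard relations these threads are debating, for a fixed policy (deterministic pi(s) in this lecture, stochastic pi(a|s) in Sutton & Barto); note the value function is an expectation over actions under the policy, not a plain sum:

    Q_\pi(s, a) = \sum_{s'} T(s, a, s') \big[ \mathrm{Reward}(s, a, s') + \gamma\, V_\pi(s') \big]

    V_\pi(s) = Q_\pi(s, \pi(s)) \quad \text{(deterministic policy, as in this lecture)}

    V_\pi(s) = \sum_a \pi(a \mid s)\, Q_\pi(s, a) \quad \text{(stochastic policy, as in Sutton \& Barto)}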
@rahulkelkar1246 2 years ago
Does anyone else think she looks like Zoe Kazan?
@pythonmini7054 1 year ago
Is it just me, or does she look like Callie Torres from Grey's Anatomy 🤔
@henkjekel4081 1 year ago
You should look at Andrew Ng's lecture; he explains it way better.
@dungeon1163 2 years ago
Only watching for educational purposes
@-isotope_k 2 years ago
😂😂
@mango-strawberry 1 month ago
😂😂. You know it.
@divyanshuy007 1 year ago
16:42 thumbnail
@HolyRamanRajya 2 years ago
Beauty and brains.
@aswinbiju4038 2 years ago
Only watching for educational purposes.
@soham4741 2 years ago
Yes, me too.
@vikranthrana3019 2 years ago
Me too
@radheshyamshaw8672 2 years ago
Me too
@md.naimul8544 7 months ago
Why is she so beautiful 😳😳
@ameerhamza4816 6 months ago
Why not?
@harshraj3344 1 year ago
My man
@saisriteja5290 1 year ago
I love you.
@sachinfulsunge9977 1 year ago
Hell naw bruh
@vikranthrana3019 2 years ago
The professor is quite cute ❤️
@Naentrikakudapikalev 2 years ago
Cute lecture by a cute lady.
@asawriter-f1v 1 year ago
I'm Indian and belong to Bihar State 🇮🇳🇮🇳
@mango-strawberry 1 month ago
No one cares. Get lost.