Deep RL Bootcamp Lecture 1: Motivation + Overview + Exact Solution Methods

93,588 views

AI Prism

Days ago

Instructor: Pieter Abbeel
Lecture 1 of the Deep RL Bootcamp held at Berkeley in August 2017

Comments: 45

@MsgrTeves 6 years ago

This RL bootcamp is incredible.

@zenchiassassin283 2 years ago

Some timestamps:
- Exercise 1: Effect of discount (factor/rate) and noise: at 32:41
- Exercise 2: Policy evaluation with a stochastic policy: at 45:22
- Policy improvement idea: 49:21 to 50:10 and 52:55 to 54:12
- Infinite actions: exact methods barely ever work: at 54:25
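
For the Exercise 2 topic (policy evaluation with a stochastic policy), the fixed-point equation averages over the policy's action probabilities as well as over next states. A sketch in assumed notation (P(s'|s,a) for transitions, R(s,a,s') for rewards, γ for the discount, π(a|s) for the stochastic policy; these symbols are not copied from the slides):

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```

For a deterministic policy, π(a|s) puts all its mass on a single action π(s), and the outer sum disappears.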

@sunderrajan6172 6 years ago

Great lecture. It would be even better if the audience questions were repeated for the recording.

@marloncajamarca2793 6 years ago

Awesome lecture!!!

@nathanfitzpatrick9953 4 years ago

This guy is not messing around.

@XinHeng 6 years ago

This is an excellent lecture

@miyashitahikaru1952 6 years ago

Awesome lecture

@ProfessionalTycoons 5 years ago

Great video.

@bofeng6910 5 years ago

Great lecture +1

@babamam1025 6 years ago

Awesome lectures! Does anyone know where to download the slides?

@gaaligadu148 5 years ago

Does anyone know if there are transcripts for these lectures? I especially can't hear the students' questions.

@kleemc 6 years ago

Great lecture. It would be better if the questions were repeated. We can only guess what the questions are.

@JadtheProdigy 6 years ago

I never thought a UFC fighter would be watching this. Props, bro.

@afrozenator 5 years ago

Starts at 1:00

@shubhanshawasthi4319 5 years ago

At 45:13, in the update equations (the last two on that slide), shouldn't s' be in place of s in gamma * V^pi_(k-1)(s) and gamma * V^pi(s)?
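
The proposed correction agrees with the standard Bellman equations: the discounted term is the value of the successor state s', not of s. Written in assumed notation (deterministic policy π, transition model P(s'|s,a), reward R(s,a,s'); not copied from the slide), the iterative update and its fixed point would read:

```latex
V^{\pi}_{k}(s) \leftarrow \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, \pi(s), s') + \gamma \, V^{\pi}_{k-1}(s') \right]

V^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, \pi(s), s') + \gamma \, V^{\pi}(s') \right]
```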

@ethanjyx 5 years ago

Very well taught lecture!

@chaucao9725 5 years ago

53:30 poliception

@roboticsresources9680 6 years ago

Best lecture on deep reinforcement learning.

@bajdoub 5 years ago

Except there is no reinforcement learning in this lecture, only solving Markov decision processes for the optimal policy by value/policy iteration. So no reinforcement learning, and certainly no deep reinforcement learning. Reinforcement learning is an approach to solving an MDP without knowing the model; here, the model is known.
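
The distinction this comment draws shows up directly in code: value iteration takes the transition probabilities and rewards as inputs, so it is planning with a known model rather than learning from sampled experience. A minimal sketch, assuming a tabular MDP stored as NumPy arrays `P[s, a, s2]` and `R[s, a, s2]` (the array layout and the function name `value_iteration` are illustrative assumptions, not the lecture's code):

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Tabular value iteration for a known MDP (hypothetical helper).

    P[s, a, s2]: probability of landing in s2 after taking action a in s.
    R[s, a, s2]: reward received for that transition.
    Returns the optimal state values and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = sum over s2 of P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
        Q = np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Extract a greedy policy from the final values.
    Q = np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
    return V, Q.argmax(axis=1)

# Two-state example: in state 0, action 1 moves to absorbing state 1 for reward 1;
# action 0 stays in state 0 for reward 0. State 1 is absorbing with reward 0.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.zeros((2, 2, 2))
R[0, 1, 1] = 1.0
V, policy = value_iteration(P, R, gamma=0.9)
```

The tolerance-based stopping rule is one common choice; running a fixed number of sweeps would also work for small problems.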

@muratcan__22 4 years ago

nice lecture

@waleedalzamil2228 a month ago

How can I get the slides for this awesome bootcamp? I am still a student and have been studying RL for a while; having the slides would help me refer back to them when I forget something.

@mingsumsze6026 8 months ago

Thank you for the lecture. But I don't get how the evaluation of V in policy iteration can be solved as a linear system of equations. It looks like the unknowns (i.e., V) are on both sides of the equation, so the equations are nonlinear.
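
Although V^π appears on both sides, it appears only linearly: each equation has the form V^π(s) - γ Σ_{s'} P(s'|s,π(s)) V^π(s') = Σ_{s'} P(s'|s,π(s)) R(s,π(s),s'). Stacking one equation per state gives (I - γP^π) V^π = r^π, a square linear system with a unique solution when γ < 1. A minimal NumPy sketch, assuming the fixed policy has already been folded into a transition matrix `P_pi` and an expected-reward vector `r_pi` (hypothetical names, not from the lecture), solves it directly and cross-checks against repeated Bellman backups:

```python
import numpy as np

def evaluate_policy_exact(P_pi, r_pi, gamma):
    """Solve (I - gamma * P_pi) V = r_pi for the values of a fixed policy.

    P_pi[s, s2]: probability of moving from s to s2 under the policy.
    r_pi[s]: expected immediate reward in state s under the policy.
    """
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def evaluate_policy_iterative(P_pi, r_pi, gamma, n_iters=1000):
    """Same quantity via repeated backups V <- r_pi + gamma * P_pi @ V."""
    V = np.zeros_like(r_pi, dtype=float)
    for _ in range(n_iters):
        V = r_pi + gamma * P_pi @ V
    return V

# Two-state example: state 0 earns 1 per step and moves to state 1 half the time;
# state 1 earns 0 and stays put.
P_pi = np.array([[0.5, 0.5],
                 [0.0, 1.0]])
r_pi = np.array([1.0, 0.0])
V_exact = evaluate_policy_exact(P_pi, r_pi, gamma=0.9)
V_iter = evaluate_policy_iterative(P_pi, r_pi, gamma=0.9)
```

Here V(1) = 0 and V(0) = 1 + 0.45 V(0), so V(0) = 1/0.55, which both functions should reproduce.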

@nicolorubattu9816 3 years ago

24:41

@johnhart1790 5 years ago

Great lecture. At 44:04, shouldn't the s in V^pi_(k-1)(s) be s'?

@coolmig 5 years ago

I wonder the same. ^_^

@emamulmursalin9181 4 years ago

Yes. The prime on the "s" is missing.

@bobsmithy3103 2 years ago

Yes, as it's the discounted value of the next/future state.

@wuzhai2009 6 years ago

Outstanding lecture. Very comparable to David Silver's lectures.

@volodscoi 5 years ago

Which one would you recommend? This Bootcamp playlist or David Silver's lectures? Thank you in advance!

@bafrot 4 years ago

@volodscoi See this first, then go to David Silver.

@elzilcho222 6 years ago

At 20:50, isn't the V*(3,3) supposed to be V*(2,3)?

@SayanGHD 6 years ago

@Juna No, if you hit the wall, you stay in that same state.

@HM-wn9on 3 years ago

@SayanGHD I can't understand why only three action outcomes are considered, excluding going west to (2,3), and why the probabilities of going north and south are 0.1 each.

@JensOO7 3 years ago

I think Juna made a good point, since it seems more natural to include (2,3) and (3,2) as possible next states than to consider walking into a wall while neglecting a possible move to (2,3). But I am still not certain about it. EDIT: At 19:54 he explains it: there is an 80% chance of going where you wanted to go and a 10% chance each of going to the right or left of that direction. So the robot never goes backwards, and bumping into the walls, as explained, seems right.
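
The thread's resolution can be checked mechanically. A minimal sketch, assuming the classic 4x3 gridworld layout (columns 1 to 4, rows 1 to 3, blocked cell at (2,2)) and the noise model described at 19:54: 80% in the intended direction, 10% to each perpendicular side, never backwards, and any move into a wall or off the grid leaves the agent in place. The layout constants, the `noise` parameter, and the function names are assumptions for illustration:

```python
from collections import defaultdict

# Assumed layout: x in 1..4 (columns), y in 1..3 (rows), one blocked cell.
WIDTH, HEIGHT = 4, 3
BLOCKED = {(2, 2)}

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
# The two directions perpendicular to each intended action (no backwards slips).
PERPENDICULAR = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}

def step(state, direction):
    """Deterministic move; bumping into a wall or the grid edge keeps the agent in place."""
    x, y = state
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if not (1 <= nx <= WIDTH and 1 <= ny <= HEIGHT) or (nx, ny) in BLOCKED:
        return state
    return (nx, ny)

def transition(state, action, noise=0.2):
    """Next-state distribution: 1 - noise for the intended move, noise / 2 for each side."""
    probs = defaultdict(float)
    probs[step(state, action)] += 1.0 - noise
    for side in PERPENDICULAR[action]:
        probs[step(state, side)] += noise / 2
    return dict(probs)

# Action East from (3, 3): the "north" slip hits the top edge, so the agent stays at (3, 3).
outcome = transition((3, 3), "E")
```

Under these assumptions the outcome is (4,3) with probability 0.8, (3,3) with 0.1, and (3,2) with 0.1, so the backup for V*(3,3) contains a V*(3,3) term, while (2,3) could only appear through a backwards move that this noise model never makes.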

@user-tg6jk6bo4y 3 years ago

He talks so fast, seriously.

@rajeev1071 5 years ago

There are more typos in various places. In the policy iteration equation, the last term should contain s' and not s.

@bafrot 4 years ago

exactly
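
For reference, a minimal policy-iteration sketch whose evaluation step uses the successor state s' in the discounted term, as the comments above say it should. It assumes a tabular MDP stored as NumPy arrays `P[s, a, s2]` and `R[s, a, s2]` (an assumed layout and a hypothetical function name, not the lecture's code) and evaluates each policy exactly with a linear solve:

```python
import numpy as np

def policy_iteration(P, R, gamma, max_iters=1000):
    """Tabular policy iteration: exact evaluation followed by greedy improvement."""
    n_states = P.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
    for _ in range(max_iters):
        # Evaluation: V(s) = sum_{s'} P(s'|s,pi(s)) [R(s,pi(s),s') + gamma V(s')]
        P_pi = P[states, policy]                          # shape (S, S')
        r_pi = np.sum(P_pi * R[states, policy], axis=1)   # expected reward per state
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Improvement: act greedily with respect to V
        Q = np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so it is greedy w.r.t. its own values
        policy = new_policy
    return policy, V

# Two states, two actions. State 0: action 0 stays (reward 1), action 1 moves to state 1 (reward 0).
# State 1: action 0 stays (reward 2), action 1 moves back to state 0 (reward 0).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.zeros((2, 2, 2))
R[0, 0, 0] = 1.0
R[1, 0, 1] = 2.0
policy, V = policy_iteration(P, R, gamma=0.9)
```

Exact evaluation is one design choice; the iterative update from the lecture would also work, at the cost of an inner loop. With ties in `argmax`, the loop could keep switching between equally good policies, which `max_iters` bounds.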

@AndrewJongOnline 4 years ago

Could you put this series in a KZfaq playlist, please?

@MyBlenderDay 4 years ago

Here is the summary: sites.google.com/view/deep-rl-bootcamp/lectures

@roro4787 2 years ago

@MyBlenderDay Thanks

@Seff2 4 years ago

Bad lesson... So many formulas with no hints about what the terms mean. From the "Policy Evaluation" part onward, I understood nothing. Before that I could follow, because the graphs gave some sense of what it was even about. But I don't even know what a policy is, and suddenly there are no graphs, just plain formulas and unexplained terminology. Started okay, but ended confusing.

@ronmedina429 4 years ago

This just means the bootcamp is not for you.

@alexanderskusnov5119 3 years ago

Around the 7-minute mark: a policy is what chooses the action.
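
To make the definition concrete: a policy maps each state to an action (deterministic) or to a probability distribution over actions (stochastic). A tiny sketch with made-up state and action names, purely for illustration; the dictionaries and the `act` helper are hypothetical:

```python
import random

# Deterministic policy: a lookup table from state to action (hypothetical names).
deterministic_policy = {"start": "right", "middle": "right", "near_goal": "up"}

# Stochastic policy: each state maps to a distribution over actions.
stochastic_policy = {
    "start": {"right": 0.9, "up": 0.1},
    "middle": {"right": 0.5, "up": 0.5},
}

def act(policy, state, rng=random):
    """Return an action for `state` from either kind of policy."""
    choice = policy[state]
    if isinstance(choice, str):
        return choice  # deterministic: the action is stored directly
    actions = list(choice)
    weights = [choice[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]  # sample from the distribution
```

For example, `act(deterministic_policy, "near_goal")` always returns `"up"`, while `act(stochastic_policy, "middle", random.Random(0))` returns `"right"` or `"up"` depending on the sample.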

@purelogic4533 5 years ago

Poor motivation in this lecture. The idea behind value iteration is itself a look back from achieving a goal: each step back along an episodic path determines which actions are best for reaching the goal, starting one step back from the termination point. That gives rise to value iteration, since the value is determined iteratively over the many steps needed to carve out the optimal path. Nevertheless, a superb introduction!
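
The "look back" intuition in this comment can be illustrated on a simple chain: with a reward only for reaching the goal, value information spreads backward by one state per sweep of value iteration. A minimal sketch on a hypothetical deterministic corridor (the corridor, its length, the discount, and the function name are all assumptions):

```python
def corridor_values(n_states, n_sweeps, gamma=0.9):
    """Value iteration on a deterministic corridor with states 0..n_states-1.

    The last state is terminal (value fixed at 0). Entering it earns reward 1;
    every other move earns 0. Actions: step left or step right, and stepping
    left from state 0 keeps the agent in place.
    """
    goal = n_states - 1
    V = [0.0] * n_states
    for _ in range(n_sweeps):
        new_V = V[:]  # synchronous update: read old values, write new ones
        for s in range(goal):
            candidates = []
            for nxt in (max(s - 1, 0), s + 1):
                reward = 1.0 if nxt == goal else 0.0
                candidates.append(reward + gamma * V[nxt])
            new_V[s] = max(candidates)
        V = new_V
    return V

# After k sweeps, only states within k steps of the goal have nonzero value.
values_after_2 = corridor_values(n_states=5, n_sweeps=2)
```

With five states and two sweeps this gives approximately [0, 0, 0.9, 1, 0]: the state next to the goal learned its value in the first sweep and passed a discounted copy one step further back in the second.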