Instructor: Pieter Abbeel. Lecture 1 of the Deep RL Bootcamp, held at Berkeley, August 2017.
Comments: 45
@MsgrTeves 6 years ago
This RL bootcamp is incredible.
@zenchiassassin283 2 years ago
Some timestamps:
- Exercise 1: Effect of discount (factor/rate) and noise: 32:41
- Exercise 2: Policy evaluation with stochastic policy: 45:22
- Policy improvement idea: 49:21 to 50:10 and 52:55 to 54:12
- Infinite actions: exact methods barely ever work: 54:25
@sunderrajan6172 6 years ago
Great lecture. If the questions asked are repeated for playback, it will make it even better
@marloncajamarca2793 6 years ago
Awesome lecture!!!
@nathanfitzpatrick9953 4 years ago
This guy is not messing around.
@XinHeng 6 years ago
This is an excellent lecture
@miyashitahikaru1952 6 years ago
Awesome lecture
@ProfessionalTycoons 5 years ago
great video.
@bofeng6910 5 years ago
Great lecture +1
@babamam1025 6 years ago
Awesome lectures! Does anyone know where to download the slides?
@gaaligadu148 5 years ago
Does anyone know if there are transcripts for these lectures? I especially can't hear the students' questions.
@kleemc 6 years ago
Great lecture. Would be better if questions are repeated. We can only guess what the questions are.
@JadtheProdigy 6 years ago
i never thought a UFC fighter would be watching this. props bro
@afrozenator 5 years ago
Starts at 1:00
@shubhanshawasthi4319 5 years ago
At 45:13, in the update equations (the last two on that slide), shouldn't s' be in place of s in gamma * V^(pi)_(k-1)(s) and gamma * V^(pi)(s)?
@ethanjyx 5 years ago
Very well taught lecture!
@chaucao9725 5 years ago
53:30 poliception
@roboticsresources9680 6 years ago
Best lecture in Deep Reinforcement learning
@bajdoub 5 years ago
Except there is no Reinforcement Learning in this lecture, only Markov Decision Process solving for optimal policy by value/policy iteration. So no Reinforcement Learning, and certainly no Deep Reinforcement Learning. Reinforcement Learning is an approach to solve MDP without knowing the model. Here the model is known.
@muratcan__22 4 years ago
nice lecture
@waleedalzamil2228 a month ago
How can I get the slides for this awesome bootcamp? I am still a student and have been studying RL for a while. Having the slides would let me refer to them directly when I forget something.
@mingsumsze6026 8 months ago
Thank you for the lecture. But I don't get how the evaluation of V in policy iteration can be solved as a linear system of equations. It looks like the unknowns (i.e. V) are on both sides of the equation, so the equations are nonlinear.
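On the question above: for a fixed policy there is no max over actions, so V appears only linearly on both sides, and V = R_pi + gamma * P_pi V rearranges to (I - gamma * P_pi) V = R_pi. The sketch below is a minimal illustration under assumptions, not code from the lecture: the names `solve_linear`, `evaluate_policy_exact`, `P_pi`, and `R_pi` are hypothetical, and the two-state chain is made up. In practice a library routine such as `numpy.linalg.solve` would replace the hand-written solver.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate_policy_exact(P_pi, R_pi, gamma):
    """Solve (I - gamma * P_pi) V = R_pi for V.

    P_pi[s][s2]: probability of moving from s to s2 under the fixed policy.
    R_pi[s]: expected immediate reward in s under the fixed policy.
    """
    n = len(R_pi)
    A = [[(1.0 if i == j else 0.0) - gamma * P_pi[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(A, R_pi)


# Hypothetical 2-state chain: state 0 moves to state 1, state 1 stays put
# and collects reward 1 at every step.
P_pi = [[0.0, 1.0],
        [0.0, 1.0]]
R_pi = [0.0, 1.0]
V = evaluate_policy_exact(P_pi, R_pi, gamma=0.9)
```

Here V[1] satisfies V[1] = 1 + 0.9 * V[1], and V[0] = 0.9 * V[1]. The nonlinearity only shows up in the optimality equation, where the max over actions is present.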
@nicolorubattu9816 3 years ago
24:41
@johnhart1790 5 years ago
Great lecture. At 44:04 shouldn't the s in V^(pi)_(k-1) (s) be s'?
@coolmig 5 years ago
I wonder the same.. ^_^
@emamulmursalin9181 4 years ago
Yes. The prime on the "s" is missing.
@bobsmithy3103 2 years ago
Yes, as it's the discounted value of the next/future state.
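For reference, the policy evaluation update with the prime restored, as the replies in this thread describe it. The notation is an assumption (a deterministic policy pi(s), transition probabilities P(s' | s, a), and rewards R(s, a, s')) and may differ from the symbols on the slide.

```latex
V^{\pi}_{k}(s) \leftarrow \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)
  \Bigl[ R\bigl(s, \pi(s), s'\bigr) + \gamma \, V^{\pi}_{k-1}(s') \Bigr]

V^{\pi}(s) = \sum_{s'} P\bigl(s' \mid s, \pi(s)\bigr)
  \Bigl[ R\bigl(s, \pi(s), s'\bigr) + \gamma \, V^{\pi}(s') \Bigr]
```

In both lines the discounted term is evaluated at the next state s', not at the current state s.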
@wuzhai2009 6 years ago
Outstanding lecture. Very comparable to David Silver's lectures.
@volodscoi 5 years ago
Which one would you recommend? This Bootcamp playlist or David Silver's lectures? Thank you in advance!
@bafrot 4 years ago
@@volodscoi see this first then go to david silver
@elzilcho222 6 years ago
At 20:50, isn't the V*(3,3) supposed to be V*(2,3)?
@SayanGHD 6 years ago
Juna No, if you hit the wall, you stay at that state itself.
@HM-wn9on 3 years ago
@@SayanGHD I can't understand why there are only three possible outcomes, excluding going west to (2,3), and why the probabilities of going north and south are 0.1.
@JensOO7 3 years ago
I think Juna made a good point, since it is more likely to include (2,3) and (3,2) as possible states, rather than considering walking into a wall and neglecting one possible move to (2,3). But still, I am not certain about it. EDIT: At 19:54 he explains it. 80% chance to go where you wanted to go, 10% right and left of said direction. So the robot will not go backwards. Therefore bumping into the walls as explained seems right.
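The noise model discussed in this thread (80% chance of the intended direction, 10% for each perpendicular direction, and staying put when a move hits a wall) can be sketched as below. This is an illustration under assumptions, not the lecture's code: the (x, y) convention with north as +y, the 4x3 grid with a blocked cell at (2, 2), and the names `transition_probs` and `is_free` are all hypothetical.

```python
# Assumed convention: cells are (x, y), and moving north increases y.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
# The two directions perpendicular to each intended direction.
SIDEWAYS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def transition_probs(state, action, is_free, noise=0.2):
    """Return {next_state: probability} for one noisy move.

    is_free(cell) says whether a cell can be entered. A move into a wall
    or off the grid leaves the agent where it is.
    """
    outcomes = [(action, 1.0 - noise)]
    outcomes += [(d, noise / 2) for d in SIDEWAYS[action]]
    probs = {}
    for direction, p in outcomes:
        dx, dy = MOVES[direction]
        target = (state[0] + dx, state[1] + dy)
        nxt = target if is_free(target) else state
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def is_free(cell):
    """Assumed layout: 4 columns, 3 rows, one blocked cell at (2, 2)."""
    x, y = cell
    return 1 <= x <= 4 and 1 <= y <= 3 and cell != (2, 2)


# Trying to go east from (3, 3): east succeeds with 0.8, the northward slip
# hits the boundary and stays at (3, 3), the southward slip reaches (3, 2).
probs = transition_probs((3, 3), "E", is_free)
```

Under these assumptions (2, 3) never appears as a successor of (3, 3) when the action is east, because the agent never moves opposite to the intended direction. That is why V*(3,3) shows up on the right-hand side of its own equation.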
@user-tg6jk6bo4y 3 years ago
He talks so fast, seriously.
@rajeev1071 5 years ago
Some more typos at various places. In the equation for policy iteration last term should contain S' and not S.
@bafrot 4 years ago
exactly
@AndrewJongOnline 4 years ago
Could you put this series in a YouTube playlist, please?
@MyBlenderDay 4 years ago
Here is the summary: sites.google.com/view/deep-rl-bootcamp/lectures
@roro4787 2 years ago
@@MyBlenderDay thanks
@Seff2 4 years ago
Bad lesson... So many formulas with no hint of what the terms mean. From "Policy Evaluation" onward I understood nothing. Before that I could follow, because the graphs gave some idea of what it was about. But I don't even know what a policy is, and suddenly there are no graphs, just plain formulas and unexplained terms. Started okay, but ended confusing.
@ronmedina429 4 years ago
this just means the bootcamp is not for you
@alexanderskusnov5119 3 years ago
7 min: policy is choosing an action
@purelogic4533 5 years ago
Poor motivation in this lecture. The idea of using value iteration is in itself a lookback from achieving a goal. Hence the lookback is simply a step taken through an episodic path to determine which actions are best taken to achieve the goal one step back from the termination point. Now that gives rise to value iteration as the value is determined iteratively through the many steps to be taken to carve out the optimal path to be taken. Nevertheless a superb introduction!
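The "lookback" reading of value iteration in the comment above can be made concrete with a short sketch. It assumes the model is known, as in the lecture, and uses a made-up three-state chain. The names `value_iteration` and `transitions` are hypothetical, as is the (probability, next_state, reward) tuple format.

```python
def value_iteration(states, actions, transitions, gamma, num_iters):
    """Run num_iters Bellman backups starting from V = 0.

    transitions(s, a) returns a list of (probability, next_state, reward).
    """
    V = {s: 0.0 for s in states}
    for _ in range(num_iters):
        # Each sweep looks one step further back from the goal.
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in transitions(s, a))
                for a in actions
            )
            for s in states
        }
    return V


def transitions(s, a):
    """Assumed chain 0 -> 1 -> 2, where 2 is an absorbing goal state."""
    if s == 2:
        return [(1.0, 2, 0.0)]      # absorbing, no further reward
    if a == "stay":
        return [(1.0, s, 0.0)]
    s2 = s + 1
    return [(1.0, s2, 1.0 if s2 == 2 else 0.0)]  # reward 1 on reaching 2


states = [0, 1, 2]
actions = ["stay", "right"]
V1 = value_iteration(states, actions, transitions, gamma=0.9, num_iters=1)
V10 = value_iteration(states, actions, transitions, gamma=0.9, num_iters=10)
```

After one sweep only state 1, one step from the goal, has a nonzero value. State 0 picks up its discounted value on the second sweep, which is the step-by-step backward propagation the comment describes.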