This is the Math You Need to Master Reinforcement Learning

Views: 9,473

ritvikmath

Days ago

Comments: 27
@lial4633 4 months ago
The best explanation of Policy Gradient methods I've seen!
@xnairegodking 1 month ago
Wow. Best explanation ever. I think if you made a course out of this, it would be the best out there. Thanks for sharing.
@drewgrant1605 9 days ago
Just subscribed! I love the level you teach at in your videos. It's slightly above the level of Statsquest but not so dense that I need to mentally prepare before watching. (No shade to Statsquest, two random events can be independently great.)
@sharks1349 10 months ago
I've been trying to understand Reinforcement learning and policy gradient methods always tripped me up. Thank you for making this video
@souravdey1227 2 months ago
Can you please make a full playlist on reinforcement learning? No one explains the math as simply as you do. Also, please do a separate video going into greater mathematical detail and proving the theorem, kind of like numberphile2.
@avandfardi 4 months ago
What a beautiful explanation. Thank you
@ritvikmath 4 months ago
You are very welcome
@matthewchunk3689 10 months ago
Great summary! As good as LLMs are at answering questions, we still need smart people like you to get us thinking of the right questions.
@ritvikmath 10 months ago
Thanks!
@tantzer6113 10 months ago
LLMs are pretty bad at answering questions.
@pushkarparanjpe 9 months ago
Awesome explanation! Thanks.
@Mars.2024 3 months ago
Thanks A million 🎉
@buumschakalaka4425 10 months ago
Thanks for the great video 💪👏 Will there be more RL videos coming? E.g., I would like to understand more about how to set up reward functions. How do I weight rewards from different actions against each other? Also, how would we set up the environment in the model-based approach? And more.
@HemantPoonia-wq8hr 10 months ago
Hey, can you please upload videos on causal analysis, or can you suggest some books to get started with it?
@matteogirelli1023 10 months ago
You mean causal inference? I suggest referring to econometrics textbooks, as in Economics we are pretty strong on that.
- "Mostly Harmless Econometrics" by Angrist and Pischke, for a graduate level in applied stats (pure stats would find it undergrad level)
- "Econometric Analysis of Cross Section and Panel Data" by Wooldridge
@HemantPoonia-wq8hr 10 months ago
@matteogirelli1023 What do you suggest for someone who wants to apply causal inference to my problem domain, which is climate science and earth science? I just got started by reading Causality by Judea Pearl.
@djpremier333 10 months ago
Statistical Rethinking introduces it nicely; the whole lecture series is on YouTube.
@HemantPoonia-wq8hr 10 months ago
@djpremier333 Thank you, I will check their playlist.
@weslleys.pereira6998 4 months ago
Great video! Thanks for sharing. I have a question though. I am new to the subject, so I am having trouble understanding the last step in your derivation (29:00). I am speaking about the "very very easy thing to do". Would you be kind enough to point me to where I can find more information about that? Thanks!
@user-co6pu8zv3v 10 months ago
Thank you! :)
@subhamkundu5043 10 months ago
Great summary. I have a question: why is the state not dependent on the reward?
@brycerogers5050 10 months ago
Thanks Ritvik. Still having some trouble with why Reward (R) does not depend on Theta, mathematically. In your tree diagram, all the rewards are ±1/p, dependent (in their absolute quantity) only on the state, but also dependent (in their sign, which seems non-negligible in a reward system) on a Theta-based choice (H or L). Are you able to describe in a different way the intuition behind why d/dTheta log P(R,S|S,A) = 0?
@brycerogers5050 10 months ago
Maybe better said: what's the nuance (or obvious principle) that allows consideration of an explicit variable in a derivative wrt that variable, but disallows consideration of an implicit variable (i.e., further back in the causal chain) in a derivative wrt that variable? Thanks again, your channel rocks!
@ritvikmath 10 months ago
Hey! Excellent question, and it isn't obvious in any sense. The key lies in the fact that this is a conditional probability rather than an unconditional one. If we removed the conditions on P(R,S | S,A) so it is just P(R,S), then this absolutely does depend on the policy theta, and we can measure this dependency by tracing through the causal diagram. However, by using a conditional probability, we assume that the previous state and previous action are taken as given, at which point the probabilities for the next state and next reward are fixed and do not depend on the policy. Please let me know if that helps!
@brycerogers5050 10 months ago
@ritvikmath Ah, that makes perfect sense. Thank you.
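The point made in this exchange can be checked numerically. The following is a minimal sketch, not the setup from the video: the two states, the transition table, the single-parameter sigmoid policy, and the sample trajectory are all hypothetical, chosen only to illustrate the idea. It splits the log-probability of a fixed trajectory into policy terms and conditional dynamics terms P(R,S | S,A), then differentiates each with respect to theta by finite differences.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical policy: P(A = "H" | S) = sigmoid(theta) in every state.
def policy_prob(action, theta):
    p_h = sigmoid(theta)
    return p_h if action == "H" else 1.0 - p_h

# Hypothetical dynamics P(R', S' | S, A): a fixed table, theta appears nowhere.
DYNAMICS = {
    ("s0", "H"): {(+1, "s1"): 0.7, (-1, "s0"): 0.3},
    ("s0", "L"): {(+1, "s0"): 0.4, (-1, "s1"): 0.6},
    ("s1", "H"): {(+1, "s0"): 0.5, (-1, "s1"): 0.5},
    ("s1", "L"): {(+1, "s1"): 0.2, (-1, "s0"): 0.8},
}

# One fixed (assumed) trajectory of (state, action, reward, next_state) steps.
TRAJECTORY = [("s0", "H", +1, "s1"), ("s1", "L", -1, "s0"), ("s0", "H", -1, "s0")]

def log_policy_terms(theta):
    # Sum of log pi_theta(A | S): the only place theta enters.
    return sum(math.log(policy_prob(a, theta)) for _, a, _, _ in TRAJECTORY)

def log_dynamics_terms(theta):
    # Sum of log P(R', S' | S, A): theta is accepted but never used,
    # because S and A are taken as given.
    return sum(math.log(DYNAMICS[(s, a)][(r, s2)]) for s, a, r, s2 in TRAJECTORY)

def finite_diff(f, theta, eps=1e-6):
    # Central finite-difference approximation of df/dtheta.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = 0.3
grad_dynamics = finite_diff(log_dynamics_terms, theta)
grad_policy = finite_diff(log_policy_terms, theta)
grad_total = finite_diff(
    lambda t: log_policy_terms(t) + log_dynamics_terms(t), theta
)

print("d/dtheta of dynamics terms:", grad_dynamics)
print("d/dtheta of policy terms:  ", grad_policy)
print("d/dtheta of full log-prob: ", grad_total)
```

Under these assumptions, the dynamics terms contribute a zero derivative, so the gradient of the full trajectory log-probability matches the gradient of the policy terms alone. How often each (S, A) pair is visited does depend on theta, which is the unconditional dependence described in the reply above.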
@adityabhatt3519 6 months ago
Hi, I've been trying to use multiple sources to look for the proof of this theorem. However, none of them use the product rule (for the derivative, time: 20:22). Can you please share a resource that does include the product rule, if you know of one?
@user-ed1ph7yj6o 5 months ago
Can you do RStan in R for Bayesian stats, case by case?