KL Divergence - How to tell how different two distributions are

3,458 views

Serrano.Academy

16 days ago

Correction (10:26): the probabilities shown are wrong. The correct ones are:
For Die 1: 0.4^4 * 0.2^2 * 0.1^1 * 0.1^1 * 0.2^2
For Die 2: 0.4^4 * 0.1^2 * 0.2^1 * 0.2^1 * 0.1^2
For Die 3: 0.1^4 * 0.2^2 * 0.4^1 * 0.2^1 * 0.1^2
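As a quick sanity check of the corrected numbers, here is a minimal Python sketch. The face probabilities and the roll counts (4, 2, 1, 1, 2) are read off the exponents above; the variable names are illustrative, not from the video.

```python
# Sanity check for the corrected sequence probabilities above.
# Each die's face probabilities and the count of each face in the
# rolled sequence are read off the exponents in the correction.
dice = {
    "Die 1": [0.4, 0.2, 0.1, 0.1, 0.2],
    "Die 2": [0.4, 0.1, 0.2, 0.2, 0.1],
    "Die 3": [0.1, 0.2, 0.4, 0.2, 0.1],
}
counts = [4, 2, 1, 1, 2]  # how many times each face appears in the sequence

def sequence_probability(face_probs, face_counts):
    """P(sequence) = product over faces of p_face ** count_face."""
    prob = 1.0
    for p, c in zip(face_probs, face_counts):
        prob *= p ** c
    return prob

for name, face_probs in dice.items():
    print(f"{name}: {sequence_probability(face_probs, counts):.3e}")
```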
Kullback-Leibler (KL) divergence is a way to measure how far apart two probability distributions are.
In this video, we learn KL divergence in a simple way, using a probability game with dice.
Shannon entropy and information gain: • Shannon Entropy and In...
Grokking Machine Learning book: www.manning.com/books/grokking-machine-learning
40% discount code: serranoyt
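For reference, the quantity described above can be computed directly from its standard definition, KL(P‖Q) = Σᵢ pᵢ log(pᵢ/qᵢ). Below is a minimal Python sketch (my own example, reusing two of the dice distributions from the correction above):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

die1 = [0.4, 0.2, 0.1, 0.1, 0.2]
die2 = [0.4, 0.1, 0.2, 0.2, 0.1]

print(kl_divergence(die1, die2))  # positive: the two distributions differ
print(kl_divergence(die1, die1))  # 0.0: a distribution is not "far" from itself
```

Note that KL divergence is not symmetric: kl_divergence(die1, die2) and kl_divergence(die2, die1) generally give different values.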

Comments: 25
@melihozcan8676 10 days ago
Thanks for the excellent explanation! I used to know about KL divergence, but now I understand it!
@debashisghosh3133 13 days ago
Most intuitive video on KL divergence... loved it.
@paedrufernando2351 14 days ago
What a morning surprise... lovely video.
@camzbeats6993 6 days ago
Very intuitive, thank you. I like the example-based approach you take. 👏
@shouvikdey7078 13 days ago
Love your videos. Please make more videos like this on the mathematical description of generative models such as GANs, diffusion models, etc.
@SerranoAcademy 12 days ago
Thank you! I have some on GANs and diffusion models, check them out! GANs: kzfaq.info/get/bejne/brJhZMR-s5uviWw.html Stable Diffusion: kzfaq.info/get/bejne/gNNxh9d4ld-lZXk.html
@bernardorinconceron6139 13 days ago
Thank you, Luis. I'm sure I'll use this very soon.
@johanaluna7385 14 days ago
Wow!! Thank you!!! Finally I got it!
@sra-cu6fz 14 days ago
Thanks for posting this.
@Omsip123 11 days ago
So well explained
@shahnawazalam55 13 days ago
That was as intuitive as butter.
@frankl1 13 days ago
Great video. One question I have: why would I use KL instead of CE? Are there situations in which one would be more suitable than the other?
@SerranoAcademy 13 days ago
That is a great question! KL(P,Q) is really CE(P,Q), except you subtract the entropy H(P). The reason for this is that if you compare a distribution with itself, you want to get zero. With CE, you don't get zero, so the CE of a distribution with itself could potentially be very high.
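Written out (my notation; this is just the standard relation described in the reply above, with CE the cross-entropy and H the Shannon entropy):

```latex
\[
\mathrm{KL}(P \,\|\, Q)
  = \mathrm{CE}(P,Q) - H(P)
  = -\sum_i p_i \log q_i + \sum_i p_i \log p_i
  = \sum_i p_i \log \frac{p_i}{q_i},
\qquad
\mathrm{KL}(P \,\|\, P) = \mathrm{CE}(P,P) - H(P) = H(P) - H(P) = 0.
\]
```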
@bin4ry_d3struct0r 9 days ago
Is there an industry standard for the KLD above which two distributions are considered significantly different (like how 0.05 is the standard for the p-value)?
@SerranoAcademy 9 days ago
Ohhh, that's a good question. I don't think so, since normally you use it for minimization or for comparison between distributions, but I'll keep an eye out; maybe it would make sense to have a standard for it.
@mohammadarafah7757 13 days ago
We hope you'll cover the Wasserstein distance next 😊
@SerranoAcademy 12 days ago
Ah good idea! I'll add it to the list, as well as earth-mover's distance. :)
@mohammadarafah7757 12 days ago
@SerranoAcademy I also highly recommend covering Explainable AI (XAI), which depends on statistics.
@__-de6he 13 days ago
Thanks. That was good, except for the explanation of elementary things like logarithm manipulations (everyone interested in your video already knows elementary math).
@mkamp 6 days ago
I liked that those were mentioned; it makes the session more self-contained, and it didn't take much time anyway.
@Ashishkumar-id1nn 13 days ago
Why did you take the average at 6:30?
@SerranoAcademy 13 days ago
Great question! I took the average because the product is p_i^(n q_i), so the log is n q_i log(p_i), and I want to get rid of that n. It's not strictly needed for the math, but I did it so that the result is exactly the KL divergence instead of n times it.
@Ashishkumar-id1nn 13 days ago
@SerranoAcademy Thanks for the clarification.
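Written out, the averaging step described in the reply above is (my rendering, using the same p_i, q_i, n symbols):

```latex
\[
\frac{1}{n}\log \prod_i p_i^{\,n q_i}
  = \frac{1}{n}\sum_i n\, q_i \log p_i
  = \sum_i q_i \log p_i ,
\]
```

so dividing by n removes the dependence on the sequence length, and the difference of two such averaged log-probabilities, Σᵢ qᵢ log(qᵢ/pᵢ), is the KL divergence itself rather than n times it.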
@aruntakhur 14 days ago
The numbers shown in cells (2,1) and (3,1) of the table (at 10:32), used to calculate the probabilities of the sequences, are typos. Please correct them. kzfaq.info/get/bejne/qdCXjdumqNPDaIU.html
@SerranoAcademy 13 days ago
Oh yikes, you're right, thank you! I can't fix it, but I'll add a note.