KL Divergence - How to tell how different two distributions are

3,458 views

Serrano.Academy

16 days ago

Correction (10:26): the probabilities shown are wrong. The correct ones are:
For Die 1: 0.4^4 * 0.2^2 * 0.1^1 * 0.1^1 * 0.2^2
For Die 2: 0.4^4 * 0.1^2 * 0.2^1 * 0.2^1 * 0.1^2
For Die 3: 0.1^4 * 0.2^2 * 0.4^1 * 0.2^1 * 0.1^2
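As a quick sanity check of the corrected numbers, here is a minimal Python sketch. The face probabilities and the roll counts (4, 2, 1, 1, 2) are read off the exponents above; the variable names are illustrative, not from the video.

```python
# Sanity check for the corrected sequence probabilities above.
# Each die's face probabilities and the count of each face in the
# rolled sequence are read off the exponents in the correction.
dice = {
    "Die 1": [0.4, 0.2, 0.1, 0.1, 0.2],
    "Die 2": [0.4, 0.1, 0.2, 0.2, 0.1],
    "Die 3": [0.1, 0.2, 0.4, 0.2, 0.1],
}
counts = [4, 2, 1, 1, 2]  # how many times each face appears in the sequence

def sequence_probability(face_probs, face_counts):
    """P(sequence) = product over faces of p_face ** count_face."""
    prob = 1.0
    for p, c in zip(face_probs, face_counts):
        prob *= p ** c
    return prob

for name, face_probs in dice.items():
    print(f"{name}: {sequence_probability(face_probs, counts):.3e}")
```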
Kullback-Leibler (KL) divergence is a way to measure how far apart two probability distributions are.
In this video, we learn KL divergence in a simple way, using a probability game with dice.
Shannon entropy and information gain: • Shannon Entropy and In...
Grokking Machine Learning book: www.manning.com/books/grokking-machine-learning
40% discount code: serranoyt
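For reference, the quantity described above can be computed directly from its standard definition, KL(P‖Q) = Σᵢ pᵢ log(pᵢ/qᵢ). Below is a minimal Python sketch (my own example, reusing two of the dice distributions from the correction above):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

die1 = [0.4, 0.2, 0.1, 0.1, 0.2]
die2 = [0.4, 0.1, 0.2, 0.2, 0.1]

print(kl_divergence(die1, die2))  # positive: the two distributions differ
print(kl_divergence(die1, die1))  # 0.0: a distribution is not "far" from itself
```

Note that KL divergence is not symmetric: kl_divergence(die1, die2) and kl_divergence(die2, die1) generally give different values.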

Comments: 25
@melihozcan8676 10 days ago
Thanks for the excellent explanation! I used to know about KL divergence, but now I understand it!
@debashisghosh3133 13 days ago
Most intuitive video on KL divergence... loved it.
@paedrufernando2351 14 days ago
What a morning surprise... lovely video.
@camzbeats6993 6 days ago
Very intuitive, thank you. I like the example-based approach you take. 👏
@shouvikdey7078 13 days ago
Love your videos. Please make more videos like this on the mathematical description of generative models such as GANs, diffusion models, etc.
@SerranoAcademy 12 days ago
Thank you! I have some on GANs and diffusion models, check them out! GANs: kzfaq.info/get/bejne/brJhZMR-s5uviWw.html Stable Diffusion: kzfaq.info/get/bejne/gNNxh9d4ld-lZXk.html
@bernardorinconceron6139 13 days ago
Thank you, Luis. I'm sure I'll use this very soon.
@johanaluna7385 14 days ago
Wow!! Thank you!!! Finally I got it!
@sra-cu6fz 14 days ago
Thanks for posting this.
@Omsip123 11 days ago
So well explained
@shahnawazalam55 13 days ago
That was as intuitive as butter.
@frankl1 13 days ago
Great video. One question I have: why would I use KL instead of CE? Are there situations in which one would be more suitable than the other?
@SerranoAcademy 13 days ago
That is a great question! KL(P,Q) is really CE(P,Q), except you subtract the entropy H(P). The reason for this is that if you compare a distribution with itself, you want to get zero. With CE, you don't get zero, so the CE of a distribution with itself could potentially be very high.
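Written out (my notation; this is just the standard relation described in the reply above, with CE the cross-entropy and H the Shannon entropy):

```latex
\[
\mathrm{KL}(P \,\|\, Q)
  = \mathrm{CE}(P,Q) - H(P)
  = -\sum_i p_i \log q_i + \sum_i p_i \log p_i
  = \sum_i p_i \log \frac{p_i}{q_i},
\qquad
\mathrm{KL}(P \,\|\, P) = \mathrm{CE}(P,P) - H(P) = H(P) - H(P) = 0.
\]
```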
@bin4ry_d3struct0r 9 days ago
Is there an industry standard for the KLD above which two distributions are considered significantly different (like how 0.05 is the standard for the p-value)?
@SerranoAcademy 9 days ago
Ohhh, that's a good question. I don't think so, since normally you use it for minimization or for comparison between distributions, but I'll keep an eye out; maybe it would make sense to have a standard for it.
@mohammadarafah7757 13 days ago
We hope you'll cover the Wasserstein distance next 😊
@SerranoAcademy 12 days ago
Ah good idea! I'll add it to the list, as well as earth-mover's distance. :)
@mohammadarafah7757 12 days ago
@SerranoAcademy I also highly recommend covering Explainable AI (XAI), which depends on statistics.
@__-de6he 13 days ago
Thanks. That was good, except for the explanation of elementary things like logarithm manipulations (everyone interested in your video already knows elementary math).
@mkamp 6 days ago
I liked that those were mentioned; it makes the session more self-contained, and it didn't take much time anyway.
@Ashishkumar-id1nn 13 days ago
Why did you take the average at 6:30?
@SerranoAcademy 13 days ago
Great question! I took the average because the product is p_i^(n q_i), so the log is n q_i log(p_i), and I want to get rid of that n. It's not strictly needed for the math, but I did it so that the result is exactly the KL divergence instead of n times it.
@Ashishkumar-id1nn 13 days ago
@SerranoAcademy Thanks for the clarification.
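Written out, the averaging step described in the reply above is (my rendering, using the same p_i, q_i, n symbols):

```latex
\[
\frac{1}{n}\log \prod_i p_i^{\,n q_i}
  = \frac{1}{n}\sum_i n\, q_i \log p_i
  = \sum_i q_i \log p_i ,
\]
```

so dividing by n removes the dependence on the sequence length, and the difference of two such averaged log-probabilities, Σᵢ qᵢ log(qᵢ/pᵢ), is the KL divergence itself rather than n times it.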
@aruntakhur 14 days ago
The numbers shown in cells (2,1) and (3,1) of the table (at 10:32), used to calculate the probabilities of the sequences, are typos. Please correct them. kzfaq.info/get/bejne/qdCXjdumqNPDaIU.html
@SerranoAcademy 13 days ago
Oh yikes, you're right, thank you! I can't fix it, but I'll add a note.