Jensen's Inequality : Data Science Basics

11,181 views

ritvikmath

A day ago

a surprisingly super useful result for data science!
0:00 Convex Functions
3:54 Jensen's Inequality
8:40 Application

Comments: 32
@robharwood3538 · 1 year ago
Couple of nitpicks:

1) Technically, the combination of x1 and x2 is not _just_ a 'linear' combination, it is specifically a 'convex' combination. A convex combination is a linear combination where a) the coefficients of the combination add up to 1, and b) all the coefficients are non-negative (i.e. >= 0). A general 'linear' combination does not have these restrictions, so it is helpful to use the more specific term, 'convex' combination, to reduce potential misunderstanding/misuse of the theorem/inequality. [* See note below.]

2) Although it is valid and correct to write the convex combination as you have, as t * x1 + (1-t) * x2, it is more common/typical to write it with the (1-t) part first, as (1-t) * x1 + t * x2. This way, it is very easy to see that when t=0 the value is x1, and when t=1 the value is x2. You can then think of t as a 'slider' parameter, with t=0 representing the 'start' and t=1 representing the 'end'; sliding t between 0 and 1 gives a nice linear slide on the x-axis between x1 and x2. Even better, name them x0 and x1, and the connection is even more clear.

[* Note: The in-between case, a linear combination where all the coefficients add to 1 but are *not* restricted to be non-negative, is called an affine combination. In this case the two coefficients, 1-t and t, add up to (1-t) + t = 1 + (t - t) = 1 + 0 = 1 for any t. In an affine combination t would be allowed to be negative: for example, with t = -3, (1-t) would be 1-(-3) = 4, but the sum would still be -3 + 4 = 1. Affine combinations are useful, for example, for writing a line equation in terms of two given points on the line, among many other uses.

So: Convex Combo ⊆ Affine Combo ⊆ Linear Combo. Meanings / constraints:
Linear: The coefficients are required only to be scalars (or scalar variables, such as t in this case).
Affine: It's linear *and* the coefficients all add up to exactly 1.
Convex: It's affine *and* the coefficients are all non-negative, >= 0.

Took me a while to wrap my head around these three types of combinations when I first ran across them, so I thought it might help some folks to have them all spelled out.]
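The three combination types spelled out above can be checked numerically. A minimal sketch (function names are mine, purely illustrative, not from the video):

```python
def linear_comb(coeffs, points):
    """General linear combination: any scalar coefficients allowed."""
    return sum(c * p for c, p in zip(coeffs, points))

def is_affine(coeffs, tol=1e-12):
    """Affine combination: coefficients must sum to exactly 1."""
    return abs(sum(coeffs) - 1.0) < tol

def is_convex(coeffs, tol=1e-12):
    """Convex combination: affine AND every coefficient non-negative."""
    return is_affine(coeffs, tol) and all(c >= -tol for c in coeffs)

t = 0.25
print(is_affine([1 - t, t]), is_convex([1 - t, t]))  # the (1-t), t 'slider' form: True True
print(is_affine([4, -3]), is_convex([4, -3]))        # sums to 1 but has a negative coeff: True False
```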
@ritvikmath · 1 year ago
huge thanks for this comment! I really appreciate the nuanced explanation and comments on things that could have been improved.
@Aishiliciousss · 2 months ago
This was helpful, thank you.
@eyuelmelese944 · 6 months ago
This channel is so underrated! I just graduated with an MSc in DS, and I still come back here for concepts. Thanks!
@matthewkumar7756 · 1 year ago
Incredibly accessible and insightful explanation. Keep the videos coming!
@ritvikmath · 1 year ago
Thanks, will do!
@husseinjafarinia224 · 1 year ago
I'm stunned by how wonderfully you explained it :D
@ritvikmath · 1 year ago
Thanks!
@prashlovessamosa · 1 year ago
Thanks for providing awesome knowledge in such an easy way.
@ritvikmath · 1 year ago
Of course!
@sftekin4040 · 1 year ago
That was really cool! Thank you!
@sergioLombrico · 9 months ago
Such a good explanation!
@Trubripes · 3 months ago
I proved it by defining delta as the change in KL resulting from shifting mass m from x1 to x2, then taking the second-order derivative of that function. This is way better, though.
@user-wr4yl7tx3w · 1 year ago
Cool. Really helpful.
@fran9426 · 1 year ago
Another great video! I had never heard of KL or Jensen's inequality. Would you say the latter is predominantly useful for understanding proofs, or do you ever use it as a Data Scientist in the course of building models? Watching the video, I thought maybe Jensen's inequality could be useful for estimating whether or not the target (the model output) is convex; if it's not convex, that would tell us we can't assume a simple minimizer will work on the problem, since it might get stuck in a local minimum.
@davidmurphy563 · 1 year ago
6:00 Ah, in game dev we call this "lerping", short for linear interpolation. 0 = point A, 1 = point B and 0.5 is exactly in between. You use it soooo much. There are other sorts too, cubic is a nice one but it's not a straight line obvs. Anyway, great explanation so far. Subbed!
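The "lerp" mentioned above is exactly the convex-combination form from the video. A minimal sketch (the function name follows the game-dev convention, not the video):

```python
def lerp(a, b, t):
    """Linear interpolation: t=0 gives a, t=1 gives b, t=0.5 is halfway."""
    return (1 - t) * a + t * b

print(lerp(10.0, 20.0, 0.0))   # 10.0 (point A)
print(lerp(10.0, 20.0, 1.0))   # 20.0 (point B)
print(lerp(10.0, 20.0, 0.5))   # 15.0 (exactly in between)
```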
@ritvikmath · 1 year ago
Thanks! Love hearing about applications in other fields
@tmwtpbrent14 · 1 year ago
Jensen's is my favorite inequality.
@user-wr4yl7tx3w · 1 year ago
Yes, I remember the chopstick helper function part of the proof. :)
@Set_Get · 1 year ago
thank you
@ritvikmath · 1 year ago
Welcome!
@mohammadrezanargesi2439 · 1 year ago
Gentlemen, can you explain how the log of the ratio of two pdfs gets to be convex? In theory, a convex function of a non-decreasing function is convex, if I'm not mistaken. So there would need to be a proof that the ratio of the pdfs has a positive derivative. Am I right?
@bilalarbouch5849 · 1 year ago
Great video, would you mind making a video about the James–Stein paradox? Thanks 🙏
@ritvikmath · 1 year ago
Great suggestion!
@ParijatPushpam · 1 year ago
Can we say that a constant function is a convex function? [Because it always satisfies Jensen's inequality, with equality (=)]
@counting1234 · 11 months ago
Thanks for the great videos! Do you think you could share references? I'd like to share some of your videos with my team and any published references would be very helpful.
@Hassan_MM. · 1 year ago
Please make some intro about the probability limit (plim) ❓️❓️
@phamminh9806 · 3 months ago
Could you recommend some books related to data science?
@welcomethanks5192 · 11 months ago
Next topic: Fisher information?
@r00t257 · 1 year ago
You are a god! So many of your videos are gorgeous!
@ritvikmath · 1 year ago
Wow, thank you!