Variance: Why n-1? Intuitive explanation of concept and proof (Bessel's correction)

14,777 views

statsandscience


You might have learned that in some instances you don't divide by n when you calculate the empirical variance of your data but by n-1. This is known as Bessel's correction. In this video, I will explain when to use it, why we need a correction in the first place, and why this correction happens to be this weird n-1 thing you all know and wonder about.
0:00 - Intro
1:33 - What is variance?
4:28 - When to use the correction and what happens if you don't
9:20 - Explanation of bias involving sample mean
17:03 - Proof why n-1 eliminates bias involving sample mean
21:56 - Explanation and proof using pairwise differences
28:42 - Summary
If you want to read more about the topic:
en.wikipedia.org/wiki/Bessel'...
aaronjfisher.github.io/why-di...
stats.stackexchange.com/quest...
stats.stackexchange.com/quest...
www.quora.com/If-we-used-the-...

Comments: 71
@J-sh4gf 5 months ago
“The expected difference between the “correct” formula of the variance and the “wrong” one with n and the sample mean is equal to the variance of the sample mean!” This one sentence untied a knot! Thank you very much. I have watched various explanations of degrees of freedom etc., and this video is by far the best I've seen.
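The sentence quoted in this comment can be checked exactly on a small example. The sketch below is not from the video: the population values and the sample size are hypothetical, chosen only to keep the enumeration small. It lists every possible sample drawn with replacement and compares the average of the divide-by-n variance with the population variance.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (an assumption for illustration, not the video's data).
population = [Fraction(v) for v in (1, 2, 3, 4, 5, 6, 7)]
n = 3  # hypothetical sample size


def mean(values):
    return sum(values) / len(values)


def var_known_mean(values, mu):
    """Average squared deviation from a given mean mu (divide by len)."""
    return sum((v - mu) ** 2 for v in values) / len(values)


mu = mean(population)
sigma2 = var_known_mean(population, mu)  # population variance

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# "Correct" formula: deviations from the population mean, divided by n.
expected_correct = mean([var_known_mean(s, mu) for s in samples])
# "Wrong" formula: deviations from the sample mean, divided by n.
expected_wrong = mean([var_known_mean(s, mean(s)) for s in samples])
# Variance of the sample mean across all samples.
var_of_sample_mean = var_known_mean([mean(s) for s in samples], mu)

print(expected_correct - expected_wrong == var_of_sample_mean)
print(expected_wrong * Fraction(n, n - 1) == sigma2)
```

Exact fractions are used so that the comparisons are not blurred by rounding. In this setup the gap between the two formulas is the variance of the sample mean, and multiplying the "wrong" average by n/(n-1) recovers the population variance.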
@phil2888 7 months ago
This is great, I have grappled with this for quite a while.
@cannot-handle-handles 2 years ago
The explanation given at 24:55 why we divide by 2 ("each value is in here twice") does not seem intuitive: If we only added (x_i-x_j)^2 for i
@statsandscience 2 years ago
Not sure if I understand you correctly, but your suggestion is basically to only look at one of the triangles of the matrix without the diagonal. Is that correct? In that case you would have to divide by n(n-1) only and not by 2 to get the same value.
@cannot-handle-handles 2 years ago
@statsandscience I'll try and elaborate more: The sum of all the squares is 812. Divided by n^2 (because that's the number of squares we're considering), that's 16.571… Divided by 2, that's 8.28571… And finally, with Bessel's correction, it's 9.666… The sum of half of the squares is 406. Divided by n(n-1)/2 (because that's the number of squares we're considering), that's 19.333… So we still have to divide by 2 to get 9.666… I'm not saying the formula is wrong, just the explanation ("each value is in here twice"). In both cases, you have to divide by the number of squares AND by 2. So the division by 2 is not explained by counting the squares twice.
@cannot-handle-handles 2 years ago
@statsandscience But the number of squares in one of the triangles is 1+2+3+4+5+6+7=21, not 42.
@statsandscience 2 years ago
@cannot-handle-handles Yes, I deleted my comment because I noticed my mistake before your answer. What you say makes sense; I had not done this test before. Do you have a good intuitive explanation for the 2?
@statsandscience 2 years ago
Is it because you basically calculate the means of all pairs of points?
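The arithmetic in this thread can be reproduced with a few lines of Python. The seven values below are hypothetical: they were picked only so that the totals match the 812 and 406 quoted above, and the video's actual data may differ.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical data, chosen only so the totals match the 812 and 406 quoted
# in the thread; the video's actual seven values may differ.
data = [0, 3, 5, 5, 5, 7, 10]
n = len(data)

# All n*n ordered pairs, including the zero diagonal.
full_sum = sum((a - b) ** 2 for a, b in product(data, repeat=2))
# One triangle only: the n*(n-1)/2 unordered pairs without the diagonal.
half_sum = sum((a - b) ** 2 for a, b in combinations(data, 2))

mean = Fraction(sum(data), n)
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

var_n = ss / n                # divide by n
var_n_minus_1 = ss / (n - 1)  # Bessel's correction

print(full_sum, half_sum)  # 812 406
print(Fraction(full_sum, 2 * n * n) == var_n)
print(Fraction(half_sum, n * (n - 1) // 2) / 2 == var_n_minus_1)
```

One possible reading of the 2, offered as a suggestion rather than as the video's argument: for two independent draws with the same variance σ², the expected squared difference is E[(X_i - X_j)²] = Var(X_i) + Var(X_j) = 2σ². The average squared pairwise difference therefore estimates twice the variance, and halving it undoes that, regardless of whether each pair is counted once or twice.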
@DanTee2_718 1 year ago
This was honestly such a great watch, thank you for the video
@statsandscience 1 year ago
Thank you!
@user-kh5ju1du4d 9 months ago
Thank you SO much for this video. It has been so hard to find a proper explanation of this.
@danielheckel2755 2 years ago
Very enjoyable explanation. Thank you! Greetings from Mexico.
@shpensive 2 years ago
Fantastic, I've been wondering about this for a long time.
@MathAndComputers 2 years ago
Thanks for the explanations! I'd been meaning to learn about this for ages, but just hadn't gotten around to it, haha. 😅 Something that might be helpful is that if you put times and labels in a list in the description, KZfaq will now automatically split up the play bar into chapters, as long as the first one is 0:00, so something like:
0:00 - Intro
1:33 - Terminology
4:28 - Estimating the mean or variance
9:20 - Why is the version with n biased?
17:03 - Why does n-1 save it? (explanation 1)
21:56 - Why does n-1 save it? (explanation 2)
28:42 - Summary
@statsandscience 2 years ago
That was super helpful, thanks! And extra thanks for providing all the correct time stamps!
@gokulkrishna2667 2 years ago
The greatest video on this aspect on the internet!
@statsandscience 2 years ago
Thanks, glad you liked it!
@mariuskornovan5520 2 months ago
Great video! Helped me finally understand the derivation of the sample standard deviation
@hugoharada5301 2 years ago
Loved the video. Thanks!!
@Number_Cruncher 2 years ago
Nice, now it is clear to me.
@statsandscience 2 years ago
Great, thank you for the comment!
@chris_7711 6 months ago
Thank you very much! Very insightful!
@Sid-ge9vb 1 year ago
This is an amazing explanation, thank you so much! I was so frustrated by the hand-wavy explanations on YouTube, even in lectures!
@statsandscience 1 year ago
Thank you, I really appreciate it!
@milanradovanovic3693 1 year ago
This puzzled me for a long time... Thanks for the explanation... P.S. I always thought it was a spelling mistake in the book(s).
@ajaydalvi1378 2 years ago
Finally understood!
@koramawin6134 2 years ago
Subscribed!
@pramodabandaru3566 4 months ago
I did not get how (sample mean - population mean) squared, i.e. the variance of the sample mean, is equal to the population variance divided by n, the sample size. Could anyone please explain? 20:00
@KarthikNaga329 2 years ago
What software do you use to type math equations and animate them in videos? Thanks!
@statsandscience 2 years ago
It's honestly just PowerPoint and I wouldn't recommend it for standard use. I am sure there are better options out there...
@jkally123 2 years ago
How did he get to the statement made at 20:00, that var(sample mean) is equal to the population variance divided by n?
@statsandscience 2 years ago
I brushed over this a bit because it was not the focus here. Intuitively, I think it makes sense that the variance of the sample mean must be smaller than the population variance and that this depends on n: as I explained, there is no way to get the most extreme observed values as means, and the mean will always become "less extreme" in comparison the higher n is. I don't know an intuitive explanation for the exact formula, but the reasoning goes like this: You try to calculate the variance of the sample mean, that is, the sum of the observations divided by n: Var((obs1 + obs2 + obs3 + ...)/n). You can rewrite this as Var((1/n)*obs1 + (1/n)*obs2 + (1/n)*obs3 + ...). For independent observations, a linear combination like this has a variance equal to the sum of each squared factor (in this case 1/n^2) times the variance of the corresponding component: (1/n^2)*Var(obs1) + (1/n^2)*Var(obs2) + ... When you then assume identical variances for the observations, this equals (1/n^2)*n*Var(obs), which is Var(obs)/n. You can find this formatted a bit more nicely here: online.stat.psu.edu/stat414/lesson/24/24.4 Hope this helps, thank you for the comment!
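The steps in this reply can be written out as a short derivation. It assumes, as the reply does, that the observations X_1, ..., X_n have a common variance σ², and additionally that they are independent; the symbols are notation introduced here, not taken from the video.

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
  &= \sum_{i=1}^{n} \frac{1}{n^{2}}\,\operatorname{Var}(X_i)
     && \text{(independence)} \\
  &= \frac{1}{n^{2}} \cdot n\,\sigma^{2}
   = \frac{\sigma^{2}}{n}
     && \text{(identical variances)}
\end{align*}
```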
@Titurel 6 months ago
I was wondering too!
@ckq 1 year ago
Before I watch, basically the average cuts the variance by a factor of n, but when we find the difference between a sample value and the sample average, the average contains 1/nth of that sample so the calculated variance is shrunk by a factor of (1-1/n).
@statsandscience 1 year ago
I am not sure if I understand correctly, but I think there might be something to framing the mean as containing 1/n of the information within the sample... Would you say you were right after watching?
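The factor (1-1/n) proposed in this thread can be made precise with a short calculation. This is a sketch under the usual assumptions that X_1, ..., X_n are independent with common variance σ²; the notation is introduced here and is not taken from the video.

```latex
\begin{align*}
X_i - \bar{X}
  &= \left(1 - \frac{1}{n}\right) X_i - \frac{1}{n}\sum_{j \neq i} X_j, \\
\operatorname{Var}(X_i - \bar{X})
  &= \left(1 - \frac{1}{n}\right)^{2}\sigma^{2} + \frac{n-1}{n^{2}}\,\sigma^{2}
   = \frac{(n-1)^{2} + (n-1)}{n^{2}}\,\sigma^{2}
   = \left(1 - \frac{1}{n}\right)\sigma^{2}.
\end{align*}
```

Because X_i - X̄ has expectation zero, each squared deviation from the sample mean has expected value (1-1/n)σ², so their average over i is too small by the factor (n-1)/n. Dividing by n-1 instead of n compensates for exactly that factor.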
@osaabd390 1 year ago
Thank you so much for this great video. I appreciate it. I have to give you feedback though on the quality of the sound. I found it sometimes difficult to hear well what you say. Two things I would suggest you do, as I think your understanding of these concepts and ability to communicate them visually need not go to waste. The two solutions I suggest are 1. a better microphone (Shure and Rode are the best and not that expensive) and 2. read slower pleeaassee. I had to stop multiple times and go back to understand fully what you say. If you think you need to keep your videos below a certain time threshold, then cut off unnecessary words from your script, using shorter words, trim wordy phrases (e.g. use 'most' instead of 'the majority of'). Thanks again for the great effort, keep it going.
@statsandscience 1 year ago
Hey, thank you so much for taking the time to give detailed feedback. 1) I am actually using such a microphone, but maybe it wasn't well positioned? I will check that. 2) Thanks, I will try! It is not that I want to shorten videos, I am just used to talking fast, I guess...
@osaabd390 1 year ago
@statsandscience Good luck with your work and thank you from the bottom of my heart, I really do understand why we divide by n-1 now :D
@lurkertech 9 months ago
Thanks for the best video I've ever seen on the n-1. Referring to the key questions at 8:37, I was hoping to find the answer to a more specific question #2: not just "why isn't it n-2 or n-pi" but "why does the correction factor n/(n-1) not depend on the ratio between the sample size and the population size?" That is, if I know the population is 1000, and I choose a sample of 10 vs. a sample of 999, why wouldn't I use different correction factors to get the best answer? After all, my sample of 999 is going to be darn close to the true population variance whereas my sample of 10 is going to be way off. Your video kind of implies, but doesn't say directly (wish it did), that the n-1 "solution" provides the "average" correction factor you might need for any possible sample size relative to the population size, or to say the same thing in another way, the n-1 is the best you can do if you don't actually know the population size. Is that correct? If we DO know the population size exactly, can we choose a better correction factor that is tailored to that particular ratio of sample size to population size?
@lurkertech 9 months ago
To make an even clearer statement of the problem...suppose my sample size is always 499. Now suppose that the actual population is either 500 or 1000. So that's 2 cases in total. According to the n-1 rule, I should apply the same correction (499/(499-1)) to 499 samples in a 500 population as I should apply for 499 samples in a 1000 population. That doesn't seem to be the best we can do if we know the actual population size, since I should not need to correct as hard when sample size is very close to population size. So is the n-1 rule designed only for the case where one does not know the population size? If we do know the population size, can we do better? Using what formula?
@statsandscience 8 months ago
Sorry that I did not get around to answering this earlier. You put a lot of effort into this and I hope you still benefit from an answer! When you take intro stats classes, a basic assumption that lurks almost everywhere is that the population you are dealing with is infinite. Of course, this assumption is also basically always wrong. Usually that does not matter though, as populations are usually "big enough" that wrong estimates of the actual population size do not influence our outcomes to a degree we would care about. The same is true of this formula: it is not the "average" correction factor for all possible samples, but the one for an infinite population. Again, it usually does not matter what the actual size is, except when the sample size comes close to the population size. Now you correctly identified that this can cause problems, because in this case you actually know a lot more than the formula gives you credit for. What people came up with for this case is the Finite Population Correction (FPC). I would advise you to just google it and look for yourself, as the space here is quite limited (of course you can also ask follow-up questions about that here if you like!). In a nutshell, however, this correction does what you pointed out: it prevents you from correcting "too hard".
@lurkertech 8 months ago
@statsandscience Thank you, it is a very useful answer. I didn't know about that assumption, and so when your examples had a population size of 7, I was extra confused. Thanks for clearing it up. That makes it clearer why the correction should be greater when the sample size is smaller. Maybe mention that assumption in your video description to help others in the same boat as me?
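For readers who want to see the finite population correction mentioned in this thread at work, here is a minimal sketch. It is not from the video: the population values and sample size are hypothetical, and the sketch assumes simple random sampling without replacement, which is the textbook setting for the FPC. It enumerates every possible sample exactly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population (an assumption, not the video's data).
population = [Fraction(v) for v in (2, 4, 4, 5, 7, 9, 11)]
N = len(population)
n = 3  # hypothetical sample size, drawn WITHOUT replacement


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the values' own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


sigma2 = sum_sq_dev(population) / N    # population variance, divide by N
S2 = sum_sq_dev(population) / (N - 1)  # population variance, divide by N-1

# All equally likely samples (positions are distinct, so the two 4s both count).
samples = list(combinations(population, n))

# Average of the usual n-1 sample variance over all samples.
expected_s2 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

# Variance of the sample mean over all samples.
mu = mean(population)
var_mean = mean([(mean(s) - mu) ** 2 for s in samples])

fpc = Fraction(N - n, N - 1)  # finite population correction factor

print(expected_s2 == S2)
print(expected_s2 * Fraction(N - 1, N) == sigma2)
print(var_mean == sigma2 / n * fpc)
```

In this setting the n-1 formula averages to the population variance computed with N-1 in the denominator, so the extra factor needed to reach the divide-by-N version is (N-1)/N, which depends on the population size but not on n. The factor (N-n)/(N-1) shows up in the variance of the sample mean, which shrinks to zero as the sample approaches the whole population.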
@sayarsine6479 1 year ago
legendary
@statsandscience 1 year ago
Thanks!
@user-ws5sq8fm4k 1 year ago
Does the explanation using pairwise differences apply in sampling without replacement where diagonal zeros don't occur?
@statsandscience 1 year ago
They would still occur, wouldn't they? Because the margins of the table are identical either way, so there would be zeros on the diagonal. Sampling without replacement is also a separate issue, as for instance discussed here: stats.stackexchange.com/questions/70124/unbiased-estimator-of-variance-for-samples-without-replacement
@user-ws5sq8fm4k 1 year ago
@statsandscience Thank you for your reply. Zeros occur when we subtract each data point from itself, and this doesn't happen in the case of sampling without replacement.
@andrew.schaeffer4032 1 year ago
What kind of statistics exactly do I need to learn in order to follow along? This looks really interesting, but I don't fully understand how it all works. Thanks!
@statsandscience 1 year ago
You will probably find the general concept in any applied statistics textbook. As I said it is a basic step from descriptive statistics where you only draw conclusions about a particular sample to inferential statistics where you use a sample to draw conclusions about a bigger population and that is basically what is always needed and taught in applied statistics. The issue is that those books tend to be shallow in that regard and other books with more detail might only be helpful with a serious understanding of the math behind it. Which is why I made the video to bridge between these two. Let me know if that was what you had in mind!
@iwatchtvwithportal5367 9 months ago
I always thought the n-1 was related to the degrees of freedom spent, but actually it isn't!
@statsandscience 8 months ago
Well, it is, but you can sort of get around that in an explanation like this one. If you are interested, feel free to watch my video on degrees of freedom. :)
@faresmhaya 5 months ago
The explanation for why we divide by 2n² in the second formula is not intuitive to me, despite it working on a small example I tested. Dividing by both 2 and n² feels redundant. If we have two instances of each distance measurement, okay, we can divide by two, reducing the number of distances we're taking into consideration. But why would we then also need to divide by a second n if we already reduced the number of distances from n² when we divided by 2?
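One way to see where the two factors come from is to expand the double sum directly. The derivation below is a sketch in notation introduced here (x̄ is the sample mean of x_1, ..., x_n); it is not necessarily the route taken in the video. The cross term vanishes because deviations from the sample mean sum to zero.

```latex
\begin{align*}
\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^{2}
  &= \sum_{i,j} \bigl[(x_i - \bar{x}) - (x_j - \bar{x})\bigr]^{2} \\
  &= n\sum_{i}(x_i - \bar{x})^{2} + n\sum_{j}(x_j - \bar{x})^{2}
     - 2\Bigl[\sum_{i}(x_i - \bar{x})\Bigr]\Bigl[\sum_{j}(x_j - \bar{x})\Bigr] \\
  &= 2n\sum_{i}(x_i - \bar{x})^{2}, \\[4pt]
\frac{1}{2n^{2}}\sum_{i,j}(x_i - x_j)^{2}
  &= \frac{1}{n}\sum_{i}(x_i - \bar{x})^{2},
\qquad
\frac{1}{2n(n-1)}\sum_{i,j}(x_i - x_j)^{2}
   = \frac{1}{n-1}\sum_{i}(x_i - \bar{x})^{2}.
\end{align*}
```

Read this way, n² (or n(n-1) once the zero diagonal is left out) is the number of pairs being averaged over, while the separate factor 2 appears because each squared difference carries the spread of two values rather than one.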
@user-ws5sq8fm4k 1 year ago
Thank you for this great video. I hope you continue uploading more videos. Do you have a written text for this video? As a non-native English speaker, I have some difficulty following your speech. I need to listen to many parts of the video repeatedly to catch the words.
@statsandscience 1 year ago
Thank you! Yes, I do have that and I always wanted to make proper subtitles but just did not get to it yet. KZfaq auto generates subtitles as you probably know but I don't really like them. I will try to look into that soon and let you know.
@user-ws5sq8fm4k 1 year ago
@statsandscience Thank you for your reply. I will wait for this precious script.
@statsandscience 1 year ago
@user-ws5sq8fm4k English subtitles are up now! I hope you will find them helpful
@se0271 1 year ago
So instead of the sample lying somewhere much lower than the true population mean, what if it's lying much higher? Would it be correct to use n+1 instead of n-1 in order to deliberately make the sample variance smaller?
@statsandscience 1 year ago
The main problem is that you don't know that. Remember that we do all this with samples because we do not have access to the population - and this is a problem that happens because of sampling, but not when you can use the population values. Imagine a student who goes to the school cafeteria every day, and who knows that the staff tends to hand out portions that are too small most days. So they ask for something extra every day (and receive it). This will move the portion size to the optimum most days, but on days where the portion size was correct in the first place or even greater, the request will make it worse. However, this is still better because on most days the size is too small, so the average will be closer to the optimum. Does that help?
@se0271 1 year ago
@statsandscience Thank you for your response. It definitely helps, but I still have the question of how you would know that the data values from a sample are too small. You cannot infer that it's too large, but why can you infer that it's too small? Shouldn't it go both ways? Maybe samples naturally tend to gravitate around smaller data values, as with the portion size example you gave? If that's the case then it does actually make sense, since you'd typically not want to exceed the normal portion size so you don't run out (and this idea of scarcity can be applied to any other examples).
@statsandscience 1 year ago
@se0271 You indeed don't know that for a particular value. It can be too big or too small. It is just more likely that it is smaller. I'm afraid that if I went into more detail I would just repeat what I said in the video, but if you have specific questions I would be happy to help!
@se0271 1 year ago
@statsandscience I see, I appreciate the explanation- thank you!
@michaelchareka1175 1 year ago
Please upload more videos. I’m begging
@statsandscience 1 year ago
Thanks, glad you liked it!
@Hossein_am98 1 year ago
Thanks for the video, a really good way to explain it! Frankly, it seemed to me that you were reading from a written text, because your speaking was too constant (no stress on the words, no ups and downs, nothing), and that made it really difficult for me to understand what you were saying.
@statsandscience 1 year ago
Thanks! I will try to improve my speaking next time!
@SanatanYogii 2 years ago
upload more videos
@funfair-bs7wf 1 year ago
Great video, but would be even greater if you articulated a bit more 😉
@user-ws5sq8fm4k 1 year ago
If you permit, I could add an Arabic translation to your video. If you provide me with the English script, it will make my work easier.
@statsandscience 1 year ago
Yes, that sounds great! I think you should now be able to just download it after I have added the subtitles.