Mastering Hypothesis Testing for Data Science Interviews: Binomial, Z-test, and T-test

55,506 views

Emma Ding

1 day ago

This video is part 1 of hypothesis testing problems in data science interviews.
Part 2 of hypothesis testing problems in data science interviews:
• A/B Testing Analysis M...
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Intro
00:34 Three types of questions
02:18 When to use binomial test, z-test and t-test
05:09 t-distribution vs z-distribution
06:50 Testing proportions
09:26 What's in the next video

Comments: 76
@stella123www 3 years ago
best hypothesis testing video I've ever seen on youtube, thank you for producing great content!
@liuauto 3 years ago
This video will save tons of effort before taking a stats course and diving into any details. Have not seen such a helpful diagram ever before.
@deepadas4585 3 years ago
Love your videos, Emma! Very insightful and to-the-point explanation. I would love to see some domain-specific analytics interview case studies like- supply chain analytics, e-commerce analytics.
@cming6108 3 years ago
so much appreciation for every content you upload!!!
@jeoffleonora4612 3 years ago
Great video as always! Thanks Emma!
@CodeEmporium 3 years ago
This is good detail. Love it
@oliviazhang2922 2 years ago
You are absolutely the best Emma!! Thank you!!!
@vincenttan6303 3 years ago
Good stuff! Clearer than textbooks and even lecturers.
@user-nz5oi8pd5m 2 years ago
Always spoiled by Emma's concise and clear explanation.
@j33vn 2 years ago
Great content as always Emma. An intuitive way to think about not using a t-test for estimating a population proportion is that for Bernoulli data, there is only one unknown: the population proportion p. Once we know it, the variance is simply p(1-p). But in the case of estimating a population mean, there are two unknowns: the population mean and the population standard deviation. The heavier tail of the t-distribution is used to capture the extra uncertainty caused by this additional unknown. Khan Academy explains this in more detail for anyone interested. Thanks!
@emma_ding 2 years ago
Great observation Jeevan! Thank you for sharing!
@user-bn6tc4vv6l 11 months ago
Hi, which video/modules from Khan Academy explain this? thanks
@brothermalcolm 3 years ago
Perfect, just what I need, subscribed!
@norilouis 1 year ago
This is SO helpful and I really appreciate your content Emma!
@emma_ding 1 year ago
I'm so glad to hear you found it helpful, Louis! Thanks so much for watching. 😊
@hameddadgour 2 years ago
Great explanation and very informative! Thank you!
@starbuststream3219 1 year ago
Very informative video for job interview preparers!
@anamikadas9445 3 years ago
Love your videos Emma! For Bernoulli variables, would a chi-squared test also work? Is one method preferred over another in practice?
@sirvachjumani7215 3 years ago
Really useful content for interviewers.
@cliffrunner 1 year ago
this is a great video! thanks a lot!
@dallalstreet1775 3 years ago
Thanks Emma! Wonderful video
@lydiamai6861 3 years ago
Hi Emma, although I have not learnt this far, I enjoyed the video thanks to your clear and structured explanation. Thanks.
@emma_ding 3 years ago
Happy to hear that! Thanks, Lydia!
@chuchuzhu333 3 years ago
Thank you so much!
@Leon71 3 years ago
Thank you very much!
@datasciencepreparationhub9933 2 years ago
Good explanation!
@kangxinwang3886 3 years ago
this is just good period
@sinhamohit 2 years ago
Timestamps
Top quality content
No funky intro music
No repetitive sentences
No begging for likes and subscribes
Actually gets started when she says "Let's get started"
Earning subscribers the right way!
@emma_ding 2 years ago
Thanks Mohit for the summary! :)
@nattapatjuthaprachakul9859 3 years ago
Thank you so much
@csousa3608 1 year ago
Great video! I would love to see a video about hypothesis testing but applied to a case of use when you have to apply A/B/n testing.
@emma_ding 1 year ago
Great suggestion! In fact, I have a video on the topic you suggested kzfaq.info/get/bejne/bNunY6RkxrHbfZc.html, hope it helps! :)
@hiapple6060 2 years ago
Hi Emma, what test should I use if the metric follows a Bernoulli distribution, and with very different sample sizes in each group, say, 10000 observations in control and 1000 in treatment? In this case, should I use z-test with the pooled standard error or Welch's t-test?
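For reference, the pooled two-proportion z-test mentioned in this question can be sketched as below. This is a minimal illustration, not the video's own code: the function name `two_proportion_z_test` and the click counts are hypothetical, and only the 10,000 / 1,000 group sizes come from the question. It does not settle which test is preferable.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2 using the pooled standard error.

    x1, x2 are success counts; n1, n2 are group sizes (hypothetical helper).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common success rate assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 10,000 control users and 1,000 treatment users.
z, p_value = two_proportion_z_test(x1=1000, n1=10_000, x2=120, n2=1_000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Unequal group sizes enter only through the `1 / n1 + 1 / n2` term, so the smaller group dominates the standard error.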
@racoonYY109 3 years ago
Hi Emma, may I understand what's the difference between z-test and binomial test, if to compare CTR of two groups?
@tekingunasar4189 2 years ago
Hi! Great video. I am a little bit confused by the flow chart though, because it references knowing some information about the population distribution, particularly where we check whether or not the population distribution is normal. I am confused by this because if we were to know that the population distribution is normal, wouldn't that make hypothesis testing redundant? I know that this is actually not the case, and that I am misunderstanding something, but I'm not sure what exactly that is.
@rioache1081 3 years ago
4:11 There is a lot of arguing on the stats forums about the assumption of normality for t-test. And many of the comments state that for t-statistic to have a t-distribution the population has to follow the normal distribution (so t-test does actually require normality of population). What's your opinion on that topic?
@zhihaoxu756 2 years ago
Hi Emma, thank you very much for making these videos. They are indeed very helpful! However, I have a question regarding the difference between the z-test and the binomial test. For small sample, i.e when np
@xiaofeichen5530 1 year ago
I think she means calculating directly the probability of k successes in n trials using the binomial pmf Pr(X=k)=(n choose k)p^k(1-p)^(n-k)
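The pmf in this reply can be turned into an exact binomial test along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the example numbers are made up, and the two-sided rule used here (sum the probabilities of all outcomes that are no more likely than the observed one) is one common convention among several.

```python
from math import comb


def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p0):
    """Exact two-sided p-value under H0: p == p0.

    Adds up the pmf of every outcome that is at most as likely as the
    observed count k (with a small tolerance for floating-point ties).
    """
    observed = binom_pmf(k, n, p0)
    return sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-7)
    )


# Made-up example: 9 successes in 10 trials, testing H0: p = 0.5.
print(binom_test_two_sided(k=9, n=10, p0=0.5))
```

No normal approximation is involved, which is why this approach is usually suggested when the sample is too small for a z-test.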
@ramanadeepsingh 27 days ago
Great video...what happens when sample-size is less than 30 and population distribution is not normal. What kind of tests are used in practice?
@bcws 7 months ago
Does the Slutsky theorem apply here? Slutsky theorem only applies when one number converges in distribution to a random element and the other converges in probability to a constant.
@shrutigupta5104 2 years ago
Hi Emma, thanks for making informative videos. My question is how did you choose a sample size of 30 as the marker for differentiating between small and large sample sizes?
@Fawk3s1 2 years ago
It is a convention in statistics. Basically, if n > 30 you can apply the central limit theorem, which says that the sampling distribution of the sample mean is approximately normal when n is large. The n > 30 cutoff is a rule of thumb, not a strict threshold.
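One way to check how well the normal approximation holds for a given n, rather than relying on the rule of thumb alone, is a small simulation. The sketch below is an illustration under assumptions: the Exponential(1) population, the function name, and the sample sizes are all chosen here for demonstration. It estimates how often a nominal 95% z-interval around the sample mean contains the true mean.

```python
import random
from math import sqrt


def coverage_of_z_interval(n, reps=5000, seed=0):
    """Fraction of simulated samples whose mean falls within
    1.96 * sigma / sqrt(n) of the true mean (nominal 95% coverage)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / reps


for n in (5, 30, 100):
    print(n, coverage_of_z_interval(n))
```

How large n must be depends on how skewed or heavy-tailed the population is, so rerunning this with a population closer to the real data is more informative than a fixed cutoff.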
@akshat175 3 years ago
Hey Emma, your videos are super useful and simple to follow. Is there a place I can access your slides as well for quick review of the key concepts? This comment would hold for all your videos and not just this one..
@emma_ding 3 years ago
Sorry, there are no slides; it's all part of the video editing. But I'll definitely consider providing them in the future if it helps!
@racoonYY109 3 years ago
Also why for t-test, we have pooled and unpooled variances scenarios, while for z-test for two proportions we always used pooled?
@cql8878 2 years ago
I love your videos Emma! But by far this one is the hardest one to follow among yours :(
@emma_ding 2 years ago
Thanks for the feedback! Could you be specific which part is hard to follow? Thanks!
@plttji2615 2 years ago
I'm quite confused about whether I should use a z-test when testing the conversion rate, because some websites mention the t-test. Could you please explain this?
@navishagarwal1736 3 years ago
Hey Emma! Thanks for another great video. I have watched the video a few times now but the part on "testing proportions" seems to be going over my head. Possibly because I do not have some basics necessary here. Any suggestions on recommended reads?
@emma_ding 3 years ago
For resources about stats, you can find some in my blog post: towardsdatascience.com/how-i-got-4-data-science-offers-and-doubled-my-income-2-months-after-being-laid-off-b3b6d2de6938. For A/B testing specifically, this book is a good read: www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical/dp/1108724264
@appledotted 3 years ago
I had a tech screen with a fin-tech company today. They asked me to walk through the math behind testing normality with skewness. (Quite odd) I got a bit stuck on how to convert the skewness into a p-value. I mentioned that normally we have CLT that we can do normal approximation like for Binomial and Poisson Distribution, but I am not sure about skewness. Then I said maybe we can try bootstrapping to simulate the sampling distribution to get the variance of skewness if the distribution is unknown. (Not sure if this is a correct approach) I tried to find online resources about this after the interview, but somehow none of them go in-depth to talk about this part. Do you happen to have some insight? P.S. Really like your videos, very concise and instructive. :)
@appledotted 3 years ago
Just thought about this again. I think we can simulate a normal distribution over and over again with the same n, see in what proportion of those simulations the skewness is more extreme than in our observed data, and use that proportion as the p-value.
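The simulation idea in this reply can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the moment-based skewness estimator, the number of simulations, and the made-up exponential data. Because skewness does not change under shifting or rescaling, drawing from a standard normal is enough for the null distribution.

```python
import random


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2**1.5


def skewness_p_value(data, n_sims=2000, seed=0):
    """Monte Carlo p-value for H0: the data come from a normal distribution,
    using |skewness| as the test statistic."""
    rng = random.Random(seed)
    n = len(data)
    observed = abs(sample_skewness(data))
    extreme = 0
    for _ in range(n_sims):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(sample_skewness(sim)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_sims + 1)


# Made-up, clearly skewed data: 100 draws from an exponential distribution.
data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(100)]
print(skewness_p_value(data))
```

This tests only one way of departing from normality; a symmetric but heavy-tailed distribution would not be flagged by skewness alone.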
@YK-mh3mp 2 years ago
For general distribution other than normal distribution, I think it is theoretically wrong to use t-test. It is not only for proportions.
@ishpandey7886 3 years ago
Thanks a ton.... I never found such videos... You are really helping the community... I just have a question if the size
@emma_ding 3 years ago
Yes, it just won't be a t-test or Z-test. You can Google "hypothesis test non normal distribution" to find more details.
@ishpandey7886 3 years ago
@@emma_ding Thanks... Would love to get one end-to- end hypothesis problem with code... That would be really helpful...
@bluestacheandego 3 years ago
Hi! Thanks for the videos! I see you've got O'Reilly textbooks behind you. Do you recommend them? If so, how do you study from them? Thanks
@emma_ding 3 years ago
Haha, interesting question! Depends on what you are interested in, two books I highly recommend - Practical Statistics for Data Scientists (if you are interested in learning statistics in practice) and Designing Data-Intensive Applications (if you are interested in software engineering).
@thegreatlazydazz 3 years ago
Can you give some material which discusses why, theoretically, we cannot use t-tests for binomial proportions?
@emma_ding 3 years ago
Here you go: stats.stackexchange.com/questions/90893/why-use-a-z-test-rather-than-a-t-test-with-proportional-data
@shirleygui6533 2 years ago
Great video! But there is a small point that confused me: if the sample size is large enough, then according to the CLT the sample mean follows the normal distribution (the variance can be calculated from the sample data), so should we use the z-test instead of the t-test because we "know" the variance? Is my logic correct? Thank you
@irisyao8691 2 years ago
I have the same question, if the sample size >30, we can use z-test by using sample variance though we don't know population variance.
@Han-ve8uh 3 years ago
If a company has defined more than 2 stages in conversion, so not just no-click/click, but like 1. Open product page 2. Add to checkout 3. Open payment confirmation page ... It won't follow a Bernoulli distribution anymore since there are more than 2 outcomes. Are there tests for this, or do we still have to use Bernoulli and treat outcomes as "reached stage x vs not reached stage x"? How does the latter case affect the analysis?
@emma_ding 3 years ago
In those cases, you can simplify the problem with "conditions": given that users passed all previous stages, whether or not they enter the next stage follows a Bernoulli distribution. This will make testing a lot easier.
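The conditioning described in this reply can be sketched as follows. The stage names echo the question above, but the counts and the helper `conditional_step_rates` are made up for illustration. Each step is treated as a Bernoulli trial among the users who reached the previous stage.

```python
from math import sqrt


def conditional_step_rates(stage_counts):
    """stage_counts: ordered list of (stage_name, users_reaching_stage).

    Returns one (step_label, conditional_rate, standard_error) tuple per
    step, where the rate is conditional on reaching the previous stage.
    """
    steps = []
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        rate = n / prev_n  # Bernoulli success rate for this step
        se = sqrt(rate * (1 - rate) / prev_n)
        steps.append((f"{prev_name} -> {name}", rate, se))
    return steps


# Made-up funnel counts for a single experiment group.
funnel = [
    ("product page", 5000),
    ("checkout", 1000),
    ("payment confirmation", 400),
]
for label, rate, se in conditional_step_rates(funnel):
    print(f"{label}: rate = {rate:.3f}, SE = {se:.4f}")
```

Computing these per-step rates separately for control and treatment gives one proportion comparison per stage, with the sample size shrinking at each later step.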
@diazjubairy1729 3 years ago
What's the difference between hypothesis test and a/b test ?
@jimbocho660 2 years ago
An A/B test is one type of hypothesis test.
@maryamomar4106 2 years ago
I love you.
@emma_ding 2 years ago
I'm glad you find the content so loveable! Thank you Maryam.
@sssam844 1 year ago
could you please attach the subtitles as well? I find your videos fantastic and helpful but I have difficulty understanding the pronunciation of some words
@emma_ding 1 year ago
Sure thing! Thanks for the suggestions. I've added subtitles to my most recent videos, and will add more!
@nagrajkaranth123 2 years ago
Sis please cover all the interview questions of data science
@nagrajkaranth123 2 years ago
Great sis I subscribed your channel help me to clear data science interview sis
@TheElementFive 2 years ago
Sis?
@vivekambastha2273 3 years ago
Maybe a good topic, but the presentation of the topics is not good, and there are some pauses when switching between topics
@emma_ding 3 years ago
Thanks a lot for the feedback! I'll pay more attention to pauses in the future!