A/B Testing Analysis Made Easy: How to Use Hypothesis Testing for Data Science Interviews!

Views: 59,342


This video is part 2 of hypothesis testing problems in data science interviews.
✔️ Part 1 of hypothesis testing problems in data science interviews:
• Mastering Hypothesis T...
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Intro
00:39 Two-sample test of proportions
04:40 Statistical significance
05:11 Practical significance
07:11 Two-sample test of means
10:19 Welch's t-test

Comments: 112

@emma_ding 3 years ago

Correction: Thanks Yidan Shang -- At 9:12, the Spool calculated should be 1.06 instead of 1.099. Thanks Ruby Jiang -- At 7:38, the mean of treatment is 1.7 instead of 2. The subsequent calculations should be changed accordingly.

@bettergreta464 10 months ago

Hi Emma, thanks for this. I have a question regarding 9:12: why don't we use SS divided by N? That's the formula for standard deviation, I think?
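
On dividing by N versus N - 1: a sample standard deviation normally divides the sum of squares by N - 1 (Bessel's correction), and dividing by N applies only when the data are the whole population. A minimal sketch, assuming made-up data (the `posts` list is hypothetical, not the array shown in the video):

```python
import math
import statistics

# Hypothetical sample; NOT the data from the video.
posts = [1, 0, 2, 3, 1, 2, 0, 1]

n = len(posts)
mean = sum(posts) / n
ss = sum((x - mean) ** 2 for x in posts)  # sum of squares (SS)

population_sd = math.sqrt(ss / n)      # divide by N: data are the full population
sample_sd = math.sqrt(ss / (n - 1))    # divide by N - 1: data are a sample

# The standard library agrees with both hand computations.
assert math.isclose(population_sd, statistics.pstdev(posts))
assert math.isclose(sample_sd, statistics.stdev(posts))
```

The pooled version follows the same idea: the two sums of squares are added and divided by (n_c + n_t - 2), one degree of freedom lost per group.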

@Quasar_Energy 3 years ago

What I love about your channel is that you don't charge $300 to unemployed job seekers for this information.

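@jiercc3138 3 years ago

I think this series is really great. Each video looks short, but it distills the key points very well, with the right level of detail, and it is very goal oriented: it is made for interviews. Thanks!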

@zbear404 3 years ago

I recommended you to all my classmates. Excellent work and presentation of what is needed!

@CityInvisible 2 years ago

My hiring manager actually recommended this series of videos. Super helpful for someone who doesn't have much business experience. Thank you!

@sooryaprakash6390 2 years ago

Quality of content is top notch! Thanks for making these videos. Looking forward to learning more from you.

@arojitdas8256 1 year ago

All your videos are a gold mine. Keep up the good work

@TiantianGao 1 year ago

Great content!!! The best explanation of z-test and T-test on KZfaq! Great examples!!! Feel very lucky to find you here🙏! Thank you!!!

@emma_ding 1 year ago

Thanks so much for your kind words! Happy to help. 😊

@hello-pd7tc 2 years ago

so so so helpful! Thank you Emma

@jithendrayenugula7137 3 years ago

Really great video thanks 😋 I appreciate the effort 🙏

@pro100olga 3 years ago

Thanks a lot for your channel! It helped me to prepare and get a job offer! :)

@yueleji8892 3 years ago

Thank you sooooo much Emma! I have trouble with combining the hypothesis and A/B testing knowledge together, your video saved me!!!

@yueleji8892 3 years ago

And also a quick question: would you explain the difference between the practical significance boundary and the minimum detectable effect? Thank you!

@hermit597 3 years ago

@@yueleji8892 Wouldn't that be the same? Since practical significance measures the effect size, to me it makes sense for it to be the minimum detectable effect - established before the test is conducted.

@sandeepgupta2 3 years ago

Amazing tutorial !!!

@jonsings 3 years ago

Super helpful!

@viviangong6760 1 year ago

Great explanation Emma! Nice work! For the second case, can you show the formula and calculation for how you concluded that the lower bound is more than Dmin 0.05?
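
The comparison asked about here can be sketched as a small rule: check where the confidence interval for the difference sits relative to the practical significance boundary. The function name `practical_significance` and its three labels are hypothetical, chosen for illustration; the interval is the one Emma quotes elsewhere in this thread for the second example.

```python
def practical_significance(ci_low: float, ci_high: float, d_min: float) -> str:
    """Compare a confidence interval for the difference with +/- d_min."""
    if ci_low > d_min or ci_high < -d_min:
        # The whole interval lies beyond the boundary.
        return "practically significant"
    if -d_min <= ci_low and ci_high <= d_min:
        # The whole interval lies inside the boundary.
        return "not practically significant"
    # The interval straddles the boundary.
    return "inconclusive"


# CI quoted in this thread for the second example, with d_min = 0.05.
print(practical_significance(0.0648, 1.201, d_min=0.05))
```

With those numbers the lower bound 0.0648 exceeds 0.05, so the rule returns "practically significant".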

@leizhang1699 3 years ago

Hi Emma, I really appreciate all the great videos you have made, which are very helpful. I was wondering if you can make some videos about how to handle take-home challenges such as those from Lyft and Airbnb. Any information will be highly appreciated! Thanks

@emma_ding 3 years ago

Take home challenge would be an interesting topic. Stay tuned!

@star_7776 1 year ago

Thank you Emma, I am learning a lot, God bless you! I finally feel I understand this topic.

@emma_ding 1 year ago

I'm so happy to help! 😊

@tonghooooo1383 3 years ago

Hi Emma, thanks for making these great videos. I have a quick question about the term you use at 9:00. Should it be pooled standard deviation instead of pooled standard error here?

@Crtg17 2 years ago

Hi Emma - Thank you so much for sharing! I have learnt a lot. I would like to clarify the formula you used to calculate the SE in the confidence interval of two sample proportions (4:12). The formula you used is the SE in the test statistic of the two-proportion z-test, but the SE for the CI should be different: sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2). Please correct me if I am wrong here. Thanks
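
The distinction raised in this comment can be sketched as follows: the pooled standard error is used for the z statistic under the null hypothesis of equal proportions, while the unpooled standard error is used for the confidence interval of the difference. The counts and the helper names `pooled_se` and `unpooled_se` are hypothetical, not taken from the video.

```python
import math


def pooled_se(x1, n1, x2, n2):
    """SE under H0: p1 == p2, used in the z test statistic."""
    p = (x1 + x2) / (n1 + n2)
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def unpooled_se(x1, n1, x2, n2):
    """SE without assuming equal proportions, used for the CI."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical counts: conversions and users per group.
x_c, n_c = 100, 1000   # control
x_t, n_t = 130, 1000   # treatment

d_hat = x_t / n_t - x_c / n_c
z = d_hat / pooled_se(x_c, n_c, x_t, n_t)

margin = 1.96 * unpooled_se(x_c, n_c, x_t, n_t)  # 95% two-sided
ci = (d_hat - margin, d_hat + margin)
print(f"z = {z:.3f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

When the two sample proportions are close, the two standard errors are nearly identical, which is why the choice rarely changes the conclusion in practice.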

@ys2660 2 years ago

you are right

@yidanshang1382 3 years ago

Hey Emma, great video and love the decision flow chart. One quick question - at 9:12, why is Spool = 1.099? I calculated the pooled variance = 1.13, and the square root of that would be 1.06 instead of 1.099.

@emma_ding 3 years ago

Thanks for pointing out the mistake.

@Sn-nw6zb 3 years ago

Pretty good explanations with examples. Thank you. Do all companies use t-tests and z-tests, or do they use bootstrap tests by running simulations?

@shreyaschaturvedi1933 3 years ago

excellent video! i had one question though: what do you do if you are dealing with unequal variances in your control and treatment group? I'm assuming you can't calculate SE using the pooled formula you have shown, right? would appreciate some advice on this!
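
For unequal variances, the video's last chapter (10:19, Welch's t-test) is the usual answer: each group keeps its own variance instead of a pooled one, and the degrees of freedom come from the Welch-Satterthwaite approximation. A minimal sketch with the standard library; the helper `welch_t` and both data lists are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se = math.sqrt(va / na + vb / nb)  # no pooling of variances
    t = (statistics.mean(a) - statistics.mean(b)) / se
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df


# Hypothetical data with visibly different spreads.
control = [1, 0, 2, 1, 1, 2, 0, 1]
treatment = [0, 5, 1, 7, 2, 0, 6, 3]

t_stat, df = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The resulting df is generally not an integer and falls between min(n_a, n_b) - 1 and n_a + n_b - 2; the critical value is then read from the t distribution with that df.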

@wclin3872 3 years ago

Thank you for sharing this! I have a question - in example 2, we don't know the population variance, but the sample size is large. Can we use the sample standard deviation to estimate the population standard deviation (large sample size) so that we can use a z-test? Thanks!

@sijunjiang9744 8 months ago

Hi Emma, thank you for your valuable video. In the video at 5:57, dmin = 0.05, but at dmin = 0.01. I am a little confused about the value of dmin. Why was it changed from 0.05 to 0.01? Is the value chosen arbitrarily, or can it be calculated by some formula? Thanks

@harryfeng4199 2 years ago

This is a BLESSING. Thank you so much.

@emma_ding 2 years ago

You are welcome 😊

@allenlu3021 3 years ago

Hey Emma, great video, really helping me understand the process of A/B testing! From watching all of your series on this topic, one thing I'm having trouble understanding is the relationship between MDE and practical significance. I understand MDE is used to calculate the sample size needed to detect statistical significance at the magnitude of our MDE. In my mind, I thought the point of the MDE was that we could have our null hypothesis be "variant is not larger than control by the given MDE (if MDE is positive)"; however, it seems the null hypothesis is just control metric = treatment metric. Is the MDE then used later on as the practical significance boundary you mentioned in this video, and does it have nothing to do with determining statistical significance beyond helping estimate our initial sample size?

@namandoshi4478 2 years ago

Did you find an answer to this?

@flying3152 2 years ago

Super helpful!!!

@emma_ding 2 years ago

Glad you think so!

@jiercc3138 3 years ago

One question, Emma. At 7:43, the mean of the treatment is 2. However, calculating directly from the data array you gave, the treatment mean I got is 1.7. This can be validated by the sum of squares: if you use 2, the SS of the treatment group at 9:10 will actually be 37 instead of 34.3. Could you double-check? Also, could you let us know why the difference of means at 9:22 is 0.633? It would be 0.6 if you use (2 - 1.4), and 0.3 if you use (1.7 - 1.4). Thanks!
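
The sum-of-squares check described here follows from the identity SS about c = SS about the true mean + n * (mean - c)^2. A minimal sketch, assuming 30 users in the treatment group; that size is not stated in this thread and is inferred from the 0.258 multiplier quoted in a later reply, since sqrt(1/30 + 1/30) is about 0.258.

```python
n_t = 30              # ASSUMPTION: treatment group size, inferred from 0.258
ss_about_mean = 34.3  # SS of the treatment group shown at 9:10
true_mean = 1.7       # mean computed from the data array
slide_mean = 2.0      # mean shown on the slide at 7:43

# Shifting the centre from the true mean to 2.0 inflates SS by n * (shift)^2.
ss_about_slide_mean = ss_about_mean + n_t * (slide_mean - true_mean) ** 2
print(round(ss_about_slide_mean, 1))  # 37.0
```

So an SS of 34.3 is consistent with a treatment mean of 1.7 and not with 2.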

@jamesy6213 2 years ago

Thank you so much! You explain this far better than my teachers at school!!

@pushkarajpalnitkar1695 3 years ago

Great video! Coding rounds are also big part of the interview process. Can you please make some videos on that too? That will be great. Thank You.

@emma_ding 3 years ago

Stay tuned!

@dadaunion 1 year ago

Hey Emma @Data Interview Pro, thanks for the great video. But I am quite confused about the CI for the second case. The practical significance boundary is 5%, but the CI you calculated is in actual units (number of posts). Just wondering how they can be compared.

@nelsonchou1023 1 year ago

Hi Emma, I'm analysing a conversion A/B test result. I wonder how to account for the issue that a change of conversion is due to the different directions of numerator (checkout sessions) and denominator (homepage sessions) ? E.g. the homepage sessions reduced while the checkout only increased slightly or no change at all? Can we conclude that the treatment group actually performs better? Thanks.

@cococnk388 1 year ago

Thanks 😊

@wuru6097 2 years ago

Hey Emma, at 5:52, when calculating the margin of error, the Z you used is 1.96, which comes from the statistical significance level instead of the practical significance boundary. Could you please confirm if this is what's supposed to be used? Thank you!

@rantao1593 2 years ago

One question - do we also look at p value to decide if there is a statistical significance, or we only need to compare test statistics and z-score?

@JoelPrinceVarghese 3 years ago

In your example, what is the timeframe for the average number of posts? If I just want my average posts per user to go up, how do I decide the timeframe to run the test? Also, say I wanted average posts per day to go up and I have data for the same users across multiple days, how do I check my hypothesis then? Hope this makes sense.

@SuperLOLABC 3 years ago

Great video as always Emma! I have a question, is it alright to schedule a Technical phone screen 3 weeks out and the on-site interview a whole month after the technical phone screen? Is it possible to get rejected by the recruiter if I ask for such far out interview dates?

@emma_ding 3 years ago

You can totally discuss it with the recruiter. No worries. Your recruiter wants to help you do your best. :)

@jasonsj 2 years ago

Thanks Emma for the video! Very helpful! Is it a typo at 9:10? Should it be "pooled" standard deviation, since it is the formula to calculate SD rather than SE? Thanks!

@cccspwn 2 years ago

What is the difference between using the practical significance boundary and minimum detectable effect?

@akankshakumar731 2 years ago

Can you do a video on the chi-squared test? Here, for example, the click-through probability would be a categorical variable.

@travissun6753 3 years ago

Hi Ding, how can we find the most precise flow chart of statistical tests? There are many versions of them on Google.

@shashikantprajapati7364 6 months ago

Hi @emma_ding, thanks for creating such great and informative videos on A/B testing. I did not understand how you calculated the SS (sum of squares) for the control and treatment groups.

@anaspatankar6999 3 years ago

Why does the difference in sample proportions (d) follow a normal distribution? is it because the sample size is large enough?

@TiantianGao 1 year ago

Hi Emma, I have a question about the sample size. Do the control and treatment groups have to have exactly the same sample size? For example, it's 1000 users here for both. Can my control group have 1023 users and my treatment group 1048 users? Will this affect our result? Thank you for all the great content!!! It significantly helped me understand hypothesis testing better! ❤

@sinarashidian9888 3 years ago

Thanks for going through these problems step by step. In interviews, are we supposed to use built-in libraries (for instance scipy) for these questions or implementing everything from scratch? First one shows we are familiar with libraries, latter one shows we know the math :) I am not sure which way is the best one to go.

@emma_ding 3 years ago

Good question! In most cases using built-in libraries would be good enough, but I'd suggest always checking the requirements with the interviewer before diving into coding anything.

@chayanontpotawananont9317 3 years ago

I can't thank you enough

@hieification 3 years ago

The CI in the second case is coming out inside the practical boundary for me. Am I missing something? CI for d = 0.633 (+ or -) 2.2018 (multiplying 2.002*1.0998), so the range is -1.56 to 2.83. Really helpful video. Thanks Emma!

@emma_ding 3 years ago

Yep, I believe you are missing a multiplier 0.258 for the margin of error (time 9:18), then you'll get the CI 0.0648 to 1.201. Hope it helps! Let me know if you have other questions :)

@kuipan5968 3 years ago

@@emma_ding Hey, I have the same question. Where is 0.258 coming from? CI = d +/- Z*SE, right? Here d = 0.633, Z = 2.2018, and SE = 1.0998.

@lakshmank 3 years ago

@@emma_ding Hi Emma, Thanks for the video. I have the same question as well. Isn't the width of CI = Z*SE? Why are you multiplying Z*SE again with 0.258? Seems to be mistake in the calculation in your video to calculate CI boundaries?

@hezhaojiang3525 2 years ago

@@emma_ding Isn't the width of CI = Z*SE? Why are you multiplying Z*SE again with 0.258?

@pushinhuang2872 2 years ago

Same questions here
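
On where 0.258 comes from: 1.0998 is the pooled standard deviation, not yet the standard error of the difference in means. The standard error is S_pool * sqrt(1/n_c + 1/n_t), and with 30 users per group that square root is about 0.258. A minimal sketch reproducing the interval in Emma's reply, using the video's original figures (before the pinned correction); the group size of 30 is an assumption inferred from the 0.258 multiplier.

```python
import math

d_hat = 0.633      # difference in means quoted in this thread
t_crit = 2.002     # t critical value quoted in this thread
s_pool = 1.0998    # pooled standard deviation quoted in this thread
n_c = n_t = 30     # ASSUMPTION: inferred from sqrt(1/30 + 1/30) ~= 0.258

se = s_pool * math.sqrt(1 / n_c + 1 / n_t)  # standard error of the difference
margin = t_crit * se                        # margin of error
ci = (d_hat - margin, d_hat + margin)

print(f"SE = {se:.3f}, margin = {margin:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

This gives roughly 0.06 to 1.20, matching the 0.0648 to 1.201 in the reply up to rounding of the 0.258 factor.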

@korkutkaynardag9147 10 months ago

Why do we use the same z value used for the p-value to calculate the confidence interval? Should we not choose the z value for the confidence interval based on the 0.01 practical significance boundary (6:50)?

@cococnk388 1 year ago

Hello miss, some books use a permutation test for hypothesis testing when analysing A/B experiment results. Can you tell us when to use a permutation test versus the tests you presented in your video (binomial, t and z tests)? Thanks
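
For reference, a permutation test replaces the t or z distribution with repeated random relabelling of the observed data, so it makes no normality assumption. A minimal sketch using only the standard library; the function `permutation_test`, its defaults, and both data lists are hypothetical.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical data.
control = [1, 0, 2, 1, 1, 2, 0, 1]
treatment = [0, 5, 1, 7, 2, 0, 6, 3]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference = {observed}, p = {p_value:.4f}")
```

It is most useful for small samples or heavily skewed metrics; with large samples it tends to agree with the z and t tests shown in the video.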

@dantongzhu1310 3 years ago

Hi Emma, how did you calculate the CI in the second example exactly when assuming similar variance for both control and treatment? I got the margin of error = t-score*SEpool = 2.002*1.0999 = 2.202. Then with \hat{d} = 0.6, which would then give a very big CI that includes the entire [-dmin, dmin] = [-0.05, 0.05]. But it seems like in your slides the CI is strictly on the right side of [-dmin, dmin]. I'm very confused and would appreciate some help! Otherwise, your videos have been super super helpful!! Thanks.

@paramawasthi24 2 years ago

Margin of error would be t-score * Spool * (1/nc + 1/nt)^(1/2), which comes to ~0.51; adding and subtracting that from d-hat (0.6) gives a CI that sits above the practical significance boundary of 0.05.

@chloehuajingjiang9128 1 year ago

@@paramawasthi24 can I ask why we can’t use Spool we calculated using (SS/df) ^1/2 = 1.0999?

@user-nk4fx1tb4w 3 years ago

Hello Emma, thanks for all your helpful content :) I attempted the steps you gave in the video but was unable to solve this question from a loan application analysis task. What is the best way to solve this hypothesis test?
Question: What's the effect of owning a car on the likelihood of a loan application being accepted?
own_car attribute: Yes (1) or No (0)
loan_application_accepted attribute: True (1) or False (0)
The dataset has 1000 records.
own_car value counts: No 598, Yes 405
loan_application_accepted counts: False 703, True 300
own_car value counts where loan_application_accepted is True: No 187, Yes 113
Null hypothesis: Owning a car doesn't affect loan application acceptance.
Alternative hypothesis: Owning a car does affect loan application acceptance.

@user-nk4fx1tb4w 3 years ago

How should this Hypothesis test be computed: What's the effect of owning a car on the likelihood of a loan application being accepted?
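
One way to set this up is the two-sample test of proportions from the video, comparing acceptance rates of car owners and non-owners. A minimal sketch, using the counts exactly as posted in the comment above (they sum to 1003, not 1000) and assuming a two-sided test at alpha = 0.05; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Counts as posted in the comment above.
accepted_car, n_car = 113, 405        # car owners
accepted_no_car, n_no_car = 187, 598  # non-owners

p_car = accepted_car / n_car
p_no_car = accepted_no_car / n_no_car

# Pooled proportion under H0: acceptance rate is the same in both groups.
p_pool = (accepted_car + accepted_no_car) / (n_car + n_no_car)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_car + 1 / n_no_car))

z = (p_car - p_no_car) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"p_car = {p_car:.3f}, p_no_car = {p_no_car:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

With these counts |z| is below 1.96, so under these assumptions the null hypothesis is not rejected. A chi-squared test of independence on the 2x2 table is an equivalent alternative.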

@rash_mi_be 2 years ago

Hi How did you find critical Z score in the first part as 1.96?
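
The 1.96 is the standard normal quantile that leaves 2.5% in each tail, which is the two-sided critical value at alpha = 0.05. A minimal sketch with the standard library, assuming alpha = 0.05; the one-sided value is shown for comparison because several comments in this thread ask about one-tailed versus two-tailed tests.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # 2.5% in each tail
z_one_sided = std_normal.inv_cdf(1 - alpha)      # 5% in one tail

print(round(z_two_sided, 2), round(z_one_sided, 3))  # 1.96 1.645
```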

@maheshchandra5717 3 years ago

Hey Emma, just a quick question, have you heard of Strata Scratch? Is it a good platform to practice interview-style SQL and Python coding questions? Are the questions actually asked in those companies which are tagged? Would love to hear your thoughts.

@emma_ding 3 years ago

I'm not familiar with the platform. I have only used LeetCode and HackerRank.

@kylehuang7926 3 years ago

I use both - Nate's very good at explaining SQL and Emma is good at statistical and product sense questions

@jonglee8162 3 years ago

Hi Emma, thanks for the video! How do we know that we have a large sample size?

@emma_ding 3 years ago

For sample size, you can refer to the diagram in the part 1 video. kzfaq.info/get/bejne/f79nrJClmJa5epc.html

@johnstephen399 3 years ago

At 5:05, should you not be using the Z-score for alpha = 0.025 instead of 0.05 since you're using a two-tailed test?

@emma_ding 3 years ago

For two-sample tests, usually we test if one is greater than the other. One-tailed tests would make the most sense.

@Han-ve8uh 3 years ago

Thanks for showing all pooling formulas and concepts in one place. At 1:16 and 7:50 you mentioned practical significance boundaries of 0.01 and 0.05; how does the experimenter come up with these values? Is it from calculating business costs and revenues? You talked about checking the CI, which reminds me of something confusing I read in point 4 on hookedondata.org/guidelines-for-ab-testing/. Could you comment on why the cases she cites are possible (a CI that is wide and close to 0 vs a CI that is tiny and far away)? What I don't understand is, assuming the same p-value (not sure if this assumption is required for this discussion), how can a CI be tiny (I think she means narrow) AND far away simultaneously? CI width depends on the standard error, so a narrow CI means a narrower sampling distribution of whatever statistic, and the centre of the CI should be closer to the centre of the null hypothesis sampling distribution compared to a CI with a larger width. Is my reasoning correct that a narrower CI would have a centre that is closer to the null centre? How should I understand what Emily is saying there, then? She seems to say the narrower CI can have a centre that is further away.

@emma_ding 3 years ago

Typically, those values are given during interviews. If not, you can discuss with the interviewer.

@ankityadav-eq7fe 2 years ago

How did we get the practical significance boundary?

@xinyuechang6062 2 years ago

I am very confused by the practical significance boundary. Why is dmin = 0.01 in example 1 but 0.05 in example 2?

@elderpinzon7686 3 years ago

Did anyone else try to reproduce the results of the two-sample test of means? I get that the mean of the treatment is 1.7 (not 2.0 like in the video). This changes the conclusion, the result is not statistically significant. I think it's a mistake since my calculation for the pooled standard error (which uses the standard deviation of the treatment) matches perfectly

@elderpinzon7686 3 years ago

Using ttest_ind from scipy.stats I confirm my result. I think there is a typo somewhere in the video

@ey2392 3 years ago

agree

@firesongs 1 year ago

5:22 How do we know that the center of the CI is 0.012?

@karundeep07 3 years ago

Hey Emma, thanks for this amazing video. Just wanted to know: why did we pick 0.01 as the practical significance level? We could have picked any other value as well, like 0.02 or 0.03 or 0.04 (anything < 0.05, the statistical significance level α). Does the practical significance level need to be 0.01?

@emma_ding 3 years ago

In practice, a company would pick the practical significance level that makes the most sense. 0.01 is just an example to show you the difference between statistical and practical significance. Hope it helps!

@adooby001 3 years ago

@@emma_ding Is practical significance similar to MDE? In my past roles, I've seen the MDE that was agreed on during experimental design serve as the practical significance.

@tangled55 3 years ago

Hi Emma, around 2:28, where you say "Bernoulli" population, I think, to be clear, you want to make a designation. Bernoulli deals with the data which only has ONE trial and two possible outcomes, but the Binomial is the collection of Bernoulli trials for the same event (multiple trials of the same event). So since we're doing a hypothesis test on the number of successes in multiple trials, the assumption is that successes follow a binomial and not a Bernoulli, right?

@ristyping 2 years ago

Yes, I was thinking the same thing. The population cannot be Bernoulli because it has n trials with two possible outcomes. Even in this case, simply having n > 1 for both samples indicates that it is a Binomial.

@cococnk388 1 year ago

The concepts do not change …

@dwardster 2 years ago

Do interviewers ever ask to calculate test stats or confidence intervals? In that case can we look up or ask for the formulas?

@emma_ding 2 years ago

Hi there, good question! It would really depend on the interviewer, but I'd suggest remembering the equations for commonly used hypothesis tests, e.g. 1-sample and 2-sample z-tests and t-tests.

@dwardster 2 years ago

@@emma_ding thank god for remote interviews 😉

@myworldAI 3 years ago

Hi, I have 2 samples of supermarket customers who bring their own plastic bags. In one sample the supermarket provides free plastic bags; in the other the supermarket charges a fee for plastic bags. What kind of statistical test should I use? Can I use a two-sample test of proportions? Thanks

@emma_ding 3 years ago

Yes you can.

@myworldAI 3 years ago

@@emma_ding Thank you very much👍👍👍👍👍

@ashwinmanickam 2 years ago

8:00 Case 2 T - test

@jasonwong8315 3 years ago

Where did the practical significance boundary come from? Rule of thumb?

@emma_ding 3 years ago

Typically, those values are given. If not, you can discuss them with the interviewer during the interview.

@jonsings 3 years ago

@@emma_ding Thanks for clarifying i also had this question.

@WashingMykale 3 years ago

Just to confirm, in the 2-sample test of means, were you doing a 2-sided test? because I saw that you looked up the t-score under the 0.975 column for alpha = 0.05.

@mrakashgupta 3 years ago

+1 to that - though, I saw in one of Khan Academy video, for one tail test with alpha = 0.05, we need to refer 97.5% against required df to get critical T value for comparison. @Emma, please share your thoughts.

@rubyjiang8836 3 years ago

Mean of treatment is 1.7 not 2. Then I think the second example is not significant...

@ey2392 3 years ago

Agree

@yij9010 1 year ago

did you put on a programmer plaid shirt😂

@alexisdamnit9012 11 months ago

Coming from a statistics background, this is some weird notation. That aside, I think she's doing well, but the notation just throws me off. Data science is a strange mess