Logistic Regression in R, Clearly Explained!!!!

509,326 views

StatQuest with Josh Starmer

This video describes how to do Logistic Regression in R, step-by-step. We start by importing a dataset and cleaning it up, then we perform logistic regression on a very simple model, followed by a fancy model. Lastly, we draw a graph of the predicted probabilities that came from the Logistic Regression.
The code that I use in this video can be found on the StatQuest GitHub:
github.com/StatQuest/logistic...
For more details on what's going on, check out the following StatQuests:
For a general overview of Logistic Regression:
• StatQuest: Logistic Re...
The odds and log(odds), clearly explained:
• Odds and Log(Odds), Cl...
The odds ratio and log(odds ratio), clearly explained:
• Odds Ratios and Log(Od...
Logistic Regression, Details Part 1, Coefficients:
• Logistic Regression De...
Logistic Regression, Details Part 2, Fitting a line with Maximum Likelihood:
• Logistic Regression De...
Logistic Regression Details Part 3, R-squared and its p-value:
• Logistic Regression De...
Saturated Models and Deviance Statistics, Clearly Explained:
• Saturated Models and D...
Deviance Residuals, Clearly Explained:
• Deviance Residuals
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
0:29 Load and format data
3:54 Dealing with missing data
5:03 Verifying that the data is not imbalanced
6:44 Logistic regression with one independent variable
12:48 Logistic regression with many independent variables
15:13 Graphing the predicted probabilities
#statquest #logistic
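
The workflow in the description (format the data, check the outcome counts, fit a simple model, fit a fancy model, graph the predicted probabilities) can be sketched in base R. This is only a minimal sketch under stated assumptions: the data are simulated, the column names (`age`, `sex`, `hd`) and effect sizes are made up for illustration, and it is not the code from the video.

```r
# Simulated stand-in data; NOT the dataset used in the video.
set.seed(42)
n <- 300
data <- data.frame(
  age = round(rnorm(n, mean = 55, sd = 9)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Made-up relationship between the predictors and the outcome.
log_odds <- -6 + 0.1 * data$age + 0.8 * (data$sex == "M")
data$hd <- factor(ifelse(runif(n) < plogis(log_odds), "Unhealthy", "Healthy"))

# Check that neither outcome is rare within a level of the predictor.
xtabs(~ hd + sex, data = data)

# Simple model: one independent variable.
simple <- glm(hd ~ sex, data = data, family = "binomial")
summary(simple)

# "Fancy" model: every other column is used as an independent variable.
fancy <- glm(hd ~ ., data = data, family = "binomial")
summary(fancy)

# Predicted probabilities, ranked from lowest to highest, then plotted.
predicted <- data.frame(prob = fancy$fitted.values, hd = data$hd)
predicted <- predicted[order(predicted$prob), ]
predicted$rank <- seq_len(nrow(predicted))
plot(predicted$rank, predicted$prob, col = as.integer(predicted$hd),
     xlab = "Index", ylab = "Predicted probability")
```

Because `hd` is a factor with levels "Healthy" and "Unhealthy", `glm()` models the probability of the second level, "Unhealthy".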

Comments: 640
@statquest (3 years ago)
Here's the link to the code: github.com/StatQuest/logistic_regression_demo/blob/master/logistic_regression_demo.R Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@falaksingla6242 (2 years ago)
Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so. Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.
@holeman1 (3 years ago)
This 89-year-old guy says BAM!! So clearly explained, indeed. DOUBLE-BAM!!!!
@statquest (3 years ago)
BAM!!! And thank you for your support!!!!
@MuctaruKabba (4 years ago)
Your videos never disappoint, Sir. I have gone through many of them and think you've earned the right to brand the phrase: "clearly explained" because your explanations are indeed very clear. I am building a better explanation of statistics thanks to you. I appreciate you and hope you continue to pass on the knowledge.
@statquest (4 years ago)
Wow, thanks!
@wei2674 (4 years ago)
Thank you so much Josh for all these videos! I got an A+ in most of my stat courses quite a few years ago when I was doing my MSc in Biostat, but it took me quite some time to come up with a better understanding of a few concepts. You just summarized and presented these ideas and more in a few minutes! You are a genius and on top of that, you are so kind to share all this work with everyone for free! With my limited vocabulary, all I can say is THANK YOU! It makes me feel the world is a beautiful place with beautiful minds and souls. I love your song “hello”, it reminds me of the day I met my daughter and brought happy tears to my eyes :)
@statquest (4 years ago)
Thank you so much!!! I'm really glad you like my videos and my music. :)
@i8thelastmoa360 (4 years ago)
Your videos cover everything in my course and I wish I found you sooner! So much detail and clear explaining in such little time
@SurrenderPink (4 years ago)
Josh, it’s Saturday morning here and I’m enjoying a cup of Bam! learning R from the best teacher on the planet. I’m so grateful and appreciative of your efforts to share your considerable talents with us!
@statquest (4 years ago)
Thank you very much! :)
@chasti5754 (3 years ago)
I just wish one day all this information actually stays and sticks in my mind... thank you though! Your videos are amazing!
@statquest (3 years ago)
Thanks for watching!
@emilyblythe7708 (5 years ago)
where have you been my whole thesis! thank you!!
@statquest (5 years ago)
Hooray! I'm glad to help! :)
@amandacampos3037 (3 years ago)
I feel the same!! hah
@solalstenou6474 (5 years ago)
What is great with your video is that even if I forgot my headphone I am able to follow the video in the computer room full of other students! Thank you so so so much !!!! From University of Bordeaux
@statquest (5 years ago)
Solal Sténou Thank you!! :)
@meniz4659 (4 years ago)
You will surely be in my thesis acknowledgments. Thank you for making our lives relatively easier but truly more intelligible. BAAAAAM!!
@statquest (4 years ago)
Thanks so much! :)
@japhethernandezvaquero204 (4 years ago)
Nice channel to land on! Happiest discovery of my 2020! Great job!
@statquest (4 years ago)
Thank you! :)
@zahraab1027 (4 years ago)
"one last shameless self promotion" got me 😂😂😂.....that's why I love your videos, u make learning stats fun
@statquest (4 years ago)
Hooray! Thank you! :)
@marielledelcarmencaballero5017 (2 years ago)
Your videos are great! It's also so nice of you that you take the time reply to so many of the comments here !
@statquest (2 years ago)
Thank you!
@alexandergeorgiev2631 (3 years ago)
You are an absolute life saver. My data science paper is due in two days and now I have my pretty log graph and I understand this better. DOUBLE BAM!!!!!
@statquest (3 years ago)
Hooray!
@nkristianschmidt (3 years ago)
so, how did it go today?
@yashilagovender5134 (2 years ago)
Thank you so much for this video! I've been suffering with the coding for my project but this really helped. You're a star!
@statquest (2 years ago)
Thanks!
@565-FENRIR (2 years ago)
I really enjoyed the clear way you explained this topic. Many thanks for the teaching!!!
@statquest (2 years ago)
Thank you very much!!!
@dodgecarlincila879 (3 years ago)
I was just here for the logistic regression but bam!! I would be watching all of your videos. As a ds learner using r, double bam!!!, your videos will surely help big time! Bambambam! 👌😅 Thank you. 🙂
@statquest (3 years ago)
Awesome! Thank you!
@alhaque7556 (2 years ago)
Thank you so much! I've a stat project to do in R with logistic Regression and this simplified the coding portion so much!
@statquest (2 years ago)
Hooray!
@farhadwaseel9981 (4 years ago)
I recommend all the videos by stat quest with Josh Starmer. Thank you for your good explanations.
@statquest (4 years ago)
Thank you very much! :)
@burrohq (3 years ago)
You sir deserve a promotion 👏 thanks for this incredibly helpful video
@statquest (3 years ago)
Thank you! :)
@LoizidesGeorge (4 years ago)
So helpful, thanks! Whenever you come to Cyprus let me know for some free accommodation in our mountainous region, Marathasa! Thx again! Γ
@statquest (4 years ago)
Wow! That sounds awesome!!!
@LoizidesGeorge (4 years ago)
@@statquest oh yes! I owe you a lot - you saved me so many hours! Γ
@chrischukwu2956 (3 years ago)
You are an amazing teacher. God bless you!
@statquest (3 years ago)
Thank you! 😃
@nathanielchristian7027 (4 years ago)
Your simple English explanation of the meaning of "Intercept" in the output from 8:30 to 8:38 of this video was something I could not find after searching for 2 hours. Thank you!
@statquest (4 years ago)
Awesome!!! Now that you have that concept down, a lot of other stuff in statistics should make more sense. (At least I hope!) :)
@wei2674 (4 years ago)
Both my husband and I learned so much from ur video. ( inspired by the top comment), whenever you come to Toronto let us know for a few free accommodation in our Asian restaurant/bubble tea surrounded neighborhoods (north York center)! Thx again! Xin
@statquest (4 years ago)
Hooray!!! That would be awesome. I will dream of the day I can visit you in Toronto. :)
@mutuamutunga (4 years ago)
This has been extremely helpful. Thank you!
@statquest (4 years ago)
Thank you! :)
@Mel22Brasil (3 years ago)
It must be so much fun working with you! Thank you for this tutorial. =)
@statquest (3 years ago)
Thank you! :)
@danee593 (5 years ago)
Josh you are amazing, thank you!
@joseluismanzanares3662 (5 years ago)
Clear as water. Super BAM!!! Thanks for sharing
@riteshpatel1984 (5 years ago)
Hi Josh, thanks for your videos, they are very easy to understand. Really appreciate your efforts. I believe I speak for many: because of you, many people are able to understand with utmost clarity, and you cover all the small details with super ease. Keep up the noble work. Cheers 👍 Would it be possible for you to put up a video on model evaluation, i.e. determining the cutoff and model performance? Thanks
@statquest (5 years ago)
Thank you! :)
@daviddevega4433 (3 years ago)
Thank you very much for all this stuff. You have saved me from failing my exams. Amazing quality channel. Unbelievable how low the number of likes is. Very appreciated channel, at least for me. Thanks again.
@statquest (3 years ago)
Wow, thanks!
@critiquessanscomplaisance8353 (4 years ago)
I won't forget you in the acknowledgments sir haha!!! Great job!
@statquest (4 years ago)
Thank you very much! :)
@nl7247 (1 year ago)
Thanks for also showing how to wrangle data and explore missing data in a simple helpful way ❤
@statquest (1 year ago)
My pleasure 😊
@ricardot4722 (4 years ago)
I am impressed, you are talented, thanks for sharing your knowledge.
@statquest (4 years ago)
Thank you! :)
@danieltrodler4340 (4 years ago)
Great content and incredible value. Thank you so much
@statquest (4 years ago)
Thanks! :)
@paulshannon9708 (5 years ago)
You really are wonderful for explaining this in a way morons like me can understand, this is so incredibly helpful. Thank you so much!
@BruceWayne-oc7dn (2 years ago)
It's 1:11 AM and what I am doing is DOUBLE BAM. Thank you for this awesome video. You are a hero.
@statquest (2 years ago)
Thanks! :)
@maheshkumar-vv5fp (4 years ago)
good looking white background... graphs are beautiful... whatever you say, you write it on screen.... your sound and sound system, very good.. the way you explain things, CLEARLY EXPLAINS everything.. and loved that music part and BAM!!! and here, i have something to say about your work.. and that is VERY BIG BAM !!!... good luck.. keep growing..
@statquest (4 years ago)
Thank you very much! :)
@goodsuggestionbutno6783 (2 years ago)
Hoooray! We made it to the end of an exciting journey through logistic regression! Hope you have a nice day, and thank you for explaining the output for logistic regression in R, which really can't be understood thoroughly without watching all the logistic + odds videos!
@statquest (2 years ago)
Yep, that is correct. That's why I made all those other videos first - the output is jam packed with stuff.
@mohamedhijazi8460 (4 years ago)
You're the man! thanks for everything!
@statquest (4 years ago)
Thank you very much! :)
@danielromero-alvarez5392 (4 years ago)
you are just the best! Thanks for doing this!
@statquest (4 years ago)
Thank you! :)
@tansutazegul8297 (1 year ago)
incredibly brilliant tutorial!
@statquest (1 year ago)
Thanks! :)
@kedwards127 (4 years ago)
This is so helpful thank you!!
@statquest (4 years ago)
Hooray! :)
@sheilaserrano1039 (5 years ago)
Thaaaanks! very useful and clear!
@statquest (5 years ago)
Hooray! I'm glad you like it! :)
@saulesparza7911 (5 years ago)
This video is amazing! Thanks!!!
@statquest (5 years ago)
Thank you! :)
@N0o0x0e0r (5 years ago)
This channel has helped me a lot understanding statistics! Could you please make a video explaining the linear mixed model too?
@statquest (5 years ago)
Yes! However, it might be a while before I get to it.
@wa5561 (2 years ago)
Thank you for saving my study. Not gonna lie, this video made me cry. I was about to drop out because of statistics, but this saved my project.
@statquest (2 years ago)
Hooray!
@yutassmilehealsme6572 (3 years ago)
THANK YOU! somehow I couldn't find any websites explaining this
@statquest (3 years ago)
Glad you found it.
@fahmiidris4499 (3 years ago)
super dangg! Good explanation, bro!
@statquest (3 years ago)
Thank you! :)
@at4652 (6 years ago)
Great tutorials, I started with your PCA video and since then hooked onto other videos . Could I request you to do a video on various types of probability distributions when to use them.
@statquest (6 years ago)
Those are all in the works. I wish I could work 2 or 4 times faster than I can. I've wanted to cover the major probability distributions for over a year, but got sucked down a machine learning path and now feel spread pretty thin. However, these will happen eventually! :)
@TimothyChenAllen (5 years ago)
StatQuest with Josh Starmer could you make a video on how to work 2 to 4 times faster? :-)
@statquest (5 years ago)
As soon as I figure that out, I'll make a video on it! ;)
@weilianglim1764 (5 years ago)
BAM!!!
@katere89 (5 years ago)
Hi Josh, thanks for this amazing tutorial. Would you be able to add something on interactions between predictors and random effects? I am trying to run a mixed-model logistic regression and have three-way interactions but am not entirely sure how to deal with them. Thanks so much :)
@amandacampos3037 (3 years ago)
same!
@andreatulli356 (3 years ago)
Great video!!! Thank you so much!
@statquest (3 years ago)
Thanks!
@KayYesYouTuber (4 years ago)
Your videos are awesome. Thank you very much.
@statquest (4 years ago)
Thank you! :)
@mihaelawassilko7414 (5 years ago)
Hi Josh, Thank you for the very informative tutorial. Do you have any videos on multilevel modelling?
@statquest (5 years ago)
Not yet.
@tuanlong9238 (5 years ago)
And...BAM, thanks for sharing, your video is really useful :D
@statquest (5 years ago)
Thanks! :)
@Fsp01 (2 years ago)
Doing a masters program on analytics and this video made more sense than all the lectures combined on logistic regression. thank you
@statquest (2 years ago)
Thanks!
@skandagurunathanr4795 (4 years ago)
Great salute! If you can, please post a video on all machine learning models with a large dataset example implementation in r with clear intuition and mathematics statistics behind it. Thanks.
@raghavendral882 (5 years ago)
BAM, spot on, thanks for such a video.. my journey with logistic regression and R has started.
@statquest (5 years ago)
Awesome!!! :)
@SritamaDutta_Asansol (4 years ago)
Mine too
@mathieufen2239 (4 years ago)
SO clear!! Thanks!!
@statquest (4 years ago)
Awesome!
@temjim (4 years ago)
Hi, Josh. I cannot thank you enough for these videos... Would also be good to have a similar video in Python..
@statquest (4 years ago)
Great suggestion!
@aishwaryadas3681 (2 years ago)
@@statquest where's the video sir in python sir?
@ss11996 (5 years ago)
Hi, I am having a little trouble understanding how a factor variable (string) can be input into a logistic model, which is mathematical?
@marcelomurilloquesada8400 (4 years ago)
Hi, I really like your videos, every topic is as clear as water after watching it. I've watched this one and also the three videos about logistic regression's details. If you want to go further in this topic, you could do a video explaining emmeans package for R. Many people, including me, would understand post hoc tests for glm using emmeans, if someone like you explained it. Thank you!
@statquest (4 years ago)
Thanks! :)
@ca177 (3 years ago)
YOU RAWK !! Awesome explains on ML concepts..
@statquest (3 years ago)
Thank you! :)
@AOLFlyersNewsletters (4 years ago)
Thanks Josh - you are our saviour!
@statquest (4 years ago)
BAM! :)
@AOLFlyersNewsletters (4 years ago)
@@statquest Triple Booyah BAM from my side!
@wilfredoa.tovarhidalgo9385 (2 years ago)
Excellent!!!! Thank you very much.
@statquest (2 years ago)
bam!
@kayizaisma6288 (4 years ago)
Great job bro. Gratitude for your help. You also have where to stay if you come to Uganda (Africa).
@statquest (4 years ago)
Thank you very much!!! :)
@ericaleverson9430 (3 years ago)
You are so good!! Thank you!
@statquest (3 years ago)
Thanks! :)
@mdhasibreza5161 (2 years ago)
All of your videos are great and fun to learn from! Could you please upload a tutorial on mediation analysis using STATA and R (using the mediation package)?
@statquest (2 years ago)
I'll keep that in mind.
@JRO_Lyrics (2 years ago)
great work done here
@statquest (2 years ago)
Thank you!
@RajeshSahu-ey8kw (4 years ago)
U are a genius...and ur teaching style too...hurray!!!! and Bamm!!!!
@statquest (4 years ago)
Wow, thank you!
@christelleleitzingerphd7491 (3 years ago)
Awesome! Thank you so much! Please could you do a video about conditional logistic regression like clogit in R with result interpretation and how it works when using adjusted parameters.
@statquest (3 years ago)
I'll keep that in mind.
@da2015 (4 years ago)
These videos are so amazing! Do you have a suggestion for a book that explains Logistic Regression to newbies? The videos are super awesome, but extra references may help too. Hopefully you will write your own book soon! Thanks!
@shnibbydwhale (4 years ago)
I know this is probably 10 months too late, but the book “Introduction to Categorical Data Analysis” by Alan Agresti is a great book. Does a really good job explaining logistic regression and is pretty light on the math.
@bellahuang8522 (2 years ago)
me binge watching Josh's videos before midterm... anyone else? lmao
@statquest (2 years ago)
Good luck! :)
@andreluisal (2 years ago)
Excellent!!!
@statquest (2 years ago)
Thanks!
@TiNa-uo3ks (2 years ago)
Thank You. SOOOOOOOOOooOOOoo Helpful
@statquest (2 years ago)
bam!
@thomasdrissi (1 year ago)
Hi Josh, Thanks for the really helpful video! Referring to the clip at 15:20. Whilst I know plotting a Predicted Y for a range of X values (say in a simple univariate logistic regression) we would expect to see that S shape. But for a multiple variable regression (as in yours) should the index of probabilities when ranked and plotted as you've done here always have to be in that S Logistic Shape? I am getting more of an exponential curve between 0 and 1, and can't tell if this means I have done something wrong/have something wrong with my model?
@statquest (1 year ago)
Hmm...I'm not sure. Your graph should definitely taper off as the predicted probabilities get closer to 1, but how visible this tapering is might depend on how many data points you plot.
@geetikapanda7152 (3 years ago)
The more I watch your videos, the more I wish I had a teacher like you in my school days.. Do we have a video on the chi-square test?
@statquest (3 years ago)
Not yet. :( But one day we will.
@kingfisher65 (9 months ago)
amazing. thank you man!
@statquest (9 months ago)
Thanks!
@familians (9 months ago)
You may like this video too: Another great video about logistic regression in JMP kzfaq.info/get/bejne/b99-ktybrKeuink.htmlsi=jUwEZUDobBudE8AE
@vidyaammu1687 (3 years ago)
Thanks for the video. Your video made it look so simple. I request you to upload a video on how to get risk ratios in a multiple logistic regression model.
@statquest (3 years ago)
I'll keep that in mind.
@hajer3335 (6 years ago)
Thank you so much for this effort really appreciate We need a stat quest on three topics: 1-Chi-square test, 2- The Hosmer-Lemeshow goodness of fit test for logistic regression. And 3- Iteratively reweighted least squares (IRLS) by using Newton's method. If you don't mind :) of course. Can you tell us about the title of next video?!
@statquest (6 years ago)
The Chi-Square test is on the list. I've looked into the Hosmer-Lemeshow fit... Can you tell me what you think about the limitations? Specifically those mentioned in the Wikipedia article about it? en.wikipedia.org/wiki/Hosmer%E2%80%93Lemeshow_test#Limitations_and_alternatives And iteratively reweighted least squares is also on the list. However, up next are some basic statistics videos and then videos on lasso, ridge, and elastic-net regression.
@hajer3335 (6 years ago)
The Hosmer-Lemeshow statistic was introduced to avoid a problem with the Pearson chi-squared statistic. When observations are grouped by the values of the x variables, the Pearson chi-squared goodness of fit test cannot be readily applied if there are only one or a few observations for each possible value of an x variable, or for each possible combination of values of x variables. (A sufficiently large sample is assumed. If a chi-squared test is conducted on a smaller sample, then it will yield an inaccurate inference.) So in the Hosmer-Lemeshow statistic, the observations are grouped by expected probability. But there is very little guidance on selecting the number of subgroups. The number of subgroups, g, is usually calculated using the formula g > P + 1. For example, if you had 12 covariates in your model, then g > 12. How much bigger than 12 g should be is essentially left up to you. Small values for g give the test less opportunity to find mis-specifications. Larger values mean that the number of items in each subgroup may be too small to find differences between observed and expected values. Sometimes changing g by very small amounts (e.g. by 1 or 2) can result in wild changes in p-values. As such, the selection for g is often confusing and arbitrary. Also, it doesn’t take overfitting into account and tends to have low power. For these reasons, the Hosmer-Lemeshow test is no longer recommended. Am I right? Are these enough reasons to stop using the HL test?
I have another question. (Overfitting happens when your sample size is too small. If you put enough predictor variables in your regression model, you will nearly always get a model that looks significant. While an overfitted model may fit the idiosyncrasies of your data extremely well, it won’t fit additional test samples or the overall population. The model’s p-values, R-squared and regression coefficients can all be misleading. Basically, you’re asking too much from a small set of data.) If I have a small sample, is there any problem with using maximum likelihood to fit the model and McFadden's pseudo R-squared? Is there any rule for choosing the number of samples for a regression? Sorry for the many questions, it is my first year in biostatistics. :)
@statquest (6 years ago)
These are all great questions. You are correct about the HL test and you are correct about overfitting. There are, however, lots of tricks you can use to compensate for overfitting (lasso regression, ridge regression, elastic net regression etc.) One way to test to see if you have a model that is "overfit" is to use cross validation. As for a minimum number of samples for logistic regression - people often say "10 samples per level of each discrete variable". It's a general rule of thumb and it doesn't always apply. However, again you can use cross validation to verify if you have enough samples or not. Cross validation is a very practical tool!
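
To make the cross validation suggestion concrete, here is a minimal base-R sketch of k-fold cross validation for logistic regression. Everything in it is an assumption for illustration: the data are simulated, `x2` is deliberately pure noise, and `cv_accuracy()` is a hypothetical helper, not a function from any package or from the video.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Simulated outcome: depends on x1 only; x2 is pure noise.
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * d$x1))

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold label for each row

# Hypothetical helper: mean held-out accuracy over the k folds.
cv_accuracy <- function(formula) {
  acc <- numeric(k)
  for (i in 1:k) {
    train <- d[fold != i, ]
    test  <- d[fold == i, ]
    fit   <- glm(formula, data = train, family = "binomial")
    p     <- predict(fit, newdata = test, type = "response")
    acc[i] <- mean((p > 0.5) == (test$y == 1))
  }
  mean(acc)
}

cv_accuracy(y ~ x1)        # model with the real predictor only
cv_accuracy(y ~ x1 + x2)   # model with an extra noise predictor
```

If the larger model does not predict the held-out folds any better than the smaller one, that is a hint that the extra variable is not helping.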
@hajer3335 (6 years ago)
Thank you, Mr Josh, for answering me, I need to study more about Cross-validation.
@hajer3335 (6 years ago)
Sorry l have more than one account 🙈🙊
@albertoconde6912 (4 years ago)
Thank you so much for the video Josh! I have a question regarding ggplot. Why did you put alpha=1, shape=4, stroke=2?
@statquest (4 years ago)
Alpha =1 because I experimented with different values for alpha before realizing that having things 100% opaque (ie the default value) was best. Shape draws the Xs and stroke makes the Xs easier to see.
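
For readers without ggplot2 at hand, the same three settings have rough base-R analogues. This is a sketch with made-up probabilities, not the plotting code from the video; the colour name is arbitrary.

```r
set.seed(7)
prob <- sort(runif(50))   # made-up predicted probabilities, ranked

# alpha = 1 means fully opaque; lower values make the points see-through.
col_opaque <- adjustcolor("steelblue", alpha.f = 1)

plot(seq_along(prob), prob,
     pch = 4,         # X-shaped marker, like shape = 4 in ggplot2
     lwd = 2,         # thicker marker lines, like stroke = 2
     col = col_opaque,
     xlab = "Index", ylab = "Predicted probability")
```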
@farhatyasmin6543 (5 years ago)
sir! have you made up any video about probit regression? There is no numerical example of probit regression. Will you help me about it? how to apply probit regression?
@BulLiT2401 (3 years ago)
Love your videos. Could you do one on mixed logistic regression?
@statquest (3 years ago)
I'll keep that in mind.
@woopwoopsoupsoup678 (1 year ago)
This man is a legend
@statquest (1 year ago)
:)
@arurirorivs (3 years ago)
Great video. I have a question: if some of my categorical variables have the problem of having only a small number of counts, how do you suggest I proceed?
@statquest (3 years ago)
Consider omitting them. However, it really depends. Maybe use cross validation to see if predictions are better with or without them. For details on Cross Validation, see: kzfaq.info/get/bejne/nLmpp9143N2mhqs.html
@sofiaalfonso9883 (3 years ago)
Sir, you are a savior
@statquest (3 years ago)
Thanks! :)
@chrishanni2779 (5 years ago)
AWESOME!!!!! thank you. Questions, did you try any transforms? can you talk about the residual analysis for this data set? what about interaction? thank you
@chrishanni2779 (5 years ago)
Oh man, forgot, why factor for the ordered data?
@jarrydscully8430 (2 years ago)
Great video! I have one question: Will R automatically account for issues regarding the dummy variable trap? It looks like the summarized model automatically included (m-1) factors for the categorical variables.
@statquest (2 years ago)
If the categorical variables are cast as "factors", then R will create the correct design matrix.
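
A quick way to see the design matrix R builds from a factor is `model.matrix()`. The sketch below uses a hypothetical three-level predictor named `cp` with made-up values; it shows that a factor with m levels yields m - 1 dummy columns plus the intercept, while the same information stored as plain numbers yields a single slope column.

```r
# Hypothetical three-level categorical predictor (made-up values).
cp <- factor(c("typical", "atypical", "none", "typical", "none"))

X <- model.matrix(~ cp)
colnames(X)   # "(Intercept)" "cpnone" "cptypical"
X             # "atypical" is the reference level, so it gets no column

# The same categories stored as numbers are treated as one quantity.
cp_num <- c(3, 1, 2, 3, 2)
ncol(model.matrix(~ cp_num))   # 2: intercept plus a single slope
```

The reference level is simply the first level of the factor, which is alphabetical by default.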
@Ma-er8fd (1 year ago)
Thank you so much for sharing. I want to ask a question: does the variable "hd" need to be converted to a factor? I saw some code that did and some that didn't, and when I tried both, the results were the same. Some machine learning methods require factors, so I am a little confused. If you don't mind, does the outcome variable always need to be converted to a factor, or does it depend on the circumstances? I really appreciate your help and look forward to your answer. Thanks again
@statquest (1 year ago)
In this example we convert 'hd' into a factor because it represents 4 possible outcomes (which we then reduce to just 2 possible outcomes). We have to do this because the outcomes are coded as numbers, which R originally interprets as integers, rather than discrete outcomes.
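
A small sketch of that conversion, using made-up codes. The mapping is an assumption for illustration: here code 0 is taken to mean "Healthy" and every other code "Unhealthy".

```r
hd <- c(0, 2, 1, 0, 3, 0)   # made-up numeric outcome codes
class(hd)                   # "numeric": R treats these as quantities

# Assumed mapping: 0 means healthy, anything else means unhealthy.
hd_binary <- factor(ifelse(hd == 0, "Healthy", "Unhealthy"))

levels(hd_binary)           # "Healthy" "Unhealthy"
table(hd_binary)            # counts for each discrete outcome
```

With a two-level factor as the outcome, `glm(..., family = "binomial")` treats the first level as the baseline and models the probability of the second level.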
@jives. (3 years ago)
lets goooo StatQuest
@statquest (3 years ago)
bam!
@federicogarland2706 (9 months ago)
Thank you very much for this! One question, is the calculation of the Pseudo R2 the same for a Poisson generalized linear model in R?
@statquest (9 months ago)
Unfortunately I've never done Poisson regression... :(
@philippaknecht9247 (5 years ago)
Hi Josh, I find your videos very informative and they help me a lot with my bachelor's thesis. Because you put some variables into "factors" and others stay "numeric", I think I can ask my question, which I can't find an answer to anywhere on the internet, or I don't know how to look! I am doing a logistic regression with NBA regular season games to find out if the fact that teams are eliminated from the playoffs has an effect on their winning probability (to find out if they "tank" = intentionally losing). For the variable of the current strength of the team I use the current winning percentage of the team (how many games won over how many games played) and this variable is refreshed after every game. I was wondering if I can put this variable in as "numeric"? Or what type would you define this winning percentage as? The opponent's winning percentage, whether the game is on the home court or not, whether the team is statistically eliminated or in the playoffs, and whether the opponent is statistically eliminated or in the playoffs are also in the regression. It is the same regression some researchers did back in 2002 to test the same thing, but no one has done it recently. I hope you understand my question and hope very much that you can and are willing to help me. Thank you very much and have a great day!
@statquest (5 years ago)
For logistic regression, it will be easier to understand what the estimated coefficients mean if you multiply the proportion of games won by 100. When you do this, you can use these values as "numeric" and the coefficient will tell you how much the log(odds) of the outcome changes for every 1 percentage point change in that variable. For more details on interpreting the coefficients, check out kzfaq.info/get/bejne/rLRllrF_l5Osh3k.html
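
The effect of rescaling a numeric predictor can be checked directly. `glm()` reports coefficients on the log(odds) scale, so multiplying the predictor by 100 divides its coefficient by 100 without changing the fitted model. The data below are simulated and the variable names are hypothetical.

```r
set.seed(3)
n <- 500
win_prop <- runif(n)   # made-up winning proportion, between 0 and 1
win <- rbinom(n, size = 1, prob = plogis(-1 + 2 * win_prop))

# Predictor as a proportion (0 to 1).
fit_prop <- glm(win ~ win_prop, family = "binomial")

# Same predictor as a percentage (0 to 100).
win_pct <- 100 * win_prop
fit_pct <- glm(win ~ win_pct, family = "binomial")

coef(fit_prop)["win_prop"]   # change in log(odds) per 1.0 change in proportion
coef(fit_pct)["win_pct"]     # change in log(odds) per 1 percentage point
```

The second coefficient is the first divided by 100, and both models give identical predicted probabilities.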
@philippaknecht9247 (5 years ago)
Thank you very much for your help!!! I appreciate it a lot! I'm glad it's not a complicated solution... :D
@Jana-ed6xf (4 years ago)
Hi, first of all: I love your videos! However, do I have to change the predictors from num to factors? Or can I go with the numeric values as well?
@statquest (4 years ago)
In this video, we use a combination of "num" and "factors", so you can use both types.
@dchristiadi85 (4 years ago)
Hi Josh, Firstly, forgive my ignorance. Can I refer to null and proposed LL results in 14:12? You mentioned that to pull from the log-likelihood, we need to divide the scores by -2. Can you please elaborate how do you get the -2? Additionally, in 14:46 do you use 1-pchisq to get the upper tail? If my assumption is wrong, can you please explain the 1-pchisq part? Thank you heaps
@statquest (4 years ago)
Your first question is answered in my video on Saturated Models and Deviance: kzfaq.info/get/bejne/b7pgqs98ycvbZn0.html For your second question, the answer is "you are correct!". :)
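
The two steps discussed in this exchange can be sketched as follows, on simulated data with hypothetical variable names. The deviances that `glm()` reports equal -2 times the log-likelihood for 0/1 outcomes, which is why dividing by -2 recovers the log-likelihoods, and `1 - pchisq(...)` gives the upper tail of the chi-squared distribution.

```r
set.seed(11)
n <- 250
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 1.0 * x))
logistic <- glm(y ~ x, family = "binomial")

# Deviance = -2 * log-likelihood for 0/1 data, so divide by -2.
ll.null     <- logistic$null.deviance / -2
ll.proposed <- logistic$deviance / -2

# McFadden's pseudo R-squared.
r2 <- (ll.null - ll.proposed) / ll.null
r2

# Upper-tail p-value; degrees of freedom = number of extra parameters.
p <- 1 - pchisq(2 * (ll.proposed - ll.null),
                df = length(logistic$coefficients) - 1)
p
```

An equivalent way to get the upper tail is `pchisq(..., lower.tail = FALSE)`, which is also more accurate for very small p-values.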
@CarlosDullius (5 years ago)
I really love the music kkkk Congrats man, you are amazing o/
@statquest 5 years ago
Hooray! :)
@leoniepfeifer4091 10 months ago
Hi! thanks for your video!! do you know how to add fixed effects (year and sector) to the logistic regression model? thank you so much in advance!!
@statquest 10 months ago
Unfortunately, not off the top of my head. :(
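One common approach, offered here as a sketch and not as something covered in the video, is to add the grouping variables as factors so that each year and each sector gets its own intercept shift. The data and column names below (`x`, `year`, `sector`, `y`) are simulated and hypothetical.

```r
set.seed(3)
n <- 400

d <- data.frame(
  x      = rnorm(n),
  year   = sample(2018:2021, n, replace = TRUE),
  sector = sample(c("energy", "retail", "tech"), n, replace = TRUE)
)
# Made-up outcome with an effect of x and of one sector
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * d$x + 0.3 * (d$sector == "tech")))

# factor() turns year and sector into dummy variables (the first level is the baseline)
fit_fe <- glm(y ~ x + factor(year) + factor(sector), data = d, family = "binomial")

summary(fit_fe)
```

This dummy-variable approach is reasonable when there are only a handful of years and sectors with plenty of observations in each; with many small groups the estimates can become unreliable, so treat it as a starting point only.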
@dvijeniya 5 years ago
Thanks for the detailed and super easy explanation, Josh. I'd like to ask you, shouldn't we check the items below before running the regression? 1. Would it be better to use the WOE or log(odds) of a variable rather than the raw variable (for example gender -> dummy)? If I'm not wrong, using a dummy variable in the model is not a good choice; 2. Correlation between variables; 3. Factor transformation; 4. PCA analysis. And then, in order to calculate the probability from the log(odds), should we use the Sigmoid function? I mean, to transform the log(odds) to the probability. Thanks in advance!
@statquest 5 years ago
If you want your model to be interpretable - in that you can look at the parameter values and make sense out of them - then removing correlated variables is a good idea and factor transformation and PCA can help with that. On the other hand, if you want to use your model to make the best predictions, then correlated variables are fine and can improve predictions. If you are interested in the details behind Logistic Regression, then check out these other StatQuests:
General Overview: kzfaq.info/get/bejne/r6-JfrVl2M3eeWw.html
Interpreting Coefficients: kzfaq.info/get/bejne/rLRllrF_l5Osh3k.html
Fitting the Model to Data with Maximum Likelihood: kzfaq.info/get/bejne/eMx7lNGdlse3d2Q.html
Calculating R-squared and its p-value: kzfaq.info/get/bejne/rt52jNWgnbfZiHU.html
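On the last part of the question, the sigmoid (logistic) function is indeed what turns log(odds) into a probability. A small sketch with simulated data (`x`, `y`, and the helper name `sigmoid` are assumptions for illustration; `plogis()` and `predict()` are base R):

```r
# The sigmoid function maps any log(odds) value to a probability between 0 and 1
sigmoid <- function(log_odds) 1 / (1 + exp(-log_odds))

sigmoid(0)             # 0.5: log(odds) of 0 means even odds
plogis(c(-2, 0, 3))    # base R's built-in version of the same function

set.seed(11)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.3 + x))
fit <- glm(y ~ x, family = "binomial")

log_odds_hat <- predict(fit, type = "link")      # predictions on the log(odds) scale
prob_hat     <- predict(fit, type = "response")  # predictions as probabilities

# Applying the sigmoid to the log(odds) reproduces the predicted probabilities
head(cbind(sigmoid(log_odds_hat), prob_hat))
```

In practice `predict(fit, type = "response")` does the conversion for you, so the explicit sigmoid is mainly useful for seeing what happens under the hood.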
@yeong5haeng 3 years ago
Josh, for multinomial logistic regression, do we need to i) run a simple model for each predictor first, and only then run the fancy model with all predictors? Or can we ii) skip straight to the fancy model? Also, how does the output of the two analyses differ?
@statquest 3 years ago
I'm not sure. I've never done multinomial logistic regression.
@mrangelepic1 5 years ago
Hi Josh, Thank you very much for this great video! :) Could you please do a video on how AIC works and how to select the relevant parameters for the logistic regression model out of the parameters that are given in a data table?
@statquest 5 years ago
AIC is on the to-do list. Since you asked for it, I'll bump it up a little closer to the top.
@565-FENRIR 5 years ago
Double BAM!!! That sounds great! Awesome video, all of them are so helpful for understanding logistic regression!
@statquest 5 years ago
@@565-FENRIR Hooray! Thank you! :)
@MB-nc9rq 3 years ago
Great video, thanks so much Josh! After the 4th minute you mention how to address the NA samples. Can you teach us the RANDOM FOREST method, if we don't want to get rid of our NA samples (e.g. in multivariate cases, where the rows include other useful info)? Thanks!
@statquest 3 years ago
I cover the random forest method in this video: kzfaq.info/get/bejne/bKuIg7yrx8ywc3k.html (the theory is here: kzfaq.info/get/bejne/qbdoapOSubHVmYE.html )
@nishachauhan8081 5 years ago
And at 6:23, how does that contribute to the best-fitting line?