Regression diagnostics and analysis workflow

14,985 views

Mikko Rönkkö

1 day ago

The video walks through a workflow for regression analysis, emphasizing that empirically testable assumptions should be checked after the model is estimated. It begins with formulating a hypothesis, followed by data collection and exploration to understand the relationships between the variables. An initial regression model is then estimated with the chosen dependent and independent variables, and its results are briefly reviewed. The focus then shifts to diagnostics, favoring plots over statistical tests because plots give a more informative view of problems such as heteroskedasticity.
In the diagnostic phase, the video demonstrates several plots, starting with the normal Q-Q plot to assess the distribution of the residuals and identify outliers. This is followed by the residuals versus fitted plot to detect nonlinearity and heteroskedasticity. The leverage versus residual squared plot helps identify influential observations. The added-variable plot is then used to examine the relationship between the dependent variable and each independent variable after the other independent variables have been partialled out. Based on these diagnostics, the regression model is adjusted, for example to address nonlinearity or heteroskedasticity, and the diagnostics are repeated until the model is satisfactory. The video concludes with interpreting the regression coefficients in the context of the research. The example uses the Prestige dataset with 'prestige' as the dependent variable and 'education', 'income', and 'share of women' as independent variables.
Slides: osf.io/6agb4
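The numbers behind the residuals versus fitted plot and the leverage versus residual squared plot can be computed by hand. The following is a minimal sketch in plain Python, not the code from the slides: the data are made-up toy values (not the Prestige dataset), the helper `solve` is hypothetical, and dividing each squared residual by the residual sum of squares is only one common normalization, assumed here.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    a = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k] for row in a]


# Hypothetical toy data, NOT the Prestige dataset used in the video.
education = [6, 8, 10, 11, 12, 13, 14, 16]
income = [3, 4, 4, 6, 5, 8, 7, 25]  # the last value is deliberately extreme
prestige = [20, 28, 35, 42, 44, 55, 57, 70]

X = [[1.0, e, i] for e, i in zip(education, income)]  # intercept column first
k = len(X[0])
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(X, prestige)) for i in range(k)]

# OLS coefficients from the normal equations (X'X) b = X'y.
beta = solve(xtx, xty)
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [y - f for y, f in zip(prestige, fitted)]

# Leverage h_i = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix.
leverage = [sum(v * z for v, z in zip(row, solve(xtx, row))) for row in X]

# Squared residuals scaled to sum to one (assumed normalization).
rss = sum(r * r for r in residuals)
norm_res_sq = [r * r / rss for r in residuals]

for f, r, h, q in zip(fitted, residuals, leverage, norm_res_sq):
    print(f"fitted={f:7.2f}  residual={r:7.2f}  leverage={h:5.3f}  res_sq={q:5.3f}")
```

Plotting `residuals` against `fitted` gives the residuals versus fitted plot, and `leverage` against `norm_res_sq` gives the leverage versus residual squared plot. In this toy data the observation with the extreme income value has by far the largest leverage.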

Comments: 34
@BrinderSadler 1 month ago
A very informative video that is clear and uses examples so that viewers can better follow. Thank you.
@mronkko 1 month ago
You are welcome!
@newtonocharimenyenya2458 3 years ago
A great piece. Simple to understand.
@mronkko 3 years ago
Glad you think so!
@THEPSYCHOTIC 4 months ago
I have been trying to find workflow videos on regression analysis for a while now, and this is the first (and only) one that I found. It helped me immensely, thank you.
@mronkko 4 months ago
You are welcome. It is surprising that very few people teach how to actually use the analyses in empirical research practice.
@THEPSYCHOTIC 4 months ago
@mronkko That's true. Most videos cover only the interpretation of results or focus on, say, one part of the analysis, but no one covers the whole process in a single video with a single dataset. Just an idea: you could maybe consider doing a workflow series focusing on how to do the analysis with different combinations of explanatory/response variables? Let's say one categorical explanatory variable, 1 exp and 1 quantitative, 2 categorical exp var, and so on. And the same logic with explanatory - quantitative vs qualitative. I'm not sure if you've done it already, but it'd be so so helpful! Thanks again, keep up the good work. I wish you good luck!
@magnusjensen5867 3 years ago
Best explanation I’ve come across on KZfaq! Keep up the good work
@mronkko 3 years ago
Glad it helped!
@whx2044 3 years ago
Thank you for teaching!
@mronkko 3 years ago
You are welcome.
@bezaeshetu5454 2 years ago
Thank you for the nice and clear explanation.
@mronkko 2 years ago
You are welcome!
@rutwikkadane2409 3 years ago
Thanks for the explanation!
@mronkko 3 years ago
Glad it was helpful!
@newtonocharimenyenya2458 3 years ago
A very great piece.
@mronkko 3 years ago
Thanks
@faemillongo6839 2 years ago
Thanks. So clear
@mronkko 2 years ago
Happy that you find it helpful. The lack of reporting that regression diagnostics were done is a big problem in published research. And this would be so easy to fix. Pay attention to your model assumptions and justify them.
@Youtuube304s 3 months ago
Subscribed. Very good
@mronkko 2 months ago
You are welcome.
@harijha6279 1 year ago
best explanation
@mronkko 1 year ago
Good that you liked it!
@statistikochspss-hjalpen8335 1 year ago
Great video. My question is what to do when the ln transformation doesn't help. Imagine a regression with only Likert scale variables (1-5): customer satisfaction as the dependent variable, and product quality and customer service as independent variables. Most customers score 4 or 5 on all the variables. Almost all of the MLR assumptions are violated. How should I approach the problem? I read about PLS being an alternative to OLS, but my coefficients are almost identical with both OLS and PLS (I don't know if it's because of a fairly big dataset, n=8000).
@mronkko 1 year ago
If your scales are poorly calibrated so that you get just 4s and 5s on a 1-5 scale, then I do not think that there is anything you can do except collect better data. As for how to approach "almost all assumptions are not met": I would start by looking at one specific assumption and what you can do about it. For example, if the relationships are not linear, then I would start thinking about using nonlinear functional forms.
@statistikochspss-hjalpen8335 1 year ago
@mronkko Thank you for taking the time to respond. The data is real and based on real customers. The satisfaction metric (dependent variable) is already well established in the industry. If I'm interpreting my normal probability plot correctly (y axis shows percent and x axis shows residual), it looks like 7% of the observations are off the line. The residuals go from minus 10 to positive 5. In the residuals vs fits plot, the residuals slope downwards as the fitted value increases.
@mronkko 1 year ago
@statistikochspss-hjalpen8335 If the residuals slope downward, then you might have nonlinearity and you need to consider other functional forms. The fact that a measure is well established does not necessarily mean that the data are good. For example, if you want to assess the effect of a person's height on a person's weight, but only measure people between 180 and 181 cm, then a normal measuring tape would not suffice because it is not precise enough. The same can happen in your data: if you have little variation in satisfaction, you might need a measure that is calibrated differently. I think I talk about measurement calibration in one of the measurement presentations, but I am not 100% sure about that.
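The advice in the reply above, to consider other functional forms when the residuals show a systematic pattern, can be illustrated with a minimal sketch. The numbers below are made up (they are not the commenter's data), the helpers `ols` and `fit` are hypothetical, and a squared term is only one of many possible functional forms.

```python
def ols(X, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k] for row in a]


def fit(X, y):
    """Return (residual sum of squares, residuals) for an OLS fit of y on X."""
    b = ols(X, y)
    res = [yi - sum(bj * xj for bj, xj in zip(b, row)) for yi, row in zip(y, X)]
    return sum(r * r for r in res), res


# Hypothetical curved data: y grows roughly with the square of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 4.8, 9.7, 16.5, 25.2, 36.9, 49.0, 64.8]

linear = [[1.0, xi] for xi in x]              # y = b0 + b1 x
quadratic = [[1.0, xi, xi * xi] for xi in x]  # y = b0 + b1 x + b2 x^2

rss_lin, res_lin = fit(linear, y)
rss_quad, res_quad = fit(quadratic, y)

# A systematic sign pattern in the linear residuals (same sign at both ends,
# opposite sign in the middle) is what a curved residual plot looks like.
print("linear residuals:   ", [round(r, 2) for r in res_lin])
print("quadratic residuals:", [round(r, 2) for r in res_quad])
print("RSS linear vs quadratic:", round(rss_lin, 2), round(rss_quad, 2))
```

If the residual pattern disappears after the functional form is changed, the pattern was probably caused by nonlinearity. If it persists, the cause lies elsewhere, for example in a poorly calibrated measure as discussed above.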
@zwan1886 2 years ago
In your AV plots around 15:00, isn't it showing that the women regressor doesn't add anything to the model?
@mronkko 2 years ago
Yes, that is what the model shows. Also, the regression coefficient in the table at 2:58 shows that the effect of women is nonsignificant.
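The link between the added-variable plot and the coefficient table mentioned in this exchange can be checked numerically: the slope of the line in an added-variable plot equals the coefficient of that variable in the full regression. The following is a minimal sketch with made-up toy values (not the Prestige dataset); the helpers `ols` and `residualize` are hypothetical.

```python
def ols(X, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k] for row in a]


def residualize(v, Z):
    """Residuals from regressing v on the columns of Z."""
    b = ols(Z, v)
    return [vi - sum(bj * zj for bj, zj in zip(b, row)) for vi, row in zip(v, Z)]


# Hypothetical toy data, NOT the Prestige dataset used in the video.
education = [6, 8, 10, 11, 12, 13, 14, 16]
income = [3, 4, 4, 6, 5, 8, 7, 12]
women = [10, 60, 30, 80, 20, 50, 90, 40]
prestige = [20, 28, 35, 42, 44, 55, 57, 70]

others = [[1.0, e, i] for e, i in zip(education, income)]  # all but 'women'
full = [row + [w] for row, w in zip(others, women)]

# The two axes of the added-variable plot for 'women':
prestige_res = residualize(prestige, others)  # y axis: prestige | others
women_res = residualize(women, others)        # x axis: women | others

# Slope of the line drawn through the added-variable plot.
av_slope = (sum(p * w for p, w in zip(prestige_res, women_res))
            / sum(w * w for w in women_res))
full_coef = ols(full, prestige)[3]  # coefficient of 'women' in the full model

print("AV plot slope:         ", av_slope)
print("full-model coefficient:", full_coef)
```

A nearly flat line in the added-variable plot therefore corresponds to a coefficient close to zero in the regression table, which is why the plot and the table tell the same story.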
@kar2194 2 years ago
Hi, thanks for the content! At 3:09 you said you have a video on the regression coefficient. I can't find it, and I would like to check it out :)
@mronkko 2 years ago
Good question. The videos are from a course that I run and I have organized them as KZfaq playlists. This video is from the third study unit and the video that I refer to is from the second unit: kzfaq.info/get/bejne/obF1ZJCarM_dp58.html
@kar2194 2 years ago
@mronkko Thanks!
@auddssey 1 year ago
I want to see the R code for the residual vs leverage plot, to see how the occupation outliers appear :-)
@mronkko 1 year ago
The slides are linked in the video description and contain some R code in the slide notes:
library(car)
data(Prestige)
reg1