R demo | Mann-Whitney U Test = Wilcoxon Rank Sum Test | How to conduct, visualise, interpret & more😉

3,080 views

yuzaR Data Science

A day ago

The one simple command I promised in the video is:
library(ggstatsplot)  # provides ggbetweenstats()

ggbetweenstats(
  data = d,
  x    = jobclass,
  y    = wage,
  type = "nonparametric"  # Mann-Whitney U test instead of a t-test
)
If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
Enjoy! 🥳
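For readers without ggstatsplot installed, the same two-group comparison can be sketched in base R. This is only an illustration: the data frame `d` below is simulated, and the level names "Industrial" and "Information" are assumptions, not the data used in the video.

```r
# Simulated stand-in for the video's data (hypothetical values and level names).
set.seed(1)
d <- data.frame(
  jobclass = factor(rep(c("Industrial", "Information"), each = 30)),
  wage     = c(rlnorm(30, meanlog = 4.6, sdlog = 0.3),
               rlnorm(30, meanlog = 4.8, sdlog = 0.3))
)

# Mann-Whitney U test = Wilcoxon rank-sum test for two independent groups.
res <- wilcox.test(wage ~ jobclass, data = d)
print(res)

# Group medians as a descriptive summary next to the test.
print(aggregate(wage ~ jobclass, data = d, FUN = median))
```

Unlike ggbetweenstats(), this prints only the test and the medians; it draws no plot and reports no effect size.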

Comments: 42
@bartoszkedziora3256 1 year ago
You are the best you can find on youtube! Thank you so much
@yuzaR-Data-Science 1 year ago
Thanks 🙏 glad you enjoyed my content!
@117chris9 1 year ago
Brilliant thank you so much
@yuzaR-Data-Science 1 year ago
Thanks 🙏 If you liked this one, you might like the package reviews; gtsummary, for example, is one of the most useful.
@ismailabdelli7287 1 year ago
thank you so much! that was really a helpful and accurate explanation
@yuzaR-Data-Science 1 year ago
Glad it was helpful!
@angvl8793 2 years ago
Great video! Thank you very much!
@yuzaR-Data-Science 2 years ago
You are welcome! I am glad you enjoyed it
@martinglhf 2 years ago
Very well explained, thanks!
@yuzaR-Data-Science 2 years ago
Glad you enjoyed it! 😊
@syhusada1130 2 years ago
Amazing
@yuzaR-Data-Science 2 years ago
Thanks! Glad you liked it!
@so4ragb 1 year ago
Dear Yuri, thanks for your great videos, which I have been following and recommending to my fellow physicians. They are so great! Please consider making some tutorials on univariable and multivariable analyses in oncology, with independent parameters like age, cancer stage, treatment, baseline lab values, ECOG scores, etc., and outcomes like time to event or death vs. no death. That would be great!
@yuzaR-Data-Science 1 year ago
Dear so4ragb, thank you very much for your feedback, and thanks for the suggestion! Interestingly, I am already in the process of making a video about a cool package for quick uni- and multivariate analyses in the medical area ... although statistics is truly agnostic. So please stay tuned ;)
@so4ragb 1 year ago
@yuzaR-Data-Science That's fantastic news. Very much looking forward to watching it. Hoping for more clinical stats 😉. Thanks for all your efforts.
@yuzaR-Data-Science 1 year ago
You are very welcome!
@luisa1551 3 months ago
Thanks for the video! I have a question: ranking makes it unnecessary to know the distribution. However, how would you approach the same problem using Generalized Linear Models as the basis? From what I understand, all the classical hypothesis tests can be done with Generalized Linear Models or Linear Mixed Models. For GLMs I would need a link function, but how do I decide which one? I am not sure what the advantage of ranking is, apart from getting around the assumption of normality.
@yuzaR-Data-Science 3 months ago
Dear Luisa, answering which link function to choose would need a whole new video, and I am planning to make one in the future. While ranking resolves non-normality and heterogeneity of variances, I am not a big fan of it, because it throws away the real data we have measured. It was just important to describe it, so that people don't think they are comparing medians. The median, by the way, is a better choice for addressing many problems in the data, so I would recommend diving into quantile regression first, before getting to link functions. I have two videos on quantile regression on the channel, so feel free to check them out. Cheers!
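To make the point about ranking concrete, here is a small sketch showing that the test statistic depends only on the ranks, not on the measured values. The numbers are made up for illustration.

```r
# Two small independent samples (made-up numbers, no ties).
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(6.3, 7.7, 5.9, 8.2, 4.8, 9.1)

# Rank all observations together; the raw values are not used after this.
r  <- rank(c(x, y))
n1 <- length(x)

# W as reported by wilcox.test(): the rank sum of the first sample
# minus its smallest possible value, n1 * (n1 + 1) / 2.
w_by_hand <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

w_builtin <- unname(wilcox.test(x, y)$statistic)
print(c(by_hand = w_by_hand, builtin = w_builtin))
```

Any monotone transformation of the data (for example, taking logs) leaves the ranks, and therefore the result, unchanged.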
@Alex-gw6pm 5 months ago
Thank you so much! Can you tell me, please: I have 4 animal groups with 5 animals in each. The groups are: 1) intact animals, 2) animals exposed to the first factor, 3) animals exposed to a second, different factor, and 4) a control group without exposure to the second factor. I'm interested in comparing the 3rd and 4th groups, and at the same time I want to compare the 4th group with the 1st. In your opinion, which test should I choose: Mann-Whitney, to compare first the 4th group with the 1st and then the 3rd with the 4th, or Kruskal-Wallis, to compare all the groups together? I tried both tests: Kruskal-Wallis gives me no differences, while Mann-Whitney does. I guess the results of Mann-Whitney are more trustworthy, but I am not sure, so I decided to ask you as a statistician. P.S. I didn't apply any correction method for Mann-Whitney.
@yuzaR-Data-Science 5 months ago
Hi Alex, the short answer: ggbetweenstats(mtcars, x = cyl, y = mpg, p.adjust.method = "none", pairwise.display = "all"). The longer answer: you have to correct for multiple comparisons, or at least explicitly state in your paper that you did not. I have a video on Kruskal-Wallis on my channel, in case you have not discovered it yet. Hope that helps!
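The effect of correcting for multiple comparisons can be sketched in base R with the same built-in mtcars data. The Holm method is just one possible choice of correction and is an assumption here, not something the reply prescribes.

```r
# Omnibus test across all three cyl groups.
kw <- kruskal.test(mpg ~ cyl, data = mtcars)

# All pairwise Mann-Whitney tests, unadjusted vs. Holm-adjusted.
# exact = FALSE avoids warnings caused by tied mpg values.
raw  <- pairwise.wilcox.test(mtcars$mpg, mtcars$cyl,
                             p.adjust.method = "none", exact = FALSE)
holm <- pairwise.wilcox.test(mtcars$mpg, mtcars$cyl,
                             p.adjust.method = "holm", exact = FALSE)

print(kw)
print(raw$p.value)   # uncorrected pairwise p-values
print(holm$p.value)  # corrected p-values, never smaller than the raw ones
```

With only a few animals per group, uncorrected pairwise tests can look significant while the omnibus test does not, which is why the correction (or an explicit statement that none was applied) matters.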
@alelust7170 1 year ago
Very well explained! I have a question: in your example, the two groups have the same number of observations (15). Can I use the same settings from the video with groups of different sizes? Tks
@yuzaR-Data-Science 1 year ago
Sure, since the MWU test is for independent samples, it does not matter how many observations each sample has. For dependent samples (Wilcoxon signed-rank test), it does, because the observations are paired. Thanks for the feedback and thanks for watching!
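A quick way to see the difference is to run both versions on groups of unequal size. The data below are simulated purely for illustration.

```r
set.seed(42)
a <- rnorm(15, mean = 10)  # 15 observations
b <- rnorm(22, mean = 11)  # 22 observations

# Independent samples: unequal group sizes are fine.
unpaired <- wilcox.test(a, b)

# Paired (signed-rank) version: every observation needs a partner,
# so vectors of different length raise an error, which is caught here.
paired <- tryCatch(
  wilcox.test(a, b, paired = TRUE),
  error = function(e) conditionMessage(e)
)

print(unpaired$p.value)
print(paired)  # the error message from the paired call
```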
@ednacossa8863 1 year ago
Hello @Yuzar, thank you for sharing all this knowledge. I'm working on some data about changes in soil organic carbon after conversion of forest into agriculture. The data were collected at five different depths. Besides making a plot of forest vs. agriculture at each depth, is there any way, using this package (ggbetweenstats), to put all the depths into the same plot and see the changes among the groups?
@yuzaR-Data-Science 1 year ago
Sure, you can either use grouped_ggbetweenstats to produce subplots for the different depths, or put all the depths into one column, set the order of the categories via "factor" and "levels", and then put that variable on the x-axis. Then you'll be able to get post-hoc tests.
@ednacossa8863 1 year ago
@yuzaR-Data-Science Thank you. I'm gonna do that.
@yuzaR-Data-Science 1 year ago
@ednacossa8863 You are welcome!
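The "factor" and "levels" step from this thread can be sketched in base R. Everything below is hypothetical: the column names `land_use`, `depth` and `soc`, the depth labels, and the simulated values. The per-depth tests use base R rather than ggstatsplot.

```r
# Hypothetical long-format data: one row per measurement.
set.seed(7)
depth_levels <- c("0-10", "10-20", "20-30", "30-40", "40-50")
soil <- data.frame(
  land_use = rep(c("forest", "agriculture"), each = 25),
  depth    = rep(rep(depth_levels, each = 5), times = 2),
  soc      = runif(50, min = 5, max = 40)
)

# Fix the order of the categories so plots and tests follow depth,
# not alphabetical order.
soil$depth <- factor(soil$depth, levels = depth_levels)

# One Mann-Whitney test (forest vs. agriculture) per depth,
# then a Holm correction across the five tests.
p_raw <- sapply(split(soil, soil$depth), function(s) {
  wilcox.test(soc ~ land_use, data = s)$p.value
})
p_holm <- p.adjust(p_raw, method = "holm")
print(round(p_holm, 3))
```

The same ordered factor can then be passed to a plotting function so that the depths appear in the intended order on the x-axis.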
@user-cm3zx8vr7e 8 months ago
Great explanation!! However, I am getting an error while using the ggstatsplot function. Can you please suggest an alternative, or recommend how to solve this error?
@yuzaR-Data-Science 8 months ago
Sure. Since ggstatsplot works on top of other packages, there might be discrepancies between package versions. So update R, then update RStudio, then update all the packages. If you still get the error message, read it carefully: maybe one package is missing. Also check whether your data is in the right format, or just google the error message; tons of folks have had it before, and most of those cases are already solved. Cheers
@moviezone8130 2 months ago
Thanks for the wonderful explanation. As I said before, you set the bar high. By the way, I want to move into data science. I have a bachelor's degree in Chemistry and a master's degree in Environmental Science from Addis Ababa University. I have been learning data science with R for the last 6 months. What would you advise me? I live in Ethiopia, so I can't take online courses because we don't have an international bank payment system, so I depend on KZfaq and freely available books. Data science really excites me a lot. Thanks.
@yuzaR-Data-Science 2 months ago
Hi man, glad to hear that you are excited about data science! The good news is that with the internet you can learn anything! There are more than enough books and free resources to learn data science and R or Python! Please don't pay for courses, they are usually crap. KZfaq, blogs and free books will be enough. If you really want to learn R, here are some free books: R4DS, Tidy Modeling with R and ISLR. If you focus on those (+ some practice and real work + learning from other resources), in a year you'll be a better data scientist than 90% of those who finished a fancy university. So keep up the learning energy, and I hope my YouTube channel helps you on the way there! Cheers
@moviezone8130 2 months ago
Thanks for your prompt reply. I will do as you said, I'm into R first so I will stay in it for a while to master it. I will also stay in touch with your channel. I am on LinkedIn so we can be friends there too. Thanks.
@yuzaR-Data-Science 2 months ago
sure, just send me the invite ;)
@syhusada1130 2 years ago
The lowest p-value in one of the groups I want to test is 0.06. Is it low enough to call the data not normally distributed?
@yuzaR-Data-Science 2 years ago
I would still go with the normal distribution. If you are not sure, you can use plot_density() or ggqqplot() for this group and check normality visually. When it is approximately normal (nobody knows what "approximately" means ;) everyone decides for themselves), use a parametric test.
@syhusada1130 2 years ago
@yuzaR-Data-Science Okay, so I did use ggqqplot, and the data sit in the grey area, so they're normally distributed?
@yuzaR-Data-Science 2 years ago
Yes
@syhusada1130 2 years ago
@yuzaR-Data-Science Okay, I guess I'll use the Welch t-test, since the variances are not equal.
@yuzaR-Data-Science 2 years ago
Yes, and this is pretty certain, no guessing ;) The two tests (Shapiro and Levene's) are useful because they help you decide which final test to use.
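The decision path discussed in this thread can be sketched in base R. Two assumptions are made loudly here: the data are simulated, and fligner.test() stands in for Levene's test, which is not part of base R (it is available in add-on packages such as car). The 0.05 cut-off is the usual convention, not a rule from the thread.

```r
set.seed(123)
g1 <- rnorm(20, mean = 50, sd = 5)   # simulated group 1
g2 <- rnorm(20, mean = 54, sd = 12)  # simulated group 2, larger spread

# Step 1: Shapiro-Wilk normality test for each group.
normal <- shapiro.test(g1)$p.value > 0.05 &&
  shapiro.test(g2)$p.value > 0.05

# Step 2: homogeneity of variances (Fligner-Killeen as a base-R stand-in
# for Levene's test).
equal_var <- fligner.test(list(g1, g2))$p.value > 0.05

# Step 3: pick the final test.
result <- if (!normal) {
  wilcox.test(g1, g2)                    # Mann-Whitney U test
} else {
  t.test(g1, g2, var.equal = equal_var)  # Student's or Welch's t-test
}
print(result$method)
print(result$p.value)
```

A p-value of 0.06 from the Shapiro-Wilk test, as in the question above, would count as "normal" under this rule, so the choice would fall to a t-test, with the variance check deciding between the Student and Welch versions.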
@so4ragb 2 years ago
Thank you. When I ran ggbetweenstats, I got the following error message. Any idea where the problem lies?
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, " (n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1, but FUN(X[[1]]) result is length 3
> rlang::last_error()
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, " (n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1, but FUN(X[[1]]) result is length 3
---
Backtrace:
1. ggstatsplot::ggbetweenstats(...)
15. statsExpressions:::.prettyNum(n)
16. base::prettyNum(x, big.mark = ",", scientific = FALSE)
17. base::vapply(...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, " (n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1, but FUN(X[[1]]) result is length 3
@yuzaR-Data-Science 2 years ago
Hey, try updating all packages 📦, that should solve it.
@YasinNabi 2 years ago
This is a wonderful and interesting channel. I found it very useful. Worth subbing, and liked! A fellow creator.
@yuzaR-Data-Science 2 years ago
Thanks, Yasin! Appreciate your feedback!