Comments
@statsguidetree
@statsguidetree 1 day ago
The R code to load and clean the dataset and to run the analysis is available on my GitHub: gist.github.com/musa5237/2e41fa4ec8fe36b34d374e0879523754 Some additional updates to the video:
03:06 -- To load a CSV into your environment, the code should be read.csv(file.choose(), sep=",", header=T)
03:31 -- Type.1 is the Pokemon type (e.g., water, fire, etc.), not the name. I left the name out of the dataset with select().
29:43 -- I should clarify that the mean decrease in accuracy and mean decrease in Gini agreed only on which feature is most important.
41:57 -- The hash notes should read default arguments for regression, not classification.
@lanredaodu945
@lanredaodu945 11 days ago
Excellent tutorial, I watched it 3x.
@francyy-ug1qr
@francyy-ug1qr 2 months ago
thank you sm!!
@littleheavenonearth86
@littleheavenonearth86 2 months ago
Very informative. Can I use this MIRT for an instrument with more than 2 dimensions? I am working on validating a polytomous research instrument with 4 dimensions but 1 latent trait.
@jjpp2925
@jjpp2925 2 months ago
Thank you very much for the video. I have a question regarding the ability scale/x-axis in the plots. It always ranges from -4 to 4. Is there a possibility to change or rescale it, e.g., to -3 to 3, so that all values (also the extremity parameters and the discrimination parameters) correspond to the new scale (-3 to 3)?
@statsguidetree
@statsguidetree 2 months ago
Hmm. I haven't tried adjusting the horizontal/x scale for this specific plot, but generally adding the xlim argument in the plot() function could work, e.g., plot(mod2, xlim=c(-3,3)). Hopefully that will work.
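A minimal sketch of that idea, assuming a fitted mirt model object `mod2`; note that mirt's own plot method also exposes a `theta_lim` argument, which may be more reliable than passing `xlim` through to the underlying lattice plot:

```r
# Assumes the mirt package and a fitted model, e.g.:
# library(mirt)
# mod2 <- mirt(resp, 1, itemtype = "graded")

# Option 1: restrict the latent trait range via mirt's theta_lim argument
plot(mod2, type = "trace", theta_lim = c(-3, 3))

# Option 2: pass xlim through to the lattice plot
plot(mod2, type = "trace", xlim = c(-3, 3))
```

Either way, only the plotting range changes; the estimated parameters themselves stay on the model's original metric.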
@user-qd3lz1wu2m
@user-qd3lz1wu2m 4 months ago
Thank you for your great video on DIF with 'mirt'. I managed to do the DIF analysis with your detailed instructions. I have a few questions, though; I'll be very thankful if you answer them. 1. When we examine DIF, we are basically interested in the item difficulties across the two groups. I don't follow when the manual says, "Determine whether it's the a1 or d parameter causing DIF." How can the slope parameter cause DIF? Isn't DIF a property of the intercept? 2. What is the 'RMSD_DIF' function? How is it used and interpreted? 3. I'm working with large-scale international educational data where around 50 countries take part. I need to examine country DIF. I know that in this context we don't do pairwise comparisons; instead we estimate a pooled international ICC (all countries combined) and then compare the country ICCs with the pooled international ICC. Can you please provide the code for this? Thanks for your help in advance. Afshin
@user-vw2qu7wh7d
@user-vw2qu7wh7d 5 months ago
Would it be correct to say that while the distractors should have 0 in at least one of the three attributes, the key (correct option) should have mastery of all three attributes? I don't think that the key should have 0s in any attributes.
@user-cl6qi3om8x
@user-cl6qi3om8x 5 months ago
I've got nominal data (dependent variable outcome of 0/1/2); how do you run LOOCV on a multinom model? Any help is appreciated.
@user-cl6qi3om8x
@user-cl6qi3om8x 5 months ago
I noticed that [method = "glm"] was used in the LOOCV method, but what if you have a nominal dependent variable (outcome of 0/1/2)? How can we run LOOCV on that? Any help is appreciated.
@user-dm2xg1ue2m
@user-dm2xg1ue2m 6 months ago
Very informative video. I am trying to train an RF model with 40+ independent variables. I am currently using k-fold CV with 3 repeats, and it is taking a lot of time. How can I reduce the model training time? I am afraid that if I use the bootstrap method it may take even longer -- 2-3 days! Any suggestions?
@mazizimnoor5563
@mazizimnoor5563 6 months ago
How about real data? How do you plug real data into the command as the Q-matrix?
@hasanhash12
@hasanhash12 6 months ago
Hi, thank you for the video. I loaded the coll dataset from the link you pinned and then ran the script from "identify field names" to "adjust units for continuous variables". After running, all values in coll become NULL and coll2 becomes "0 obs. of 6 variables". What should I do?
@hasanhash12
@hasanhash12 6 months ago
And also, at line 136 (#no psa, just regression), if I run mod_test1 <- glm(selective ~ MEDIAN_HH_INC + STEM + PCTPELL + UG25ABV, data = coll2, family="binomial"), it says: Error in model.matrix.default(mt, mf, contrasts) : variable 1 has no levels
@hasanhash12
@hasanhash12 6 months ago
I suppose the problem is here, at line 22: coll<-subset(coll, PREDDEG == 3 & MAIN == 1 & (CCUGPROF==5|CCUGPROF==6|CCUGPROF==7| CCUGPROF==8 |CCUGPROF==9 |CCUGPROF==10 | CCUGPROF==11 |CCUGPROF==12 | CCUGPROF==13 |CCUGPROF==14 |CCUGPROF==15), select=c(selective, MD_EARN_WNE_P10, MEDIAN_HH_INC, STEM, PCTPELL, UG25ABV)). After running this code, it gives NULL values for coll. Could you please guide me on how this can be fixed?
@thomaspgumpel8543
@thomaspgumpel8543 7 months ago
This is such a terrific video, thanks. I wish you would make more similar pieces. I have 2 questions: a) I am trying to examine DIF between 4 latent classes. Can the command sex_a<-subset(sex_a,(group ==1 | group == 2| group == 3| group == 4), select = c(A1:A11,group)) be followed by plot(genDIF, labels = c('1', '2', '3', '4')) to give me 4 groups plotted for each item? It seems to limit me to 2 groups (as in your example). b) genDIF <- lordif(sex_a[,1:11],sex_a[,12], criterion = 'Chisqr', alpha = 0.01) gives: Error in collapse(resp.data[, selection[i]], group, minCell) : items must have at least two valid response categories with 5 or more cases. I know that there are more than 5 cases per group. Thanks again
@sanjanakhondaker887
@sanjanakhondaker887 7 months ago
What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.
@srijansengupta6389
@srijansengupta6389 8 months ago
After running genDIF <- lordif(IPC[,1:7],IPC[,'Gender'], criterion ='Chisqr', alpha = 0.01) I get: Iteration: 34, Log-Lik: -5580.797, Max-Change: 0.00009 (mirt) | Iteration: 1, 5 items flagged for DIF (1,2,4,6,7) Error in `vec_equal()`: ! Can't combine `..1` <character> and `..2` <double>. Run `rlang::last_trace()` to see where the error occurred. Can you help?
@statsguidetree
@statsguidetree 7 months ago
You may need to change the gender field to a different data type. Is the gender field in your dataset a character? If so, try converting it to a factor or numeric.
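A quick sketch of that conversion, using the object names from the comment above (the data frame `IPC` and its `Gender` column are assumptions from that code):

```r
# lordif expects the grouping variable as a factor or numeric, not character
str(IPC$Gender)                      # check the current type

IPC$Gender <- as.factor(IPC$Gender)  # convert to a factor
# or, equivalently, to numeric group codes:
# IPC$Gender <- as.numeric(as.factor(IPC$Gender))

library(lordif)
genDIF <- lordif(IPC[, 1:7], IPC[, 'Gender'],
                 criterion = 'Chisqr', alpha = 0.01)
```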
@manishadinesh2797
@manishadinesh2797 8 months ago
Can you help me interpret the interaction term? logit(DEATH_EVENT) = −1.698 + 0.0385×age + 0.8267×serum_creatinine − 0.0006520×ejection_fraction×time
@statsguidetree
@statsguidetree 8 months ago
Generally, the interaction term would be interpreted as: the effect ejection fraction has on death is conditional on the value of time, controlling for the other variables in the model. When you include an interaction, it is often also a good idea to include the main effect of each variable in the model. In addition, to make it easier to interpret, you can center each variable before multiplying them together to form the interaction. Here is a good resource on working with interactions that goes into more detail: www3.nd.edu/~rwilliam/stats2/l55.pdf
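A small sketch of the centering-plus-main-effects advice, using the variable names from the comment above (the data frame name `heart` is a hypothetical placeholder):

```r
# Center the continuous predictors before forming the interaction;
# the main effects are then interpretable at average covariate values
heart$ef_c   <- heart$ejection_fraction - mean(heart$ejection_fraction)
heart$time_c <- heart$time - mean(heart$time)

# ef_c * time_c expands to ef_c + time_c + ef_c:time_c, so both main
# effects are included alongside the interaction
mod <- glm(DEATH_EVENT ~ age + serum_creatinine + ef_c * time_c,
           data = heart, family = "binomial")
summary(mod)
```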
@jolima2045
@jolima2045 8 months ago
Please, how do you do stepwise selection with glmer? Is there a package?
@user-iq2qr8lb2y
@user-iq2qr8lb2y 8 months ago
I did the first step (design phase: selecting covariates), but only 3 out of 14 are significant. I want to know whether it is considered balanced or not, and what to do.
@statsguidetree
@statsguidetree 8 months ago
Whether covariates are significant is not related to whether the values of those covariates are balanced across treatment conditions. To check balance, you have to look at standardized mean differences and/or variance ratios to see whether they fall within some threshold you decide to use.
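A minimal sketch of that balance check with the cobalt package, assuming a matchit() output object `m.out` (the object name is a placeholder):

```r
library(cobalt)

# Standardized mean differences and variance ratios for each covariate,
# flagged against commonly used thresholds (0.1 for SMD, 2 for variance ratio)
bal.tab(m.out, stats = c("mean.diffs", "variance.ratios"),
        thresholds = c(m = 0.1, v = 2))

# Visual check of the standardized mean differences
love.plot(m.out, stats = "mean.diffs", thresholds = 0.1)
```

The thresholds are conventions, not rules; report whichever criteria you used in your methods section.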
@alexwisniewski7105
@alexwisniewski7105 9 months ago
Do you include both the quadratic and non-quadratic terms in your propensity match? For example, if my quadratic term had a lower SMD, should I remove the non-quadratic term and just include the quadratic one in my final model?
@statsguidetree
@statsguidetree 9 months ago
This depends on your data, the type of relationships you want to capture, and what makes sense for the data you are working with. If you include both the quadratic and linear terms for your explanatory variable in the model, you are saying that the relationship between your response and the explanatory variable has both quadratic and linear components (i.e., your model captures both); keeping just the quadratic term says the relationship is purely quadratic. Generally, if you want to capture a wider scope of relationships you can leave both in, but be mindful that this could lead to overfitting.
@goodboy-je2kz
@goodboy-je2kz 10 months ago
What about SEM? Can we do it in this package?
@statsguidetree
@statsguidetree 9 months ago
I am not too familiar with that package. There may not be a specific function for IRT models, but it may have a more roundabout way of estimating IRT parameters.
@statsguidetree
@statsguidetree 11 months ago
I needed to update the R code to load and clean the dataset to get the data ready for the analyses. Please use the updated R code here to follow along: gist.github.com/musa5237/78a694bd6663a92a82e45e684e616724
@muhammedhadedy4570
@muhammedhadedy4570 1 year ago
I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen. Well done, sir. You helped me a lot. ❤❤❤❤
@skcjdtn
@skcjdtn 1 year ago
It was super helpful! I truly appreciate your clear demonstration and explanation. I have a quick question: if there are missing data in the responses, how does the CDM package impute these missing data? Looking forward to hearing from you!
@katieweir4166
@katieweir4166 1 year ago
The data doesn't work anymore!
@statsguidetree
@statsguidetree 11 months ago
My apologies for the delayed response; you can use the following code to load it into R: coll<-read.csv("gist.githubusercontent.com/musa5237/78a694bd6663a92a82e45e684e616724/raw/132430c291f72fc20a7df0ba951e9ce6a77e4902/Most%2520Recent%2520Cohorts%2520All%2520Data%2520Elements",sep=",", header=T)
@metalslegend
@metalslegend 1 year ago
When you type IRTpars = TRUE, simplify = TRUE in the coef() command, you will receive the actual threshold parameters b. The intercepts cannot be interpreted properly on their own.
@chaimatibajjate3751
@chaimatibajjate3751 1 year ago
Thank you for the video. Please, I have a question about factor.scores(): it doesn't work for me because I receive an error about a missing argument "f". Do you have any solutions to this problem?
@torenorrne1883
@torenorrne1883 1 year ago
Thank you very much for a great video! What is the link between the MH chi-square statistic and the effect size (A-C)? In my dataset the item with the largest MH chi-square statistic (629.49) has an effect size in category "B", while another item has an MH chi-square statistic of 56.14 and is in effect size category "C". I hope you can shed light on this mystery for me :)
@user-kx5sd8tz2r
@user-kx5sd8tz2r 1 year ago
Amazing videos! Just a question: I have a questionnaire with a 5-point Likert scale. Should I use the Rasch or GRM model?
@statsguidetree
@statsguidetree 1 year ago
For Likert scale data you generally want to use a GRM (graded response model).
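A minimal sketch of fitting a GRM in mirt, assuming a data frame `resp` of polytomous item responses (the object name is a placeholder):

```r
library(mirt)

# Fit a unidimensional graded response model to Likert-type items
mod_grm <- mirt(resp, model = 1, itemtype = "graded")

# Discrimination (a) and category boundary (b) parameters on the IRT metric
coef(mod_grm, IRTpars = TRUE, simplify = TRUE)

# Category response curves for each item
plot(mod_grm, type = "trace")
```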
@manonkinaupenne2090
@manonkinaupenne2090 1 year ago
Thank you very much for this clear explanation! I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity, expressed in minutes per day (dependent variable), between these two groups. Thank you!
@statsguidetree
@statsguidetree 11 months ago
Yes, PSM should work whenever you have a control group to match to.
@andrej.mentel
@andrej.mentel 1 year ago
Thank you for the lecture. I would like to ask some additional questions: 1. Model fit -- do you recommend using the M2 statistic and global fit indices such as CFI, TLI, RMSEA, etc. as measures of model appropriateness, or for model comparison? 2. What measures do you recommend for deciding whether to use a bifactor or a two-dimensional model? 3. How can you specify other polytomous IRT models in the bfactor function? For example, for a generalized partial credit model, does it allow the argument itemtype="gpcm" as in the mirt function? Thank you very much!
@Desocupad0
@Desocupad0 1 year ago
"Uh" (the video is good; you just need to train yourself to avoid repeating that 'word').
@festusattah8612
@festusattah8612 1 year ago
Great video!!! What would you advise I do if I have more treated than control cases, and which matching approach should I use if treatment is not randomized -- take, for example, a state legislation?
@statsguidetree
@statsguidetree 1 year ago
You can try using k-to-1 matching with optimization, or you can try full matching. You can run both and compare which gives you better balance across your covariates.
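A short sketch of both options with MatchIt, assuming a data frame `df` with a treatment indicator `treat` and covariates `x1`, `x2` (all names hypothetical):

```r
library(MatchIt)  # method = "optimal" also requires the optmatch package

# k-to-1 optimal matching (here 2 controls per treated unit)
m_ratio <- matchit(treat ~ x1 + x2, data = df,
                   method = "optimal", ratio = 2)

# Full matching: every unit is placed in a matched set, which can help
# when the treated and control group sizes are very unbalanced
m_full <- matchit(treat ~ x1 + x2, data = df, method = "full")

# Compare covariate balance between the two approaches
summary(m_ratio)
summary(m_full)
```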
@Adelphos0101
@Adelphos0101 1 year ago
Very helpful, especially the logistic regression section. Thank you.
@namelessone5022
@namelessone5022 1 year ago
You have no idea how much you helped me with mirt. I used to play with other IRT packages, but this one is much more complex than the others, and I often get error messages that I don't get when using other packages. Great thanks to you again, and for including a bifactor model example (which I'm studying too). Rewatching your video, I noticed something: you added more d parameters than I typically found in other examples using mirt. You have d1-d4, which I assume is because your dependent variable has 5 categories? On this topic, I would like to confirm one thing. In one psychological paper, I have seen the author testing for DIF using only which.par = c('a1') but not 'd'. It wasn't explained, so I wonder what the difference is? I believe 'a1' tests only for non-uniform DIF, while 'd' tests for uniform DIF, and 'a1' + 'd' tests for both types of DIF. If that's the case, would it be best to test for both types by including 'a1' + 'd'? I generally see no reason why you would test for just one type of DIF; what's your take on this?
@elloisejackson2398
@elloisejackson2398 1 year ago
Why do we suppress loadings below 0.25 when looking at the factors?
@andrej.mentel
@andrej.mentel 1 year ago
It's more or less just a convention for the sake of clarity of the results. Personally, I prefer the 0.3 cutoff. It can be interpreted to mean that a factor with a loading of < 0.25 on a particular item can be considered a "factor of negligible influence". However, remember that this is an exploratory analysis, so we use it to estimate what the possible structure might be. In the confirmatory case, we explicitly assume that (if the particular item is an indicator of just one factor) the factor loading of the other factor is zero.
@lucyh1208
@lucyh1208 1 year ago
Thank you for this! Still, I am unsure: which of these omegas should be reported in the methods part of a paper?
@statsguidetree
@statsguidetree 1 year ago
Deciding whether to report Omega or Omega Hierarchical should depend on your instrument. If the underlying factor structure is a bifactor model, then you could report Omega Hierarchical to get an overall estimate of reliability despite the multi-factor structure. If not, you can report Omega Total. I linked another source: journals.sagepub.com/doi/full/10.1177/2515245920951747
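A quick sketch of estimating both coefficients with the psych package, assuming an item-response data frame `items` (name hypothetical):

```r
library(psych)

# omega() fits a bifactor-style solution and reports both Omega Total
# and Omega Hierarchical, among other reliability estimates
om <- omega(items, nfactors = 3)  # nfactors = number of group factors

om$omega.tot  # Omega Total: reliability of the total score
om$omega_h    # Omega Hierarchical: variance attributable to the general factor
```

Whether `omega_h` is meaningful depends on the bifactor structure actually fitting your data, which is worth checking first.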
@priyankaroy7243
@priyankaroy7243 1 year ago
While I'm installing "MatchIt" it shows "There is no package called MatchIt". How do I solve it?
@statsguidetree
@statsguidetree 1 year ago
Hello, just saw your post. Did you run library(MatchIt) without first running install.packages("MatchIt")? In the video I did not install it again because I had already installed it before; I kept that line in the code but put the hash sign # first so it was there as a note. Try running it without the hash sign.
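In other words, a first-time setup looks like this (the # prefix only makes sense once the package is already installed):

```r
# First time only: install the package from CRAN
install.packages("MatchIt")

# Every session: load the installed package
library(MatchIt)

# After the first install, the install line can stay commented out as a note:
# install.packages("MatchIt")
```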
@priyankaroy3686
@priyankaroy3686 1 year ago
@@statsguidetree Yes, that solved it. Thanks!
@maciejbienkowski7488
@maciejbienkowski7488 1 year ago
Awesome job! Thank you for doing it, it's very helpful.
@christianodinga2289
@christianodinga2289 1 year ago
Thank you so much for this video! I hope I can get your consultation as I work on my analysis
@THEanemos23
@THEanemos23 1 year ago
Many thanks for the very instructive video; I am following your lectures. At the end, plot(genDIF, labels, etc.) seems to plot automatically in a new device (it opens a new window); however, only the last plot is available, as it seems to overwrite all previous ones. I am using Windows 10 and RStudio. Having looked at Stack Overflow, I can't seem to find the answer. It is suggested that the plot function is intended to plot automatically in a new device, but this is not clear. Unfortunately, unless fixed, this renders the whole plotting useless. Any ideas/suggestions?
@samanthaspoor2011
@samanthaspoor2011 1 year ago
I just ran into the same issue and haven't found a fix. Have you had any luck with this?
@user-fm6ih6sb6u
@user-fm6ih6sb6u 1 year ago
Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching, but the means, standard deviations, and p-values did not change at all compared to the unmatched data. How can I solve this problem?
@statsguidetree
@statsguidetree 1 year ago
That is a good question. I assume you are talking about p-values in your final model post-matching. If that is the case: ultimately, with PS matching you are attempting to balance the data between your treatment and control groups so you can make more reliable interpretations of your final model. It could be that after balancing your data you simply find no average treatment effect.
@HarmonicaTool
@HarmonicaTool 1 year ago
Thank you for the great video.
@MHRAJAI
@MHRAJAI 2 years ago
For how many features does logistic regression work well? I have over 300 features; does logistic regression work, or is another model suggested? Thank you.
@statsguidetree
@statsguidetree 11 months ago
I do not see much of a limit; it is just that your run time will be longer the larger the number of features you have. You may want to consider reviewing your data for alike features, i.e., is there a cluster of features in your dataset that all provide the same information?
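One way to screen for such redundant features is caret's findCorrelation(); a sketch, assuming a numeric feature matrix `X` (name hypothetical):

```r
library(caret)

# Pairwise correlations among the features
cor_mat <- cor(X)

# Indices of features so highly correlated with others (|r| > 0.9)
# that they likely carry near-duplicate information
drop_idx <- findCorrelation(cor_mat, cutoff = 0.9)

# Drop them before fitting the logistic regression
X_reduced <- X[, -drop_idx]
```

The 0.9 cutoff is a common starting point, not a rule; lower it if you want a more aggressive reduction.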
@vikasmishra4485
@vikasmishra4485 2 years ago
This video is pretty informative. I have one question: in the covariate balancing plot using cobalt, do we need to match on both the mean and variance stats? In my case the mean is balanced within the threshold but the variance is not. Can I say that matching is balanced based on mean balancing only?
@statsguidetree
@statsguidetree 2 years ago
It is good to have both. I presented only one set of criteria to use, but other criteria have been suggested, and recommendations in the literature are always changing. I would try some techniques to see if I get better balance; if I cannot do a better job, I would just report it in the methods and discussion/limitations. Balancing the covariates will be a big part of the challenge of PS matching.
@maddybond007
@maddybond007 2 years ago
Please validate whether this link has the same data you posted initially, since your link is no longer accessible: LINK: ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04262022.zip
@statsguidetree
@statsguidetree 2 years ago
I will try to find a way to load the dataset onto my GitHub. But until then I can email it to you; just send me an email at [email protected]
@detful83
@detful83 2 years ago
Amazing and useful video, thank you! One question: why did fstruct3 only include items from G and F1 but not F2, when the previously tested bifactor model included items from both F1 and F2?
@statsguidetree
@statsguidetree 2 years ago
It was just a decision to use items from the factor with the higher-value loadings in fstruct2 for a new example in fstruct3. It was just an example, and not necessarily a method you should follow when exploring data.
@sharmilibalarajah1940
@sharmilibalarajah1940 2 years ago
Thank you, this was really helpful! Do you have any ideas about how I can approach this if I want to match three groups, i.e., non-binary?
@statsguidetree
@statsguidetree 2 years ago
I can say that generally PS analyses can be conducted with non-binary treatment groups (i.e., a treatment variable with more than 2 levels). But I do not think the MatchIt package supports it (I could be wrong, because it could have been updated). There is another package, TriMatch, available if your treatment variable has 3 levels instead of 2. I am not too familiar with the package, but here is the general documentation: cran.r-project.org/web/packages/TriMatch/TriMatch.pdf
@Mustafa_Yousif_2016
@Mustafa_Yousif_2016 2 years ago
Thank you for this great video. Could you please share the code and data?
@padynz9869
@padynz9869 2 years ago
Very logical and lucid explanation. Thank you very much.
@statsguidetree
@statsguidetree 2 years ago
Thanks so much, I am glad you liked it.
@almalen2784
@almalen2784 2 years ago
Do you have any tutorial on how to test IRT assumptions using R?
@statsguidetree
@statsguidetree 2 years ago
Currently, no, but I can do something in the near future. For now, there is the unidimTest() function in the ltm package that you can use to check the unidimensionality assumption of an IRT model you generated, e.g., unidimTest(mod1). I will be sure to notify you if I post another video going over assumptions (e.g., unidimensionality, local independence, monotonicity, item invariance, etc.)
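A brief sketch of that check, assuming a data frame `resp` of dichotomous (0/1) item responses (name hypothetical):

```r
library(ltm)

# Fit a 2PL model to the dichotomous responses
mod1 <- ltm(resp ~ z1)

# Modified parallel analysis test of unidimensionality;
# a non-significant p-value is consistent with a unidimensional structure
unidimTest(mod1)
```

Note that unidimTest() resamples under the fitted model, so it can take a while on larger datasets.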