StatQuest: Hierarchical Clustering

Views: 426,596

StatQuest with Josh Starmer


Hierarchical clustering is often used with heatmaps and with machine learning type stuff. It's no big deal, though, and based on just a few simple concepts. If you want to draw a heatmap using R, I've put some sample code on my website: statquest.org/statquest-hiera...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer
#statquest #ML #clustering

Comments: 363
@statquest 2 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@anamulmbdu 6 years ago
The intro song removed my fear of clustering. Thanks for the awesome video.
@nemothekitten3994 1 year ago
going on a statequest😌
@Aemilindore 3 years ago
You're a person who saved me lots of time and pain. Thank you. I wish you the best
@statquest 3 years ago
Thank you very much! :)
@kristinomalley4519 1 year ago
You are, and I cannot stress this enough, a national treasure!! The ease in how you explain things that have eluded me for over a decade and make it click is truly a gift. Thank you so freaking much!!!
@statquest 1 year ago
Wow, thank you!
@julieboissiere4553 2 years ago
I used to watch your videos while I was a student. It’s been 3 years since my graduation and I’m still here (I’m changing jobs and need to review some stuff). Thank you a lot for your incredible work
@statquest 2 years ago
Congratulations on the new job! BAM! :)
@davidescobar4449 5 years ago
I have to congratulate you for this video, it gives the basic notions of the hierarchical cluster easy and fast. Bravo!
@rajshrestha9484 5 years ago
I can't thank you enough. Such clear and helpful explanations. Great.
@statquest 5 years ago
Thanks! :)
@EthosDemerzel 3 years ago
you can, with patreon
@brunomartel4639 4 years ago
this video proved that "hard" stuff =badly explained stuff
@sindhujas7807 3 years ago
so fuckin true. Not sorry for swearing. Happy learning guys
@gummybear8883 3 years ago
if you can't explain something in simple terms, then you don't understand it that well.
@julius4858 3 years ago
@@gummybear8883 or you've been a professor for 20 years and are so deep into a topic that you completely forgot how people approach new problems. Your sentence really only applies to novices trying to be teachers.
@Joreselin 3 years ago
@@julius4858 We could just change it to: if you can't explain something in simple terms, then you can't teach it that well.
@julius4858 3 years ago
@@Joreselin Yeah, that is absolutely true. Many of my professors for theoretical computer science are experts on various fields but man do their explanations suck. That's why I have to watch youtube videos for stuff like this.
@chikken007 4 years ago
I already watched some of your videos. This one I watched because I want to apply hierarchical clustering in my thesis. It is about time I buy one of your sweaters. I hope this supports you. Thanks for all the truly great explanations. THANK YOU!
@statquest 4 years ago
Thank you very much!!! :)
@yamikag8363 2 years ago
your videos help me see the "big picture" of concepts. after your videos, I can actually understand what is going on and why we are doing something. Thank you!
@statquest 2 years ago
Happy to help!
@fadikhattar290 1 year ago
I still don't believe how this content is free. Thank you sir!
@statquest 1 year ago
Thanks!
@stephenwood9252 1 year ago
Love your videos. The fact that you make it so simple shows the depth of your understanding.
@statquest 1 year ago
Thank you!
@scraps7624 2 years ago
This channel is a treasure! Absolutely incredible job my man
@statquest 2 years ago
Thank you so much 😀!
@pragyamishra9083 2 years ago
The visualizations and simplicity of explanations as well as great examples motivate me to keep learning. Thank you so much for making it so interesting. I'll try to do my bit by buying a t-shirt. 😊
@statquest 2 years ago
Wow! Thank you very much! :)
@The_TusharMishra 6 months ago
hi pragya
@jingsilu5568 2 years ago
Thank you for clearly explaining the details at a moderate speed! You save me lots of time!
@statquest 2 years ago
Thank you!
@liranzaidman1610 4 years ago
Very nice. I use this in Python and it's a really good way to cluster. Another thing: from a coding perspective, it's only 1 line of code in Seaborn, very easy.
@statquest 4 years ago
Thanks for sharing!
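The one-liner mentioned above is presumably seaborn's `clustermap` function (an assumption; the comment does not name it). To show what such a call does under the hood, here is a minimal pure-Python sketch of agglomerative clustering: repeatedly merge the two closest clusters until one remains. The function names, the default `min` linkage, and the toy `genes` values are all made up for illustration and are not taken from the video.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(rows, linkage=min):
    """Merge the two closest clusters until one is left.

    `linkage` reduces all pairwise distances between two clusters to a
    single number: min gives single linkage, max gives complete linkage.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(rows))]  # start: one cluster per row
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression values: 4 genes measured in 2 samples.
genes = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.2, 6.1]]
history = agglomerate(genes)
for left, right, dist in history:
    print(left, right, round(dist, 3))
```

The merge history is exactly the information a dendrogram draws: which clusters joined, and at what distance.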
@davidcartwright337 5 years ago
great videos, I like the way you explain these topics
@abhayjoshi2121 2 years ago
You are simply amazing !! I love your style and simplicity and the word is BAM! .. your videos are very informative and worth going through... thanks for all your hard work in simplifying the complex topics
@statquest 2 years ago
Thank you so much!!
@congchen170 7 years ago
Joshua's video is always helpful. Next time, probably k-means clustering.
@99harshini 4 years ago
Absolutely brilliant..Thank you sooo much for your time and effort!
@statquest 4 years ago
Thanks! :)
@calebsawe8307 2 years ago
I am super grateful for this video. You are such an excellent teacher! Thank you for being such a "you"
@statquest 2 years ago
Wow, thank you!
@user-vg8dp5tb9w 1 year ago
This channel is truly a treasure trove! I was wondering if you could do a video on consensus clustering? I.e. how to evaluate clustering across multiple models and parameters. You are awesome!
@statquest 1 year ago
I'll keep that in mind.
@anastasiyakuznetsova8797 2 years ago
The best as always! Love this channel! It's super easy to understand
@statquest 2 years ago
Thanks!
@websciencenl7994 1 year ago
StatQuest is the Best! Teaching is an art...and these are master pieces.
@statquest 1 year ago
WOW! Thank you very much! :)
@gurkanyesilyurt4461 3 years ago
you saved yet another day Josh. Thank you
@statquest 3 years ago
Bam! :)
@LBsCuriosity 5 years ago
really awesome video! This will help me with my test. Thank you!
@farzanaferdousi9885 3 years ago
Your explanation is very clear to me and i see all your video, you are very friendly to me. I like you very much.
@statquest 3 years ago
Thank you! 😃
@urjaswitayadav3188 7 years ago
Great explanation. Thanks StatQuest!
@loftyTHEOWNER 2 years ago
I would like to add that:
- single-linkage (comparing the closest points of 2 clusters) tends to form more elliptic clusters;
- complete-linkage tends to form more globular clusters.
So, that means that not scaling your data, scaling with a StandardScaler, or with a MinMaxScaler will affect your clustering.
@statquest 2 years ago
Noted!
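The two linkage rules discussed in this thread can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names and the two toy clusters are hypothetical. Single linkage takes the distance between the closest pair of points in two clusters, while complete linkage takes the distance between the farthest pair.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2):
    """Distance between the closest pair of points across two clusters."""
    return min(euclidean(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    """Distance between the farthest pair of points across two clusters."""
    return max(euclidean(a, b) for a in c1 for b in c2)

# Hypothetical clusters lying on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)      # closest pair: (1,0) and (3,0)
complete = complete_linkage(cluster_a, cluster_b)  # farthest pair: (0,0) and (6,0)
print(single, complete)
```

Because both rules are built on raw distances, rescaling a feature changes which pairs are closest or farthest, which is why scaling choices can change the resulting clusters.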
@fabiomaia3433 4 years ago
Hey Josh! Your videos are great! Thank you for the effort you've put on it! If you allow me... have you considered making videos explaining DBSCAN and HDBSCAN?
@statquest 4 years ago
Yes, I've thought about those topics and may make a video about them.
@yyma8037 4 years ago
Great video! Do you have any plans to talk about co-clustering, look forward to it.
@2327853 4 years ago
@StatQuest please explain probability and Naive Bayes. Thanks in advance! I am a huge fan of your way of teaching and your small songs creations. Keep up the good work!
@statquest 4 years ago
Thanks! Naive Bayes is on the to-do list.
@createyouridea8602 4 years ago
@@statquest waiting. plz .
@Paulamiz 3 years ago
Watching this after watching your more recent videos. Missed your 'BAM's a lot!!! You should remake these old videos again! Thanks :)
@statquest 3 years ago
bam! :)
@Paulamiz 3 years ago
@@statquest 😍
@vakarthi4 2 years ago
Found this gem of a channel today. Agreed on the fun rhymes and puns.
@robertogff 3 years ago
Congratulations! Your video is so great! You explain in a very clear and simple way.
@statquest 3 years ago
Thank you! 😃
@sonakshigarg4273 4 years ago
You could explain the same concept with maybe some other datasets and a better visualisation than a heatmap.
@fellsantfernandoargentin2072 6 years ago
Congratulations from Brazil!
@preranadas4037 4 years ago
Hello Josh! The videos are soooooooo goooood! These are BAMMMMM Good!! 1 request - Could you please create a video on LCA - Latent Class Analysis? Maybe by comparing it to k-means clustering? I cannot be more thankful!
@LoriSchomp 3 years ago
would like this too
@CapoeiraPiper 3 years ago
Man your videos are soo super helpful! THANK YOU (ps consider the color library viridis to make it easier for the colorblind)
@statquest 3 years ago
Thanks!
@isha996 6 years ago
Please add a video on Latin Square design, Joshua! I am going to pass my stats final tomorrow, only because of your videos :D your students are lucky.
@isha996 6 years ago
The CPA and clustering question was worth 30% of total marks on my exam today, and I managed to write them so well only because of your videos. you're a savior. Thank you!!
@eamiller12 2 years ago
THANK YOU! This has been SO HELPFUL!
@statquest 2 years ago
bam!
@HiasHiasHias 27 days ago
StatQuest never disappoints
@statquest 27 days ago
BAM! :)
@vishk123 7 months ago
Thank you for allowing me to ascend the stats hierarchy!
@statquest 7 months ago
bam! :)
@saikiranjajula2033 4 years ago
Thank You Sir, It was awesome to learn from you.
@statquest 4 years ago
BAM! :)
@cfonsecaparis812 2 years ago
Hi Josh, I am really enjoying your videos specially the wha whas and bam !! , you make stats sound easy but also fun! Thank you! I wonder if you could please do a video to explain the different uses of PCA and HCA, when do you use one or the other? In the mean time I will watch your videos on PCA and HCA :) hooray!
@statquest 2 years ago
BAM! Thank you very much! I'll keep that topic in mind.
@LetWorkTogether 4 years ago
I love this. Your video is wonderful!
@statquest 4 years ago
Thank you! :)
@alyssawang144 3 years ago
fantastic explanation, thank you so much for this video.
@statquest 3 years ago
Thanks!
@veloisamascarenhas7531 6 years ago
how can clustering be applied on spectral data?
@saipanchajanya5980 4 years ago
This is Awesome...... Please Make a session on K Modes, KNN and K Prototypes
@statquest 4 years ago
Here's a complete list of my videos so far: statquest.org/video-index/
@jovanmampusti4025 2 years ago
Thank you so much sir! This is very helpful and very informative.
@statquest 2 years ago
Glad it was helpful!
@muhammadiqbalmarzuki 4 years ago
This video is super duper bam bam double double bam! Will you cover more advanced clustering techniques such as model-based clustering (MCLUST) and weighted gene co-expression network analysis (WGCNA)? I'm learning about these things now for my research, and will be very grateful if you can cover these topics for me. Thanks! :)
@statquest 4 years ago
Thanks! :)
@rodrigohaasbueno8290 5 years ago
I love this channel so much
@statquest 5 years ago
Thank you! :)
@nnnyin6967 1 year ago
I am preparing my actuarial exam and you saved me a lot❤
@statquest 1 year ago
Good luck! :)
@balajicanchi5538 6 years ago
Explained in a simple manner.
@12bjab 5 years ago
just beautiful!
@setareht7546 2 years ago
Thank you for all your videos clearly explaining complex concepts. Can you also make video(s) on different bi-clustering methods?
@statquest 2 years ago
I'll keep that in mind.
@jonathanlam7204 7 months ago
Thank you. Better than university teaching
@statquest 7 months ago
Thanks!
@tymothylim6550 3 years ago
Thank you very much for this video! It was really well done :)
@statquest 3 years ago
Glad you liked it!
@user-ib9lp8zx6x 6 years ago
Hi, Joshua. Do you know the basics of pseudotime analysis in single-cell RNA-seq. Can you make a short video talking about the basics? Thanks!
@statquest 6 years ago
I'll put that on the to-do list!
@proggenius2024 3 months ago
awesome content and delivery
@statquest 3 months ago
Glad you think so!
@zzzluke8906 9 months ago
Hi Josh, amazing video as always. Do you think you could come up with a video on how to determine the best number of clusters to have? I get the Elbow method, but I really struggle with the inconsistency method. I was looking at the inconsistency coefficients, and I am confused as to whether they include singleton clusters or whether singleton clusters are excluded. I am also confused about what exactly the "jump" in the inconsistency coefficient is that we are supposed to look out for.
@statquest 9 months ago
I'll keep that topic in mind.
@MihirSriramVadali 21 days ago
Great channel. It clearly explained almost all the ML topics I watched. One question: what does "gene" stand for here? Is it a feature of the data?
@statquest 21 days ago
Yes, it's a feature.
@shamanthrajreddy1230 2 years ago
Excellent explanation!
@statquest 2 years ago
Thanks!
@yvonnemadegwa967 5 years ago
Thank you very much! Can you teach software? Like a basic introduction to R, and the basics of how to arrange data with various commands?
@statquest 5 years ago
I have a handful of videos that teach you how to do certain things in R. They don't start at the very beginning, but I still go one step at a time. You can find these videos on the index page: statquest.org/video-index/
@yvonnemadegwa967 5 years ago
@@statquest Thank you very much.
@daminithandele7237 4 years ago
Hi Josh! Can you please make a video on DBSCAN, if possible? Especially the parameter tuning part of it, I'm sure that would be of great help to lots of people.
@statquest 4 years ago
I'll keep that in mind.
@the_data_panda 5 years ago
@StatQuest with Josh Starmer, in this video you are clustering and combining genes (the attributes of data), aren't you supposed to cluster and combine the samples? that's the inverse of the approach shown
@statquest 5 years ago
You can cluster the samples or the genes, or both! It all depends on the question you are asking. For example, if I have some healthy people and some sick people, I might be interested in clustering the people (to see if healthy people form one cluster and unhealthy people form another) or I might be interested in clustering the genes. In this case I would find out which genes are correlated and up-regulated in healthy people compared to unhealthy people. Or I could do both. Does that make sense?
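Whether the genes or the samples get clustered comes down to which axis of the data table is treated as "the things to compare". A minimal Python sketch, with a made-up `expression` table (rows are genes, columns are samples) and a hypothetical `transpose` helper:

```python
# Hypothetical expression table: each row is a gene, each column a sample.
expression = [
    [2.0, 8.0, 3.0],  # gene 1 measured in samples 1..3
    [1.0, 9.0, 2.0],  # gene 2 measured in samples 1..3
]

def transpose(matrix):
    """Swap rows and columns so that each row becomes a sample."""
    return [list(column) for column in zip(*matrix)]

# Clustering the rows of `expression` groups genes;
# clustering the rows of `samples` groups samples (e.g. people).
samples = transpose(expression)
print(samples)
```

Any row-based clustering routine can then be applied to either table, or to both in turn, as the reply describes.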
@maikfranke2303 1 year ago
Amazing! Your videos are so comprehensible. I really enjoy watching!!!*_*
@statquest 1 year ago
Thank you!
@solibozorgmehr6524 3 years ago
Thanks for the explanation. Can you please make a video about consensus NMF clustering?
@statquest 3 years ago
I'll keep that in mind.
@naturelove9396 3 years ago
Hey you explain this very well and in very simple form thanks for this, I request you could you please make one video on DEGseq2, means finding DEG gene between the time points and then drawing the heatmap, volcano plot and cluster lines. Thanks
@statquest 3 years ago
I'll keep that in mind. I already have a few videos on DESeq2 here: statquest.org/video-index/
@user-gd2zf9ym4h 6 months ago
You saved my life😇 Thank you very much. And I think the link for the sample code in R isn't available right now...
@statquest 6 months ago
Yep, that's a really old link. Here's a new one: statquest.org/statquest-hierarchical-clustering/
@mojtabasardarmehni453 3 years ago
Great as always! Thanks.
@statquest 3 years ago
Thank you! :)
@blockeduser4407 4 years ago
Hey Josh, I want to explain things to some of my friends in the very same way you did. Is there any possibility for me to have the Presentation Slides you are using?
@217Legendary 4 years ago
Easy!... send them the video and let him keep the credit...
@monishaathikesavanpremalat7587 4 years ago
How do you validate these clustering techniques? I mean, for a given dataset, let's assume I have tried various hierarchical clustering techniques like single linkage, complete linkage, etc., using various distance metrics for each method. How do you pick the right one from all the different clusterings that have been formed for that particular dataset?
@statquest 4 years ago
This is going to sound very disappointing, but since these methods are generally used to explore data and extract new insights from it, then you pick the method that gives you the most insight. So try them and see if one makes more sense than the others.
@lukehebert6207 4 years ago
Very helpful, thank you!
@statquest 4 years ago
Thanks! :)
@emamulmursalin9181 3 years ago
Great explanation Josh! Just one question, are we clustering samples (data points) or the genes (features)? If we are clustering genes, doesn't it mean that we are just clustering the correlated features?
@statquest 3 years ago
In this video we are clustering the genes, and yes, the idea is that correlated features are brought together. We could even just calculate the correlation coefficient for each pair and cluster based on those values.
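The idea of clustering on correlation can be sketched by turning the Pearson correlation coefficient into a distance. The choice of `1 - r` as the distance is an assumption made here for illustration (the reply only says one could cluster on the correlation values), and the gene profiles are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is 0).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)

# Hypothetical gene profiles across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, larger scale
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend

print(correlation_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_c))
```

With this distance, genes that rise and fall together end up close even when their absolute expression levels differ.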
@emamulmursalin9181 3 years ago
@@statquest Thanks for your reply. But I have seen some other blogs where authors are plotting 2D data points and using hierarchical clustering. So in real life we use hierarchical clustering for data clustering or feature clustering?
@statquest 3 years ago
@@emamulmursalin9181 I'm not sure what you mean by "data" clustering, however, we can cluster the rows or the columns with similar ease. It doesn't matter if one is features and the other is samples.
@emamulmursalin9181 3 years ago
@@statquest Sorry for using an unclear term. Actually I meant "samples" by using the term "data". So, can hierarchical clustering be used for "feature clustering" (for example, finding correlated features and remove the redundant features) and also as "sample clustering" (e.g. just like K means clustering ) ?
@statquest 3 years ago
@@emamulmursalin9181 Yes. We can cluster the rows just as easily as we cluster the columns.
@MrKingoverall 5 years ago
I LOVE YOU JOSH !
@statquest 5 years ago
:)
@raghavmoar3211 5 years ago
Thanks for the video
@kakusniper 6 years ago
The heatmaps at the end, I have seen those a lot. Which package in R did you use ? or its the heatmap.2() with different colors ?
@marahakermi-nt7lc 11 months ago
ohh my god thanks josh u are so brilliant i think marvel should add another new superhero "josh starmer the life saver"
@statquest 11 months ago
:)
@surbhardwaj1721 3 years ago
Amazing explanation. Please make a video on Cluster evaluation. :)
@statquest 3 years ago
I'll keep that in mind.
@oliviagallupova9199 4 years ago
You saved me a week
@statquest 4 years ago
Awesome! :)
@mikecy5507 1 year ago
Great channel! Clear explanations. In HCA, could you not follow up the clustering of rows (genes) by clustering the columns (samples)? Is this automatically done? Does not seem like the best heatmap would be produced if you just cluster/shuffle rows. Would have to cluster/shuffle columns, too, right? Also, must/should the data be standardized first?
@statquest 1 year ago
You can cluster both columns and rows. And sometimes standardizing helps, sometimes it doesn't. It's worth trying both options.
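One common way to standardize before clustering is to z-score each row, so that every gene has mean 0 and standard deviation 1. The sketch below assumes that choice (the reply does not say which standardization is meant), and the `standardize` helper and gene values are hypothetical.

```python
import math

def standardize(values):
    """Z-score a row: subtract the mean, divide by the sample standard deviation.

    Assumes at least two values that are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical genes with the same pattern at very different scales.
gene_low = [1.0, 2.0, 3.0]
gene_high = [100.0, 200.0, 300.0]

z_low = standardize(gene_low)
z_high = standardize(gene_high)
print(z_low, z_high)
```

Before standardizing, the two genes are far apart in Euclidean distance; afterwards they have the same profile. Whether that is desirable depends on whether absolute expression level matters for the question, which is why trying both options is reasonable.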
@mikecy5507 1 year ago
@@statquest Thanks!
@Sean-lz2dh 1 year ago
great video. thank you very much
@statquest 1 year ago
Thanks!
@italosayan4747 6 years ago
beautiful BRO!
@anthonychan4478 5 years ago
Hi Joshua, can you do a video on Gaussian Mixture Models? Also, your videos are awesome! Keep it up.
@statquest 5 years ago
The good news is that is already on the To-Do list. I'll bump it up a notch since you requested it as well.
@jordanmakesmaps 5 years ago
@@statquest, make that two requests! Thanks!
@statquest 5 years ago
@@jordanmakesmaps Cool! It's in the top 10 things for me to do, so hopefully I'll get to it soon.
@jacobmoore8734 5 years ago
@@statquest Yes! Anytime people start talking about Gaussian mixture models, EM, "sampling the posterior", and MCMCs, I get cold sweats.
@iranziemiler8135 4 years ago
Thank you
@ardaugurlu8673 5 years ago
Good job mr josh.
@statquest 5 years ago
Thank you!
@hamidkiangaikani 2 years ago
4.4 K likes, zero dislikes! You're awesome. Thanks very much
@statquest 2 years ago
bam!
@tudorpricop5434 10 months ago
At 7:28, we calculated the number 3.2 being the difference between gene 1 and gene 2. But the whole purpose of calculating is to figure out which gene is the most similar with gene 1 (for example). Now my question: After we compute the values between [gene 1 and gene 2], [gene 1 and gene 3] and [gene 1 and gene 4], we select the gene with the SMALLEST VALUE as the most similar gene to gene 1 ? Or the BIGGEST VALUE ? I think the smallest, but just to be sure..
@statquest 10 months ago
In this case we want the smallest distance, which means the most similar.
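The "smallest distance wins" rule from this exchange can be sketched as follows. The gene values and the `most_similar` helper are hypothetical and are not the numbers used in the video.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, profiles):
    """Return the name of the profile with the smallest distance to `target`."""
    distances = {
        name: euclidean(profiles[target], values)
        for name, values in profiles.items()
        if name != target  # never compare a gene with itself
    }
    return min(distances, key=distances.get)

# Hypothetical profiles: each gene measured in two samples.
genes = {
    "gene1": [1.0, 3.0],
    "gene2": [4.0, 7.0],
    "gene3": [1.5, 3.5],
    "gene4": [9.0, 9.0],
}

nearest = most_similar("gene1", genes)
print(nearest)
```

Repeating this for every gene and merging the overall closest pair is the first step of the clustering.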
@lucha6262 4 years ago
Could you show the maths/expression for when you're calculating the Euclidean distance for more than 2 genes?
@lucha6262 4 years ago
I'm doing the maths and I think I've answered my own question: you would never really calculate the distance between more than 2 things, whether that is two genes, two clusters, or a cluster and a gene, right? And then for more than 2 samples you would do D = sqrt(d1^2 + d2^2 + d3^2), correct?
@statquest 4 years ago
You are correct!
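Written out in general form, the formula confirmed above extends to any number of samples. The symbols $x_i$ and $y_i$ (the values of the two genes, or clusters, in sample $i$) and $n$ (the number of samples) are notation introduced here, not taken from the video:

```latex
% d_i is the difference between the two things being compared in sample i
d_i = x_i - y_i, \qquad
D = \sqrt{\sum_{i=1}^{n} d_i^{2}}
  = \sqrt{d_1^{2} + d_2^{2} + \dots + d_n^{2}}
```

With $n = 3$ this reduces to the expression in the comment, $D = \sqrt{d_1^2 + d_2^2 + d_3^2}$.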
@manuelsokolov 1 year ago
Dear StatQuest! Thank you for the explanation. 1. What is the best way to evaluate the algorithm (silhouette score, ...) to decide which clustering method and distance to use? (I understand that the silhouette score is good for choosing the number of clusters k, but not for deciding between algorithms.) To decide on the best algorithm, I have been plotting a PCA and coloring the points by the clusters created, to see whether the clusters make sense or not. (However, it is known from the literature that PCA does not work well for evaluating binary data.) 2. In the case that the data is binary (e.g., genomic alteration data instead of expression data), what kind of distance would you use? Best Regards, Manuel
@statquest 1 year ago
1) I guess it depends. If I had "training" data, with known categories, I would compare how many times the data were correctly and incorrectly grouped. Otherwise, it really just boils down to subjective preference. 2) If you measure a lot of things, the Euclidean distance will still work in this situation.
@subhabrataghosh9831 3 years ago
Excellent Sir
@statquest 3 years ago
Thanks!
@python_information601 2 years ago
Nice explanation 👍👍
@statquest 2 years ago
Thanks!
@ruddhidavidwans292 3 years ago
Can you please explain how to perform this analysis in R studio and qiime 2?
@statquest 3 years ago
I'll keep that in mind.
@Argho555 3 months ago
Thank You
@statquest 3 months ago
:)
@taleco21 2 years ago
Hey, Josh, is there any video in which you address unsupervised and supervised hierarchical clustering of gene and lincRNA expressions? If not, could you do a video about that or provide me with some links to read about? I can't find any. Thanks.
@statquest 2 years ago
This video is unsupervised hierarchical clustering.
@taleco21 2 years ago
@@statquest oh, yeah, thanks. I just did some readings about unsupervised and got more info. I’ll keep searching for supervised clustering. Thanks a lot! Great video.
@ramsha8540 3 months ago
10:08 do you have any videos that talk about clustering in R? Thankyou for all your explanations btw!!
@statquest 3 months ago
Unfortunately, no. :(
@AdnanGora 5 years ago
Awesome video
@statquest 5 years ago
Thank you! :)
@patriciacontreras8435 3 months ago
Thank you very much!🥰 You saved my life 🥲 I have a question, if my dataset has continuous variables (ex. income) and a discrete variable (ex. number of children in the household). How can I measure the distance between them? Thank you!!!
@statquest 3 months ago
You can use one-hot-encoding kzfaq.info/get/bejne/a55poaZ4yr2rYas.html or you can use a random forest to do the clustering kzfaq.info/get/bejne/qbdoapOSubHVmYE.html
@patriciacontreras8435 3 months ago
@@statquest Thanks again! I think I will learn a lot if I subscribe to this channel 🥰🥰
@kanacaredes 3 years ago
Hi Josh!! We need a DBSCAN tutorial please!!!!
@statquest 3 years ago
I'll keep that in mind! :)
@khawlaou5385 1 year ago
You're THE BEST
@statquest 1 year ago
Thanks!