How to manipulate gene expression data from NCBI GEO in R using dplyr | Bioinformatics for beginners

  49,539 views

Bioinformagician

Days ago

This is a basic hands-on tutorial on manipulating gene expression (RNA-Seq) data from NCBI GEO in #R using the dplyr package.
In this video, I demonstrate how to read gene expression data into R, retrieve metadata using the GEOquery R package, and manipulate the data using basic dplyr functions.
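As a rough sketch of the metadata step described above (not necessarily the video's exact script): the accession GSE183947 is the one quoted in the comments, `fetch_geo_metadata` is a hypothetical helper name, and the call assumes the Bioconductor packages GEOquery and Biobase are installed and a network connection is available.

```r
# Hypothetical helper: download a GEO series and return its sample metadata.
# Assumes GEOquery and Biobase are installed (both come from Bioconductor).
fetch_geo_metadata <- function(accession) {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects
  gse <- GEOquery::getGEO(GEO = accession, GSEMatrix = TRUE)
  # Take the first ExpressionSet and extract its phenotype (sample) table
  Biobase::pData(Biobase::phenoData(gse[[1]]))
}

# Example call (needs network access, so it is left commented out):
# metadata <- fetch_geo_metadata("GSE183947")
# head(metadata)
```

Wrapping the download in a function keeps the network call out of the way until it is actually needed.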
Link to code: github.com/kpatel427/KZfaqT...
Link to GEOquery vignette:
www.bioconductor.org/packages...
Understanding joins in dplyr: statisticsglobe.com/r-dplyr-j...
Show your support and encouragement by buying me a coffee:
www.buymeacoffee.com/bioinfor...
Chapters:
0:00 Intro
0:48 Requirements
0:55 Set up RStudio for Analysis
5:50 Read data in R
8:31 Get metadata using GEOquery package
13:16 dplyr: select(), rename(), mutate()
20:57 Reshape data: gather()
25:42 Join data: left_join()
28:55 dplyr: filter(), group_by(), summarize(), arrange()
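The dplyr steps named in the chapters above can be sketched on a toy table. The gene values, sample IDs, and column names below are made up for illustration and are not the video's data; `gather()` comes from the tidyr package.

```r
library(dplyr)
library(tidyr)

# Toy expression table in wide format: one column per sample (made-up values)
expr_wide <- data.frame(
  gene = c("BRCA1", "BRCA2"),
  S1 = c(10, 2), S2 = c(30, 4), S3 = c(20, 6), S4 = c(40, 8),
  stringsAsFactors = FALSE
)

# Toy metadata standing in for what GEOquery would return
metadata <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue_raw = c("tissue: normal", "tissue: tumor",
                 "tissue: normal", "tissue: tumor"),
  unused = NA,
  stringsAsFactors = FALSE
)

# select(), rename(), mutate(): keep, rename, and clean metadata columns
meta_clean <- metadata %>%
  select(sample_id, tissue_raw) %>%
  dplyr::rename(tissue = tissue_raw) %>%
  mutate(tissue = gsub("tissue: ", "", tissue))

# gather(): reshape wide to long, one row per gene-sample pair
expr_long <- expr_wide %>%
  gather(key = "sample_id", value = "fpkm", -gene)

# left_join(), filter(), group_by(), summarize(), arrange()
summary_tbl <- expr_long %>%
  left_join(meta_clean, by = "sample_id") %>%
  filter(gene %in% c("BRCA1", "BRCA2")) %>%
  group_by(gene, tissue) %>%
  summarize(mean_fpkm = mean(fpkm), .groups = "drop") %>%
  arrange(desc(mean_fpkm))

print(summary_tbl)
```

The long format is what makes the join and the grouped summary straightforward, since each row then carries its own sample ID.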
To get in touch:
Website: bioinformagician.org/
Github: github.com/kpatel427
Email: khushbu_p@hotmail.com
#bioinformagician #bioinformatics #genomics #beginners #tutorial #howto #omics #research #biology #ncbi #GEO #rnaseq #ngs
#R #dplyr #tidyverse #GEOquery #data #wrangling #genomics

Comments: 101
@mahshidpooladvand8502 6 days ago
This was the best tutorial I could possibly find online!!! You are incredibly smart! Thanks!
@danielajbq 2 years ago
You're an ANGEL for making these. I am doing my MS in bioinformatics right now and this is genuinely better than some of my courses. Thank you!!
@MichealIdedia 1 month ago
Hello, are you done with your MSc now?
@eylulozerbil8548 1 year ago
This tutorial encouraged me to continue my R learning process by showing me how I can manipulate this kind of data in the simplest way! Thank you, Bioinformagician :)
@Radslom 1 year ago
This video was extremely helpful for me. I am currently learning how to use R and GEO2, and this video helped to clarify it. Thank you and keep up the great work!
@mayank9986 1 year ago
I am new to programming. I was looking for help to analyse RNAseq data and your video just came as a blessing. Thank you a ton.
@sanjaisrao484 2 years ago
Excellent explanation. Thanks for teaching the basics of R. It was extremely helpful. Please continue to make more videos.
@bioseqbytes 1 year ago
Very well made video and your understanding of the subject is tremendous!
@zlj8435 2 years ago
Thank you for this wonderful course! I am a year 1 PhD student and it really helps me a lot!
@seungwonkim8359 1 year ago
Really helpful! Thank you very much. I hope you continue this marvelous work for a long time, since I am working on bulk/single-cell RNA-seq these days.
@BISMILLAH7334 2 years ago
Excellent! Thank you for the tutorial. Looking forward to many more such useful tutorials.
@amitrupani9898 2 years ago
Thank you for this very helpful video! I have recently moved from a clinical genetics laboratory to a research laboratory where pipelines are written in R and they extensively leverage the capabilities of dplyr library. So, I needed a tutorial to help me understand its basic functioning. This helped. Keep up the good work you are doing through this channel. Cheers!!
@Bioinformagician 2 years ago
I am really glad this helped you get a basic understanding of the dplyr package. Thank you for your kind words, they encourage me to do more of this! ☺️
@mikewafula9470 1 year ago
Thanks so much for this great video. You have made it easy for me to explore gene data analysis with R. Keep sharing such content. Cheers!!
@syedmansoorjan2671 2 years ago
Amazing, I don't have words for you. Try to share more. I just found this very helpful.
@setarehsohail5422 2 years ago
Amazing!! You are a professional teacher!! Thanks!
@user-uq3qh2cy9v 1 year ago
Very helpful and you are very patient. It seems that you know exactly what my questions are.
@mocabeentrill 1 year ago
Thank you. You're really good at what you do. I did this in base R and oh my word, it looks grotesque!
@claudiocesarmontenegrojuni5141 10 months ago
You're an amazing teacher! Thank you so much for this outstanding content.
@Saed7630 1 year ago
Clean, clear and informative!
@mirazulkifli9165 8 months ago
Thank you so much for making content like this. It's extremely helpful for beginners like me trying to analyze gene expression data in RStudio.
@jammerkd 2 years ago
Excellent videos and you are a fantastic teacher
@cerenuzun5989 2 years ago
It was very helpful and it would be great if you continue these tutorials. Thank you so much!!
@Bioinformagician 2 years ago
I am glad you find my videos helpful! :)
@karthibiotech426 2 years ago
Wow, it's very helpful. I am just practicing with another dataset, using your same protocol. Thanks a lot!
@aishaa812 1 month ago
Thank you. It's extremely helpful for me since I am a beginner in RStudio and I am trying to do data analysis in RStudio.
@hemanthchenga5671 1 year ago
Thanks for explaining the code in detail and please make more videos
@o1kun 1 year ago
Your video really helped me!! Really appreciate it😊
@ayobamiogunsola6139 11 months ago
Thank you for making this video. It has been helpful.
@user-vk8bd1re8c 2 months ago
Thank you, my new teacher. I actually work on biogenetics in IT and C++, and this video helps me very much ❤️🙏👌
@rajanirao6011 2 years ago
These videos are so good!!! Good practise to learn R. Thank you!
@Bioinformagician 2 years ago
I am glad you found this helpful! :)
@xelaldaero9339 1 year ago
Thank you! Your videos are very useful!
@gustavoantoniobrugesmorale1881 1 year ago
You are excellent. Thank you!!!
@sayeman9577 8 months ago
Thanks! Very helpful
@alaminafendy6071 7 months ago
Thank you so much. Nicely explained.
@tushardhyani3931 2 years ago
Thank you for this video!!
@1980yadalam 2 years ago
Very good video, thanks.
@MohammadNasirAbdullah 5 months ago
Thank you so much, it really helps me 😊😊😊😊😊😊😊😊
@IslamSafwat-- 2 months ago
GREAT! Many thanks :)
@user-gf2zm4gg1g 1 year ago
Thank you for the great tutorial! Just to let you know, I had to install these packages first to run your script:
install.packages("dplyr")
install.packages("tidyverse")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")
@mohammeddabbour2254 1 year ago
Wonderful explanation. Thank you so much for making this tutorial. Just a side note: when both the dplyr and plyr packages are loaded and you want to use a function that exists in both, it is better to specify the package when calling it (such as dplyr::rename()). Otherwise, R may use the plyr version of the function and return an error. Happy coding!
@Bioinformagician 1 year ago
Correct, thanks for pointing it out. I have taken care of that in the videos after this one :)
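A minimal sketch of the explicit-namespace call discussed in this thread, using a made-up two-column data frame. Writing `dplyr::rename()` means the dplyr version is used regardless of which other packages are attached.

```r
library(dplyr)

# Toy data frame; the column names are made up for illustration
df <- data.frame(ID = c("g1", "g2"), value = c(1, 2))

# The dplyr:: prefix pins the call to dplyr's rename(new_name = old_name),
# even if another attached package exports its own rename()
renamed <- df %>% dplyr::rename(gene = ID)

print(names(renamed))
```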
@coolpad1572 2 years ago
dplyr is part of tidyverse. If you load tidyverse, you don't need to load dplyr separately unless there is a function clash. You can combine two mutate statements into one, just the way you did with the summarize function. Please also use Cmd+Shift+M (Mac) for %>% instead of typing it multiple times.
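The suggestion about combining mutate statements can be sketched as follows. The columns `tissue` and `metastasis` and their values are hypothetical stand-ins, not taken from the video's script.

```r
library(dplyr)

# Hypothetical metadata with prefixed values to clean up
meta <- data.frame(
  tissue = c("tissue: normal breast", "tissue: breast tumor"),
  metastasis = c("metastasis: no", "metastasis: yes"),
  stringsAsFactors = FALSE
)

# Two separate mutate() calls
two_step <- meta %>%
  mutate(tissue = gsub("tissue: ", "", tissue)) %>%
  mutate(metastasis = gsub("metastasis: ", "", metastasis))

# The same result with a single mutate() call and comma-separated expressions
one_step <- meta %>%
  mutate(
    tissue = gsub("tissue: ", "", tissue),
    metastasis = gsub("metastasis: ", "", metastasis)
  )

print(identical(two_step, one_step))
```

Expressions inside one `mutate()` are evaluated in order, so a later expression can also use a column created earlier in the same call.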
@Bioinformagician 2 years ago
The keyboard shortcut is a lifesaver. Thanks for letting me know :)
@mirazulkifli9165 8 months ago
Thanks for the shortcut!
@hasnainjadoon5580 1 month ago
Please, how can I install the GEOquery library in RStudio? It tells me there is no package with that name. Please help.
@ankushdehlia3739 1 month ago
@hasnainjadoon5580 Hi, try using:
if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
BiocManager::install("GEOquery")
@sanjaisrao484 2 years ago
Thanks
@kajalpanchal8239 1 year ago
thankya Khushbu!
@moulytasnuva1860 1 year ago
@Bioinformagician Is there any process to find the threshold value from FPKM to compare the early and late stages of cancer?
@juliangrandvallet5359 1 year ago
Amazing!!!! Now, how can I plot a heatmap out of this data?
@lisahuang850 2 years ago
Really nice video! I was wondering if you could demonstrate how to convert raw counts to TPM or FPKM values in R, as my GSE dataset provides raw counts. Thanks!
@Bioinformagician 2 years ago
Thanks for the suggestion. Will plan a video covering this!
@mikewafula9470 1 year ago
Thanks again for the video. I have managed to download the gene expression data (GSE216497). How do I get its corresponding metadata?
@jithus89 5 months ago
> gse = GEOquery::getGEO(GEO = 'GSE183947', GSEMatrix = TRUE)
Error in open.connection(x, "rb") : Problem with the SSL CA cert (path? access rights?)
Why this error?
@yahyayozbatiran 1 year ago
Hello, how can I plot the expression of a specific gene across cancer subtypes from TCGA? For example, I want to plot MSH2 gene expression in Colon Mucinous versus Colon Adenocarcinoma.
@muneeramashkoor7919 2 years ago
Hello, your videos are very informative. I am trying to look at the gene expression of my gene of interest. The supplementary data in GEO is in the form of a .fpkm_tracking file. How can I go about solving/looking at the expression using these files? Thank you!
@Bioinformagician 1 year ago
If there are no raw counts provided, you can create them yourself. You can fetch the RNA-Seq reads associated with the GEO dataset from SRA. Once you get the reads, you can align and quantify them to get counts.
@arcturusdig1673 11 months ago
I can't understand most of the things you do. I need to go to other tutorial videos to understand every single step. If you want your viewers, especially beginners, to understand, then please make your explanations clearer and easier.
@harshjasani8637 1 year ago
Hello, thank you for the amazing video and tutorials. I could not load the GEOquery library. Any ideas what the reason could be?
@Bioinformagician 1 year ago
You probably need to install it before loading it.
@chinspostdoc 1 year ago
Hi, I have some questions. Please help me resolve or understand them.
1) What if the GEO study only gives us a raw file containing either text files or .CEL files? How do we read the data from that?
2) Suppose a GEO study contains many samples from different tissues. How do I make two groups comprising only the samples I am interested in? For example, I want to compare expression data from healthy and COVID patients, but the GEO study also contains samples of cell lines treated with a certain chemical, along with tissues from healthy and COVID patients. How can I make two groups named healthy and COVID and assign the samples to those groups accordingly?
3) If the GEO raw file contains count text files for each sample, how can we use them for differential expression analysis?
Your kind reply would be much appreciated.
@aheedan9957 1 year ago
Hi, nice one, but I did not understand the part about the pData and phenoData functions.
@sharadjaiswal1705 1 year ago
Ma'am, how do I write the R scripts that are used in this video?
@aytacoksuzoglu2975 1 year ago
Why did we put -> . ?
@andyderek3021 1 year ago
Thank you for this well-explained video. Please, if I want to do survival analysis based on gene expression data with, let's say, GSE183947, how can I get the clinical data from GEO?
@Bioinformagician 1 year ago
If it is not provided with the metadata, you might have to reach out to the authors.
@SamipSapkota-zg8hy 25 days ago
The values of strain, samples, and cell.type become NULL.
@irodasay3448 2 years ago
Thank you for the tutorial. I have a question about converting a GSE to an ExpressionSet. I used your vignette and tried to do the same for GSE181462. First, I got the GSE by: gse
@Bioinformagician 2 years ago
Try changing GSEMatrix = FALSE
@mohamedalfaki4268 2 years ago
Hi, and thanks for this very nice tutorial. I get this error when I try to reshape the data:
Error in `stop_formula()`:
! Formula shorthand must be wrapped in `where()`.
# Bad
data %>% select(~gene)
# Good
data %>% select(where(~gene))
@Bioinformagician 2 years ago
Can you give me a little context of what you are trying to do? I am having a hard time recreating this error. Thanks!
@melinaguillon2449 28 days ago
Hi! I can't install GEOquery, I get this error message: Warning in install.packages : package ‘GEOquery’ is not available for this version of R
@terryadams2652 1 year ago
@Bioinformagician, I apologize for my question (please), but, as a Biologist, I am now learning Python. I really don't want to spend what little time I have learning another language (R). So, to get these results, is it possible to just use Python instead of R? Thank you very much, my dear.
@Bioinformagician 1 year ago
You can perform the equivalent operations in Python. I believe the pandas package in Python will allow you to do all your data wrangling.
@vahidgorganli8895 1 year ago
🙂👍
@awa8061 2 years ago
Can you suggest any Python package for gene expression analysis?
@Bioinformagician 2 years ago
Unfortunately, I do not have any recommendations for Python packages. I only use R for gene expression analysis.
@markrenton6981 8 months ago
Can someone please explain what the two ".." are at the start of her file path when reading in the data file?
@Bioinformagician 8 months ago
The "../" is relative-path notation for moving up one directory level in the file system hierarchy. It works on Linux and macOS, and R accepts it on Windows as well. For instance, if you're in the directory "/home/user/documents/" and you use "../", you'll move up to the "/home/user/" directory.
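A small sketch of how a "../" path resolves when reading a file. The folder layout (`project/scripts` and `project/data`) and the file contents are made up, and everything is created inside a temporary directory so nothing on disk is touched permanently.

```r
# Build a throwaway layout: project/scripts and project/data (hypothetical names)
base <- file.path(tempdir(), "project")
dir.create(file.path(base, "scripts"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base, "data"), showWarnings = FALSE)
writeLines(c("gene,S1", "BRCA1,1.5"), file.path(base, "data", "expr.csv"))

# Pretend the script runs from project/scripts
old_wd <- setwd(file.path(base, "scripts"))

# "../" climbs from scripts/ up to project/, then descends into data/
dat <- read.csv("../data/expr.csv")

# Restore the original working directory
setwd(old_wd)

print(dat)
```

Relative paths are resolved against the current working directory, which `getwd()` reports.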
@faizu0076 1 year ago
I didn't find the getGEO protein query; there is no package available with this name. Please solve the problem.
@Ijazalijin 1 year ago
How can I activate the GEOquery package?
@Bioinformagician 1 year ago
Run library(GEOquery) at the beginning of the script
@killa14108 2 years ago
Hi what happens when there are NAs in the gene expression data? The accession number is GSE70947 and it's a breast cancer data set with 296 total samples and 62976 features (genes). I followed what you did and queried the data directly using GEOquery from Bioconductor. I am just stuck now and figuring out how to deal with NAs and would appreciate your help. Thank you!
@coolpad1572 2 years ago
In general, you can filter out the rows (genes) with NAs. But it can also happen that only a few samples (for a gene) have NAs and you do not want to lose the other samples. Then you can replace the NAs with a small value such as 1 or 2.
@Bioinformagician 2 years ago
I would quantify the NAs for each gene across all samples and filter out genes that have NAs in more than half of the samples. I usually prefer to replace NAs with 0.
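The approach described in this reply (count NAs per gene, drop genes with NAs in more than half of the samples, then replace the remaining NAs with 0) can be sketched in base R. The matrix below is made up, and replacing NAs with 0 is the preference stated in the reply, not a universal rule.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up values)
expr <- matrix(
  c(1, NA, 3, 4,
    NA, NA, NA, 2,
    5, 6, 7, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:4))
)

# Number of missing values for each gene across all samples
na_per_gene <- rowSums(is.na(expr))

# Keep genes whose NA count is at most half the number of samples
keep <- na_per_gene <= ncol(expr) / 2
expr_filtered <- expr[keep, , drop = FALSE]

# Replace the remaining NAs with 0
expr_filtered[is.na(expr_filtered)] <- 0

print(expr_filtered)
```

Here geneB has three NAs out of four samples and is dropped, while the single NA in geneA is set to 0.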
@killa14108 2 years ago
@Bioinformagician Thank you very much! Do you also have any recommended methods for feature (gene) selection for creating a classification model to predict cancer/normal samples?
@imvasco 2 years ago
What about GEO data that's not CSV but TXT?
@Bioinformagician 2 years ago
Sometimes gene expression data is also available as a .txt file on GEO. You could read in a .txt file similar to how you read a .csv file in R. Please make sure the .txt file contains gene expression data. Usually, the 'data processing' section for each sample should provide details on what the .txt file contains and how it was processed.
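As a sketch of reading a .txt expression file, assuming it is tab-delimited with a header row (GEO text files vary, so check the actual delimiter first). The file here is a made-up temporary file.

```r
# Write a small tab-delimited file to a temporary path (made-up values)
txt_path <- tempfile(fileext = ".txt")
writeLines(
  c("gene\tS1\tS2",
    "BRCA1\t1.5\t2.5",
    "TP53\t3.0\t4.0"),
  txt_path
)

# read.delim() is read.table() with tab-separated defaults and header = TRUE
expr <- read.delim(txt_path, header = TRUE)

print(expr)
```

For other delimiters, `read.table()` with an explicit `sep` argument covers the same ground.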
@zeynepdurkaya883 11 months ago
I can't run the command to call the data; the part at 6:14 isn't clear enough.
@bioseqbytes 1 year ago
Hi, I tried installing the GEOquery package and got the error "package GEOquery is not available for this version of R". Could you please help?
@naveenyethirajula1279 1 year ago
Please tell me how to install it
@hiraalmas9042 1 year ago
I am facing the same issue.
@sanjaisrao484 2 years ago
Ma'am, some datasets don't have sample names in the GEOquery metadata. Please help, I am stuck here.
@Bioinformagician 2 years ago
Are you using the same dataset used in the video?
@gargiagravanshi355 1 month ago
Hello ma'am! I really need your help. I'm stuck with a project and my mentor is very toxic. Please let me know how I can contact you.
@Bioinformagician 1 month ago
My contact details can be found in the video description :)
@muhammadrafiq7645 2 years ago
Great video. Can you please share your email? I need some help.
@hamadalbasri9058 1 year ago
Great video, but why not translate?!