Comments
@duafatima6283 1 month ago
Hi, thank you for the video. Which proteins are more reliable to analyze out of reviewed or unreviewed sets?
@EMBL-EBI 1 month ago
The reviewed dataset (UniProtKB/Swiss-Prot) is a high-quality, manually annotated and non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions. It contains protein sequences with evidence at the protein level. In Swiss-Prot, each protein has been manually curated by expert curators based on:
- Experiments described in peer-reviewed literature
- Sequence and homology analysis

The unreviewed dataset (UniProtKB/TrEMBL) contains many more sequences from various genome sequencing projects. TrEMBL contains high-quality, computationally analyzed records that are enriched with automatic annotation and classification. Sequences have not been manually reviewed by a curator and do not contain experimental annotations from literature. Annotations are based on automatic annotation systems that learn from Swiss-Prot entries, such as UniRule and ARBA. Sequences may not have evidence at the protein level, and some sequences may be incomplete (labeled as fragments).

Ultimately, the choice between these datasets depends on the user's specific needs. Both the experiment-based annotations in Swiss-Prot and the automatic annotation system in TrEMBL are considered reliable sources for protein feature annotations. Swiss-Prot prioritizes accuracy and experimental validation, while TrEMBL offers a much larger dataset generated through automated methods.
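For readers who want to compare the two sets programmatically, here is a minimal sketch that only builds search URLs for the UniProt website REST API. The endpoint, the `reviewed` query field and the returned column names (`accession`, `reviewed`, `protein_existence`) are written from memory of the public API and should be treated as assumptions to check against the current UniProt documentation; `build_uniprot_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL of the UniProtKB search endpoint.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_query(organism_id: int, reviewed: bool, size: int = 25) -> str:
    """Build a search URL restricted to reviewed (Swiss-Prot) or unreviewed (TrEMBL) entries."""
    flag = "true" if reviewed else "false"
    params = {
        "query": f"(organism_id:{organism_id}) AND (reviewed:{flag})",
        "format": "tsv",
        # Assumed column names: accession, review status, protein existence level.
        "fields": "accession,reviewed,protein_existence",
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # 9606 is the NCBI taxonomy identifier for human.
    print(build_uniprot_query(9606, reviewed=True))
    print(build_uniprot_query(9606, reviewed=False))
```

Fetching the two URLs and comparing the protein existence column would show how many entries in each set have evidence at the protein level, which is one practical way to decide which set fits an analysis.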
@onesimemb102 1 month ago
Very relevant aspects for visualization. Thank you for this training.
@temitayoogundimu6294 1 month ago
This is insightful.
@acrocent9788 2 months ago
Even though a VCF can easily exceed 50 MB, Ensembl enforces a 50 MB limit when using VEP. Is there another platform that accepts VCFs for pathogenicity analysis and allows larger input files?
@EnsemblHelpdesk 1 month ago
Hi, thank you for the query. Ensembl VEP also recognises compressed (gzipped) input files. Alternatively, you can provide a URL to the file location if your input file is bigger than 50MB in size, and Ensembl VEP is also available via the REST API and the command-line. I hope that this helps!
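The compression route mentioned in the reply can be sketched as follows. This is only an illustration using the Python standard library: the 50 MB figure is taken from the thread and interpreted here as 50 MiB (an assumption), and `gzip_file` and `fits_upload_limit` are hypothetical helper names, not part of Ensembl VEP.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

# Assumption: the 50 MB limit mentioned in the thread, interpreted as 50 MiB.
UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def gzip_file(src: Path) -> Path:
    """Write a gzipped copy of src next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large VCFs are not held in memory
    return dst


def fits_upload_limit(path: Path, limit: int = UPLOAD_LIMIT_BYTES) -> bool:
    """Return True if the file is no larger than the assumed upload limit."""
    return path.stat().st_size <= limit


if __name__ == "__main__":
    # Toy VCF written to a temporary directory purely for demonstration.
    workdir = Path(tempfile.mkdtemp())
    vcf = workdir / "toy.vcf"
    vcf.write_text("##fileformat=VCFv4.2\n" + "1\t100\t.\tA\tG\t.\t.\t.\n" * 1000)
    compressed = gzip_file(vcf)
    print(compressed.name, fits_upload_limit(compressed))
```

VCF text is highly repetitive, so it usually compresses well; if the gzipped file is still too large, the URL, REST API or command-line options named in the reply remain the alternatives.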
@omarmziouka4072 2 months ago
Hello! How can I find the motifs of a protein in UniProt, please?
@EMBL-EBI 2 months ago
The 'Family and Domains' section of a UniProt entry provides information on sequence similarities with other proteins and the domain(s) present in a protein. The information is filed in different subsections, such as domain, repeat region, coiled coil and motif. These protein features can also be visualized in the Feature Viewer of a protein entry. The Feature Viewer lets you see all sequence features together in a visual manner. Features are arranged into categories such as domains and sites, motifs, molecule processing, post-translational modifications, mutagenesis, etc. The ruler on top represents the sequence length of the protein. Clicking on a feature shows a tooltip with information on the feature and highlights its position in the sequence. We hope this information is helpful for you.
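The same motif annotations can also be read from an entry's JSON record. The sketch below assumes the entry JSON has a top-level `features` list whose items carry `type`, `description` and `location.start.value` / `location.end.value`; that layout and the URL pattern are recalled from the public UniProt REST API and should be verified before relying on them. `fetch_entry` and `motif_features` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_entry(accession: str) -> dict:
    """Download one UniProtKB entry as JSON (assumed URL pattern; needs network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urlopen(url) as resp:
        return json.load(resp)


def motif_features(entry: dict) -> list[tuple[int, int, str]]:
    """Return (start, end, description) for every feature typed 'Motif'."""
    motifs = []
    for feature in entry.get("features", []):
        if feature.get("type") != "Motif":
            continue  # skip domains, repeats, coiled coils, etc.
        location = feature["location"]
        motifs.append(
            (
                location["start"]["value"],
                location["end"]["value"],
                feature.get("description", ""),
            )
        )
    return motifs


if __name__ == "__main__":
    # Hand-made entry with made-up coordinates, used instead of a live download.
    toy_entry = {
        "features": [
            {"type": "Domain", "description": "Example domain",
             "location": {"start": {"value": 10}, "end": {"value": 90}}},
            {"type": "Motif", "description": "Example motif",
             "location": {"start": {"value": 120}, "end": {"value": 126}}},
        ]
    }
    print(motif_features(toy_entry))
```

With network access, `motif_features(fetch_entry(accession))` would list the motifs of a real entry, provided the assumed JSON layout holds.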
@vondhanaramesh4365 3 months ago
Could you please point me to a detailed tutorial for the ChEMBL API?
@EMBL-EBI 3 months ago
Hi there, thanks for your comments and interest in ChEMBL. Can you please email your query to [email protected], where we can open a helpdesk ticket for you and share it with the team.
@vondhanaramesh4365 3 months ago
Currently I'm working with Schistosoma mansoni. Thank you so much!
@SidwellMafisa 3 months ago
I would like to cancel this nonsense as I didn't apply for it
@EMBL-EBI 3 months ago
Hi there. We're unsure what you want to cancel here? Our webinar videos are uploaded for free to KZfaq after recording. Perhaps you get alerts whenever we upload videos? If that is what you didn't want to see anymore, you will need to check into your individual KZfaq settings as this isn't something we as an account can control. I hope that helps.
@alpdinc-oran6684 3 months ago
great explanation. thank you for posting
@puitea_ralte 4 months ago
Thanks for sharing. I come from a computer science background and am doing a PhD titled "An automated decision-making system to identify T2DM (type II diabetes mellitus) based on DNA sequences". I have data from which I need to identify diabetic variants in a genomic dataset, and I am stuck. I would be extremely glad if you could provide some help with my research. Could you please share a way to contact you?
@student_remo 4 months ago
Thank you for this series! ❤
@student_remo 4 months ago
SRSF1 gene = Serine and Arginine Rich Splicing Factor 1.
@student_remo 4 months ago
MT-CO1 gene = Mitochondrially Encoded Cytochrome C Oxidase I.
@student_remo 4 months ago
I love the explanation of endosymbiont theory here. 🤍
@student_remo 4 months ago
“This gene encodes an enzyme involved in blood pressure regulation and electrolyte balance. It catalyzes the conversion of angiotensin I into a physiologically active peptide angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that controls blood pressure and fluid-electrolyte balance. This angiotensin converting enzyme (ACE) also inactivates the vasodilator protein, bradykinin.” - National Institutes of Health, USA.
@student_remo 4 months ago
IL6 = interleukin 6. From “leukocyte” and the Greek language, leuk- “white”, cyt- “cell”.
@student_remo 4 months ago
“Fat mass and obesity associated (FTO) was the first gene found to be associated with obesity in three independent genome-wide association studies.” -NIH USA Gene full name: FTO alpha-ketoglutarate dependent dioxygenase.
@student_remo 4 months ago
“Cystic fibrosis is an inherited disease caused by mutations in a gene called the cystic fibrosis transmembrane conductance regulator (CFTR).” - National Institutes of Health, USA
@student_remo 4 months ago
18S rRNA = 18S ribosomal RNA.
@student_remo 4 months ago
“The TP53 gene provides instructions for making a protein called tumor protein p53 (or p53). This protein acts as a tumor suppressor, which means that it regulates cell division by keeping cells from growing and dividing (proliferating) too fast or in an uncontrolled way.” -MedlinePlus Genetics
@student_remo 4 months ago
XIST gene: X inactive specific transcript.
@student_remo 4 months ago
TTN is so big, what if we renamed “titin” into “titan”? 😅
@student_remo 4 months ago
HBB: Hemoglobin subunit beta gene. 🌬🩸
@student_remo 4 months ago
00:59 A gene is “a region of genome that makes a particular protein or functional RNA.”
@muhammednagas6311 5 months ago
Was useful. Many thanks
@chidozienwanedo9234 5 months ago
Nice presentation, Alex, although I still need to get to grips with the novel techniques you talked about, especially the use of dRep on MAGs. All the same, it was interesting to learn from you. Thanks!
@goodwork3980 6 months ago
Great video from Julia 🥰
@guihuajia7696 6 months ago
My targets are not from ChEMBL but from other sources, with their own identifiers. How can I convert the identifiers of hundreds of targets into ChEMBL IDs?
@EMBL-EBI 6 months ago
Hi there, thanks for your comments and interest in ChEMBL. Can you please email your query to [email protected], where we can open a helpdesk ticket for you and share it with the team.
@guihuajia7696 6 months ago
pchembl_value__gte=5 for a threshold of less than 10 µM potency? pChEMBL = -log10(10) = -1, so if the potency is better than 10 µM (i.e. < 10 µM), the pChEMBL should be >= -1. Right?
@guihuajia7696 6 months ago
I got it wrong: the unit should be molar concentration, so the cutoff is pchembl_value__gte=5.
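The corrected arithmetic can be checked with a short sketch. It assumes pChEMBL is the negative base-10 logarithm of the activity value expressed in molar units, which is what the correction above relies on; the `pchembl` function and its unit table are illustrative names, not part of any ChEMBL client.

```python
import math

# Conversion factors from common concentration units to molar (M).
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pchembl(value: float, unit: str = "nM") -> float:
    """Return -log10 of a concentration after converting it to molar."""
    molar = value * TO_MOLAR[unit]
    return -math.log10(molar)


if __name__ == "__main__":
    # 10 uM is 1e-5 M, so the value is about 5: the cutoff used in the thread.
    print(round(pchembl(10, "uM"), 2))
    # More potent compounds (lower concentrations) give higher values.
    print(round(pchembl(100, "nM"), 2))
```

Because a lower concentration means a higher pChEMBL, "potency better than 10 µM" translates to "pChEMBL of at least 5", which matches the `pchembl_value__gte=5` filter.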
@MaxHumbertoCautiQuilcaro 6 months ago
🎯 Key Takeaways for quick navigation:
00:00 🌱 Dave Edwards, Director of the Center for Applied Bioinformatics at the University of Western Australia, discusses the intersection of pangenomics and machine learning for crop improvement.
03:36 🌍 The changing climate and growing global population are impacting agriculture. Shifts in rainfall patterns and temperature changes are affecting crop productivity, especially in food-insecure regions.
05:42 🧬 Genomics is crucial for improving crop productivity. Major crops need yield improvements and adaptation to climate change, while minor crops important for food security have great potential for improvement.
08:58 🧬 Sequencing technology has advanced significantly, becoming cheap and accessible. Next-generation sequencing and technologies like Oxford Nanopore and PacBio Sequel allow for cost-effective sequencing of diverse genomes.
13:57 🌾 Pangenomics involves understanding core genomes, variable genes, and dispensable genes in a species. A single reference genome doesn't represent the diversity, necessitating a pangenomic approach.
16:51 🧩 Building pan-genomes involves an iterative assembly approach, utilizing a reference genome, mapping reads, assembling new contigs, and iteratively adding more data. Population graphs are now favored for their ability to capture more genomic information.
18:29 📊 Population graphs, especially in plant species, allow mapping data from hundreds or thousands of individuals to study genomic variation. They provide a comprehensive view of relationships between different parts of the genome.
19:54 🧬 Explored genomic diversity in Brassica species using pan-genomics, revealing significant variation in gene presence/absence.
23:31 🧬 Modeled genome sequencing to predict the number of genes in Brassica rapa, demonstrating the efficiency of capturing most genes with a relatively small number of individuals.
25:51 🌱 Identified disease resistance genes showing presence/absence variation in Brassica species, suggesting potential sources for crop improvement.
26:19 🌾 Explored Brassica napus (canola) pan-genome, highlighting substantial gene variation and the impact of polyploidy on gene redundancy.
28:15 🤖 Applied machine learning to understand gene loss mechanisms in Brassica species, revealing variable factors like chromosome position and homologous exchange.
32:52 🌾 Investigated wheat (bread wheat) pan-genome, emphasizing the limitations of using a single reference and the importance of pan-genomes for more accurate genomic studies.
35:30 🌱 Analyzed a soybean pan-genome with over a thousand individuals, uncovering gene frequency changes during domestication and breeding.
37:39 🧬 Explored reduction in gene content during domestication and breeding, indicating potential deleterious genes with no presence/absence variation that may be targeted using genome editing technologies.
39:21 🌐 Discussed the need for improved graph pan-genomes, data accessibility, and integration of diverse genomic information for more comprehensive analyses.
40:06 🌾 Machine learning can be applied to diverse data types in crop improvement, including crop images, genome sequences, and tabular data like yield statistics.
41:29 🧠 Multimodal deep learning involves building individual models for different data types (genomic variation, phenotype, environmental data) and combining them for predictions, allowing easier modification and fine-tuning.
42:11 🌽 Successful example: Using machine learning for yield prediction in maize by analyzing drone images and manipulating them through rotation and other techniques.
44:30 📊 Classifying high-yielding lines early in crop development using machine learning, even without weather data, proves useful for breeders.
45:13 🧬 Machine learning and deep learning show promise in predicting traits in crops, with an example in soybean resequencing and the identification of important genomic loci.
46:20 🌱 Machine learning models, particularly XGBoost, aid in predicting gene content in canola, even for genes that are challenging to predict due to masking effects.
47:04 🌾 Quantitative disease resistance, such as blackleg in canola, can be predicted based on genotype, demonstrating the potential for machine learning in challenging scenarios.
47:49 🔄 Ongoing challenges and future directions include the need for better annotated pan-genome graphs, improved technology for building computationally efficient graphs, and the development of more advanced machine learning models for diverse data types.
48:57 💻 Collaborating with breeding companies and optimizing the path to breeding improved crops using bioinformatics is essential, emphasizing the importance of more data accessibility and usability.
49:25 🌍 Acknowledgment of the urgency in addressing climate change impacts on agriculture, highlighting the need for continuous innovation and collaboration in crop improvement efforts.
Made with HARPA AI
@keeperscoffin 7 months ago
Terrific information! Thank you.
@george_anak_lihi_blog_hot 7 months ago
😮😮😮
@bahaddinahmad5823 7 months ago
Dream of working there?
@ibtissammaslouh3540 8 months ago
Is the TTN gene really fatal, especially for newborns?
8 months ago
Thank you so much
@kruthiiirao 8 months ago
👍
@TheLuikartLab 8 months ago
Is there a good tutorial on how to analyze/visualize .raw files for people without proteomic experience?
@DeeptiJaiswal23 8 months ago
Unfortunately we do not have any tutorial on visualisation of RAW files.
@marwatawfik3956 8 months ago
Is there any way to get technical support for my 16S datasets? I find this difficult to follow.
@marwatawfik3956 8 months ago
Thanks for the webinar.
@ArjunSingh-mv4es 8 months ago
Can we annotate our fungal whole-genome sequence here?
@EMBL-EBI 8 months ago
Thank you for the comment. Rfam can be used to annotate all fungal genomes, and we have some documentation here: docs.rfam.org/en/latest/genome-annotation.html, but you will have to run everything locally. We hope that helps!
@marwatawfik3956 8 months ago
How do I upload my 16S dataset?
@EMBL-EBI 8 months ago
Hi there and thank you for your question. Data to be analysed by MGnify will need to be submitted to the European Nucleotide Archive first. You will then be able to submit an analysis request to MGnify via the website by clicking 'Submit and/or Request' on the homepage. We hope that helps. Thanks again for engaging with our content.
@marwatawfik3956 8 months ago
@EMBL-EBI I emailed you, as I think I need to be provided with Webin credentials.
@thesuprememat7119 9 months ago
good video
@marijager700 9 months ago
Thanks for the presentation! I've been doing GO and other functional analysis of proteomic data for several years, and still came by several useful tips and tricks in this video :).
@EMBL-EBI 9 months ago
That's great to hear, thanks for sharing!
@jayasuriyajm3029 9 months ago
At 18:27, don't the specific parents lead to broader children, since the arrow is pointing towards biological process?
@EMBL-EBI 9 months ago
Thanks for the comment. As you go up the tree the terms become less specific. ‘Biological process’, the root node, is the least specific term in that branch of the Gene Ontology. We hope that helps.
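The direction of the hierarchy can be illustrated with a toy graph. The sketch below is a hand-written, simplified chain of child-to-parent links, not data loaded from the Gene Ontology, and the term names are used only as labels; `ancestors` is a hypothetical helper. It shows that walking up from a specific term reaches progressively broader terms and ends at the root.

```python
# Toy child -> parents mapping; a simplified illustration, NOT real GO data.
PARENTS = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["cellular process"],
    "cellular process": ["biological_process"],
    "biological_process": [],  # root of this branch: the least specific term
}


def ancestors(term: str, parents: dict[str, list[str]] = PARENTS) -> list[str]:
    """Return every term reachable by following parent links upwards."""
    found: list[str] = []
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(parents[current])
    return found


if __name__ == "__main__":
    # Each step up is broader than the last; the walk ends at the root.
    for name in ancestors("apoptotic process"):
        print(name)
```

In this picture the arrows point from the more specific child to its broader parent, so a term near the bottom has many ancestors while the root has none.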
@bbarry083 9 months ago
Greetings from Ethiopia 🇪🇹
@user-ru7rc6om1x 9 months ago
unclear accent
@keenviewer 10 months ago
Very informative, thank you. I have had success with the conda package for VEP:
conda create -n VEP109
conda activate VEP109
conda install ensembl-vep=109.3 (latest at time of installation)
conda install perl-compress-raw-zlib=2.202
An additional step was required (suggested during installation of the above) to install cache data. Here I installed human GRCh38:
vep_install -a cf -s homo_sapiens -y GRCh38 -c ~/.conda/envs/VEP109/ ~/.conda/envs/VEP109/GRCh38/ --CONVERT --PLUGINS all
10 months ago
Thank you so much. Greetings from Molecular Biology, Environment and Cancer Research Group at Universidad del Cauca, Colombia.
11 months ago
Thank you EMBL - EBI for this useful video. Greetings from a bioeng graduate student.
@EMBL-EBI 9 months ago
Thank you for the lovely comment!