Word Embedding and Word2Vec, Clearly Explained!!!

  255,346 views

StatQuest with Josh Starmer


Words are great, but if we want to use them as input to a neural network, we have to convert them to numbers. One of the most popular methods for assigning numbers to words is to use a Neural Network to create Word Embeddings. In this StatQuest, we go through the steps required to create Word Embeddings, and show how we can visualize and validate them. We then talk about one of the most popular Word Embedding tools, word2vec. BAM!!!
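The idea in the description can be sketched in a few lines of code. The following is an illustrative sketch (assumed for this page, not the video's actual code): it learns 2-dimensional word embeddings by training a tiny network to predict the next word in the video's "Troll2 is great / Gymkata is great" corpus.

```python
import numpy as np

# Illustrative sketch: learn 2-D word embeddings by training a tiny
# network to predict the next word. The corpus comes from the video;
# the sentence boundary is ignored for simplicity.
rng = np.random.default_rng(0)

corpus = ["troll2", "is", "great", "gymkata", "is", "great"]
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 2                 # vocabulary size, embedding dimension

W_in = rng.normal(0, 0.1, (V, D))    # input-side weights = the embeddings
W_out = rng.normal(0, 0.1, (D, V))   # output-side weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for _ in range(500):
    for cur, nxt in zip(corpus, corpus[1:]):
        i, j = idx[cur], idx[nxt]
        h = W_in[i]                      # hidden layer = current word's embedding
        p = softmax(h @ W_out)           # predicted next-word probabilities
        grad = p.copy()
        grad[j] -= 1.0                   # gradient of cross entropy w.r.t. logits
        W_in[i] -= lr * (W_out @ grad)   # backpropagate into the embedding
        W_out -= lr * np.outer(h, grad)

print({w: W_in[idx[w]].round(2) for w in vocab})
```

Because "troll2" and "gymkata" are used in the same context (both are followed by "is"), their embeddings end up close together, which is the "similar words will have similar numbers" idea from the song.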
Note, this StatQuest assumes that you are already familiar with...
The Basics of how Neural Networks Work: • The Essential Main Ide...
The Basics of how Backpropagation Works: • Neural Networks Pt. 2:...
How the Softmax function works: • Neural Networks Part 5...
How Cross Entropy works: • Neural Networks Part 6...
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
KZfaq Membership: / @statquest
...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/statquest-store/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
4:25 Building a Neural Network to do Word Embedding
8:18 Visualizing and Validating the Word Embedding
10:42 Summary of Main Ideas
11:44 word2vec
13:36 Speeding up training with Negative Sampling
#StatQuest #word2vec

Comments: 460
@statquest
@statquest Жыл бұрын
To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
NOTE: A lot of people ask for the math at 13:16 to be clarified. In that example we have 3,000,000 inputs, each connected to 100 activation functions, for a total of 300,000,000 weights on the connections from the inputs to the activation functions. We then have another 300,000,000 weights on the connections from the activation functions to the outputs. 300,000,000 + 300,000,000 = 2 * 300,000,000 = 600,000,000 weights in total.
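The arithmetic in the note can be sanity-checked in a couple of lines, just restating the note's assumptions (3,000,000 input words/phrases, 100 activation functions):

```python
# Sanity check of the weight count in the note: 3,000,000 inputs, each
# connected to 100 activation functions, and 3,000,000 outputs.
inputs = 3_000_000
activations = 100

input_side = inputs * activations    # weights: inputs -> activation functions
output_side = activations * inputs   # weights: activation functions -> outputs
total = input_side + output_side

print(f"{input_side:,} + {output_side:,} = {total:,}")
```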
@karanacharya18
@karanacharya18 23 күн бұрын
In simple words, word embeddings are the by-product of training a neural network to predict the next word. By focusing on that single objective, the weights themselves (the embeddings) can be used to understand the relationships between the words. This is actually quite fantastic! As always, great video @statquest!
@statquest
@statquest 23 күн бұрын
bam! :)
@joeybasile1572
@joeybasile1572 11 күн бұрын
Not necessarily just the next word. Your statement is specific.
@rishavkumar8341
@rishavkumar8341 Жыл бұрын
Probably the most important concept in NLP. Thank you for explaining it so simply and rigorously. Your videos are a thing of beauty!
@statquest
@statquest Жыл бұрын
Wow, thank you!
@exxzxxe
@exxzxxe 3 ай бұрын
Josh, this is absolutely the clearest and most concise explanation of embeddings on KZfaq!
@statquest
@statquest 3 ай бұрын
Thank you very much!
@davins90
@davins90 2 ай бұрын
totally agree
@HarpitaPandian
@HarpitaPandian 5 ай бұрын
Can't believe this is free to watch, your quality content really helps people develop a good intuition about how things work!
@statquest
@statquest 5 ай бұрын
Thanks!
@rachit7185
@rachit7185 Жыл бұрын
This channel is literally the best thing that has happened to me on youtube! Way too excited for your upcoming video on transformers, attention and LLMs. You're the best Josh ❤
@statquest
@statquest Жыл бұрын
Wow, thanks!
@MiloLabradoodle
@MiloLabradoodle Жыл бұрын
Yes, please do a video on transformers. Great channel.
@statquest
@statquest Жыл бұрын
@@MiloLabradoodle I'm working on the transformers video right now.
@liuzeyu3125
@liuzeyu3125 Жыл бұрын
@@statquest Can't wait to see it!
@SergioPolimante
@SergioPolimante 4 ай бұрын
Statquest is by far the best machine learning channel on KZfaq to learn the basic concepts. Nice job
@statquest
@statquest 4 ай бұрын
Thank you!
@yuxiangzhang2343
@yuxiangzhang2343 9 ай бұрын
So good!!! This is literally the best deep learning tutorial series I've found… after a very long search on the web!
@statquest
@statquest 9 ай бұрын
Thank you! :)
@harin01737
@harin01737 8 ай бұрын
I was struggling to understand NLP and DL concepts, thinking of dropping my classes, and BAM!!! I found you, and now I'm writing a paper on neural program repair using DL techniques.
@statquest
@statquest 8 ай бұрын
BAM! :)
@tanbui7569
@tanbui7569 9 ай бұрын
Damn, when I first learned about this 4 years ago, it took me two days to wrap my head around these weights and embeddings well enough to implement them in code. Just now, I needed to refresh myself on the concepts since I haven't worked with them in a while, and your video covered what took me two whole days to learn in just 16 minutes!! I wish this video had existed earlier!!
@statquest
@statquest 9 ай бұрын
Thanks!
@dreamdrifter
@dreamdrifter 11 ай бұрын
Thank you Josh, this is something I've been meaning to wrap my head around for a while and you explained it so clearly!
@statquest
@statquest 11 ай бұрын
Glad it was helpful!
@mannemsaisivadurgaprasad8987
@mannemsaisivadurgaprasad8987 6 ай бұрын
One of the best videos I've seen till now regarding Embeddings.
@statquest
@statquest 6 ай бұрын
Thank you!
@pichazai
@pichazai 21 күн бұрын
this channel is the best ML resource on the entire internet
@statquest
@statquest 21 күн бұрын
Thank you!
@FullStackAmigo
@FullStackAmigo Жыл бұрын
Absolutely the best explanation that I've found so far! Thanks!
@statquest
@statquest Жыл бұрын
Thank you! :)
@wizenith
@wizenith Жыл бұрын
haha I love your opening and your teaching style! When we think something is extremely difficult to learn, everything should begin with singing a song; that makes the day more beautiful to begin with (heheh actually I am not just teasing lol, I really like that). Thanks for sharing your thoughts with us
@statquest
@statquest Жыл бұрын
Thanks!
@user-wr4yl7tx3w
@user-wr4yl7tx3w Жыл бұрын
This is the best explanation of word embedding I have come across.
@statquest
@statquest Жыл бұрын
Thank you very much! :)
@awaredz007
@awaredz007 29 күн бұрын
Wow!! This is the best definition of word embedding I have ever heard or seen. Right at 09:35. Thanks for the clear and awesome video. You guys rock!!
@statquest
@statquest 29 күн бұрын
Thanks! :)
@TropicalCoder
@TropicalCoder 8 ай бұрын
That was the first time I actually understood embeddings - thanks!
@statquest
@statquest 8 ай бұрын
bam! :)
@rathinarajajeyaraj1502
@rathinarajajeyaraj1502 Жыл бұрын
This is one of the best sources of information.... I always find videos a great source of visual stimulation... thank you.... infinite baaaam
@statquest
@statquest Жыл бұрын
BAM! :)
@haj5776
@haj5776 Жыл бұрын
The phrase "similar words will have similar numbers" in the song will stick with me for a long time, thank you!
@statquest
@statquest Жыл бұрын
bam!
@muthuaiswaryaaswaminathan4079
@muthuaiswaryaaswaminathan4079 6 ай бұрын
Thank you so much for this playlist! Got to learn a lot of things in a very clear manner. TRIPLE BAM!!!
@statquest
@statquest 6 ай бұрын
Thank you! :)
@acandmishra
@acandmishra Ай бұрын
your work is extremely amazing and so helpful for new learners who want to go into the details of how Deep Learning models work, instead of just knowing what they do!! Keep it up!
@statquest
@statquest Ай бұрын
Thanks!
@chad5615
@chad5615 11 ай бұрын
Keep up the amazing work (especially the songs) Josh, you're making life easy for thousands of people!
@statquest
@statquest 11 ай бұрын
Wow! Thank you so much for supporting StatQuest! TRIPLE BAM!!!! :)
@mamdouhdabjan9292
@mamdouhdabjan9292 Жыл бұрын
Hey Josh. A great new series that I, and many others, would be excited to see is Bayesian statistics. Would love to watch you explain the intricacies of that branch of stats. Thanks as always for the great content, and keep up with the neural-network related videos. They are especially helpful.
@statquest
@statquest Жыл бұрын
That's definitely on the to-do list.
@mamdouhdabjan9292
@mamdouhdabjan9292 Жыл бұрын
@@statquest looking forward to it.
@ah89971
@ah89971 8 ай бұрын
When I watched this, I had only one question: why did all the others fail to explain this, if they fully understood the concept?
@statquest
@statquest 8 ай бұрын
bam!
@rudrOwO
@rudrOwO 5 ай бұрын
@@statquest Double Bam!
@meow-mi333
@meow-mi333 5 ай бұрын
Bam the bam!
@ananpinya835
@ananpinya835 Жыл бұрын
StatQuest is great! I learn a lot from your channel. Thank you very much!
@statquest
@statquest Жыл бұрын
Glad you enjoy it!
@user-qc5uk6ei2m
@user-qc5uk6ei2m 8 ай бұрын
Hey Josh, I'm a Brazilian student and I love to watch your videos. Every one of the concepts gets such a good and fun-to-watch explanation. I just wanted to say thank you, because in the last few months you made me smile in the middle of studying. So, thank you!!! (sorry for the bad english hahaha)
@statquest
@statquest 8 ай бұрын
Muito obrigado!!! :)
@mycotina6438
@mycotina6438 Жыл бұрын
BAM!! StatQuest never lie, it is indeed super clear!
@statquest
@statquest Жыл бұрын
Thank you! :)
@flow-saf
@flow-saf 6 ай бұрын
This video explains the source of the multiple dimensions in a word embedding in the simplest way. Awesome. :)
@statquest
@statquest 6 ай бұрын
Thanks!
@lfalfa8460
@lfalfa8460 5 ай бұрын
I love all of your songs. You should record a CD!!! 🤣 Thank you very much again and again for the elucidating videos.
@statquest
@statquest 5 ай бұрын
Thanks!
@user-eq9cf4mt2s
@user-eq9cf4mt2s Ай бұрын
Great presentation, You saved my day after watching several videos, thank you!
@statquest
@statquest Ай бұрын
Glad it helped!
@gustavow5746
@gustavow5746 7 ай бұрын
the best video I've seen on this topic so far. Great Content! Congrats!!
@statquest
@statquest 7 ай бұрын
Wow, thanks!
@RaynerGS
@RaynerGS 7 ай бұрын
I admire your work a lot. Salute from Brazil.
@statquest
@statquest 7 ай бұрын
Muito obrigado! :)
@channel_SV
@channel_SV Жыл бұрын
It's so nice to google and realize that there is a StatQuest about your question, when you were certain there hadn't been one some time before
@statquest
@statquest Жыл бұрын
BAM! :)
@mahdi132
@mahdi132 9 ай бұрын
Thank you sir. Your explanation is great and your work is much appreciated.
@statquest
@statquest 9 ай бұрын
Thanks!
@exxzxxe
@exxzxxe 2 ай бұрын
Hopefully everyone following this channel has Josh's book. It is quite excellent!
@statquest
@statquest 2 ай бұрын
Thanks for that!
@fouadboutaleb4157
@fouadboutaleb4157 8 ай бұрын
Bro, I have my master's degree in ML, but trust me, you explain it better than my teachers ❤❤❤ Big thanks
@statquest
@statquest 8 ай бұрын
Thank you very much! :)
@michaelcheung6290
@michaelcheung6290 Жыл бұрын
Thank you statquest!!! Finally I started to understand LSTM
@statquest
@statquest Жыл бұрын
Hooray! BAM!
@ramzirebai3661
@ramzirebai3661 Жыл бұрын
Thank you so much Mr. Josh Starmer, you are the only one that makes ML concepts easy to understand. Can you please explain GloVe?
@statquest
@statquest Жыл бұрын
I'll keep that in mind.
@p-niddy
@p-niddy 11 ай бұрын
Great video! One suggestion is that you could expand on the Negative Sampling discussion by explaining how it chooses purposely unrelated (non-context) words to increase the model's accuracy in predicting related (context) words of the target word.
@statquest
@statquest 11 ай бұрын
It actually doesn't purposely select unrelated words. It just selects random words and hopes that the vocabulary is large enough that the probability that the words are unrelated will be relatively high.
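A tiny sketch of the idea in this reply, with a made-up vocabulary: negative sampling just draws random words, relying on a large vocabulary to make unrelated draws overwhelmingly likely.

```python
import random

random.seed(42)

# Made-up vocabulary for illustration; word2vec's real vocabulary would
# contain millions of words and phrases.
vocab = [f"word{i}" for i in range(30_000)]

def sample_negatives(vocab, wanted_word, k=5):
    """Pick k random words to serve as negative (target = 0) examples.
    With a large vocabulary, a random draw is almost never related to
    the word we actually want to predict."""
    negatives = []
    while len(negatives) < k:
        candidate = random.choice(vocab)
        if candidate != wanted_word:
            negatives.append(candidate)
    return negatives

print(sample_negatives(vocab, "word7"))
```

Because a fresh random subset is drawn at each step, even an occasional unlucky draw of a related word has little long-term effect.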
@vpnserver407
@vpnserver407 11 ай бұрын
highly valuable video and book tutorial, thanks for putting these kinds of special tutorials out here.
@statquest
@statquest 11 ай бұрын
Glad you liked it!
@manuelamankwatia6556
@manuelamankwatia6556 Ай бұрын
This is by far the best video on embeddings. A whole university course is broken down into 15 minutes
@statquest
@statquest Ай бұрын
Thanks!
@denismarcio
@denismarcio 2 ай бұрын
Extremely didactic! Congratulations.
@statquest
@statquest 2 ай бұрын
Muito obrigado! :)
@eamonnik
@eamonnik Жыл бұрын
Hey Josh! Loved seeing your talk at BU! Appreciate your videos :)
@statquest
@statquest Жыл бұрын
Thanks so much! :)
@m3ow21
@m3ow21 11 ай бұрын
I love the way you teach!
@statquest
@statquest 11 ай бұрын
Thanks!
@danish5326
@danish5326 8 ай бұрын
Thanks for enlightening us Master.
@statquest
@statquest 8 ай бұрын
Any time!
@bancolin1005
@bancolin1005 Жыл бұрын
BAM! Thanks for your video, I finally understand what negative sampling means ~
@statquest
@statquest Жыл бұрын
Happy to help!
@alfredoderodt6519
@alfredoderodt6519 9 ай бұрын
You are a beautiful human! Thank you so much for this video! I was finally able to understand this concept! Thanks so much again!!!!!!!!!!!!! :)
@statquest
@statquest 9 ай бұрын
Glad it was helpful!
@wenqiangli7544
@wenqiangli7544 Жыл бұрын
Great video for explaining word2vec!
@statquest
@statquest Жыл бұрын
Thanks!
@avishkaravishkar1451
@avishkaravishkar1451 5 ай бұрын
For those of you who find it hard to understand this video, my recommendation is to watch it at a slower pace and take notes. It will really make things much clearer.
@statquest
@statquest 5 ай бұрын
0.5 speed bam!!! :)
@wellwell8025
@wellwell8025 Жыл бұрын
Way better than my university slides. Thanks
@statquest
@statquest Жыл бұрын
Thanks!
@ColinTimmins
@ColinTimmins 8 ай бұрын
Thank you so much for these videos. It really helps with the visuals because I am dyslexic… Quadruple BAM!!!! lol 😊
@statquest
@statquest 8 ай бұрын
Happy to help!
@janapalaswathi4262
@janapalaswathi4262 3 ай бұрын
Awesome explanation..
@statquest
@statquest 3 ай бұрын
Thanks!
@saisrisai9649
@saisrisai9649 5 ай бұрын
Thank you Statquest!!!!
@statquest
@statquest 5 ай бұрын
Any time!
@pedropaixaob
@pedropaixaob 4 ай бұрын
This is an amazing video. Thank you!
@statquest
@statquest 4 ай бұрын
Thanks!
@yasminemohamed5157
@yasminemohamed5157 Жыл бұрын
Awesome as always. Thank you!!
@statquest
@statquest Жыл бұрын
Thank you! :)
@LakshyaGupta-ge3wj
@LakshyaGupta-ge3wj 6 ай бұрын
Absolutely mind blowing and amazing presentation! For the Word2Vec's strategy for increasing context, does it employ the 2 strategies in "addition" to the 1-Output-For-1-Input basic method we talked about in the whole video or are they replacements? Basically, are we still training the model on predicting "is" for "Gymkata" in the same neural network along with predicting "is" for a combination of "Gymkata" and "great"?
@statquest
@statquest 6 ай бұрын
Word2Vec uses one of the two strategies presented at the end of the video.
@tupaiadhikari
@tupaiadhikari 10 ай бұрын
Great Explanation. Please make a video on how do we connect the output of an Embedding Layer to an LSTM/GRU for doing classification for say Sentiment Analysis
@statquest
@statquest 10 ай бұрын
I show how to connect it to an LSTM for language translation here: kzfaq.info/get/bejne/gp54ftqWv6-znZs.html
@tupaiadhikari
@tupaiadhikari 10 ай бұрын
@@statquest Thank You Professor Josh !
@minhmark.01
@minhmark.01 Ай бұрын
thanks for your tutorial!!!
@statquest
@statquest Ай бұрын
You're welcome!
@mariafernandaruizmorales2322
@mariafernandaruizmorales2322 Жыл бұрын
It would also be nice to have a video about the difference between LMs (linear models) and GLMs (Generalized Linear Models). I know they're different but don't quite understand that when interpreting them or programming them in R. THAAANKS!
@statquest
@statquest Жыл бұрын
Linear models are just models based on linear regression, and I describe them in this playlist: kzfaq.info/sun/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU Generalized Linear Models are more "generalized" and include Logistic Regression kzfaq.info/sun/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe and a few other methods that I don't talk about, like Poisson Regression.
@mariafernandaruizmorales2322
@mariafernandaruizmorales2322 Жыл бұрын
@@statquest Thanks Josh!! I'll watch them all 🤗
@phobiatheory3791
@phobiatheory3791 Жыл бұрын
Hi, I love your videos! They're really well explained. Could you please make a video on partial least squares (PLS)
@statquest
@statquest Жыл бұрын
I'll keep that in mind.
@study-tp4ts
@study-tp4ts Жыл бұрын
Great video as always!
@statquest
@statquest Жыл бұрын
Thanks again!
@pakaponwiwat2405
@pakaponwiwat2405 8 ай бұрын
Wow, Awesome. Thank you so much!
@statquest
@statquest 8 ай бұрын
You're very welcome!
@pushkar260
@pushkar260 Жыл бұрын
That was quite informative
@statquest
@statquest Жыл бұрын
BAM! Thank you so much for supporting StatQuest!!! :)
@auslei
@auslei 11 ай бұрын
Love this channel.
@statquest
@statquest 11 ай бұрын
Glad to hear it!
@ishaqpaktinyar7766
@ishaqpaktinyar7766 3 ай бұрын
you da bessssst, saved me alota time and confusion :..)
@statquest
@statquest 3 ай бұрын
Thanks!
@mariafernandaruizmorales2322
@mariafernandaruizmorales2322 Жыл бұрын
Please make a video about the metrics for prediction performance: RMSE, MAE and R SQUARED. 🙏🏼🙏🏼🙏🏼 YOURE THE BEST!
@statquest
@statquest Жыл бұрын
The first video I ever made is on R-squared: kzfaq.info/get/bejne/aKeBftColprReIE.html NOTE: Back then I didn't know about machine learning, so I only talk about R-squared in the context of fitting a straight line to data. In that context, R-squared can't be negative. However, with other machine learning algorithms, it is possible.
@c.nbhaskar4718
@c.nbhaskar4718 Жыл бұрын
great stuff as usual ..BAM * 600 million
@statquest
@statquest Жыл бұрын
Thank you so much! :)
@aniketsakpal4969
@aniketsakpal4969 Жыл бұрын
Just incredible!
@statquest
@statquest Жыл бұрын
Thank you!
@AliShafiei-ui8tn
@AliShafiei-ui8tn 10 ай бұрын
the best channel ever.
@statquest
@statquest 10 ай бұрын
Double bam! :)
@MaskedEngineerYH
@MaskedEngineerYH Жыл бұрын
Keep going statquest!!
@statquest
@statquest Жыл бұрын
That's the plan!
@familywu3869
@familywu3869 Жыл бұрын
Thank you very much for your excellent tutorials, Josh! I have a question: at around 13:30 of this video, you mentioned multiplying by 2. I am not sure why 2. I mean, if there are more than 2 outputs, will we multiply by the number of output nodes instead of 2? Thank you for your clarification in advance.
@statquest
@statquest Жыл бұрын
If we have 3,000,000 words and phrases as inputs, and each input is connected to 100 activation functions, then we have 300,000,000 weights going from the inputs to the activation function. Then from those 100 activation function, we have 3,000,000 outputs (one per word or phrase), each with a weight. So we have 300,000,000 weights on the input side, and 300,000,000 weights on the output side, or a total of 600,000,000 weights. However, since we always have the same number of weights on the input and output sides, we only need to calculate the number of weights on one side and then just multiply that number by 2.
@surojit9625
@surojit9625 9 ай бұрын
@@statquest Thanks for explaining! I also had the same question.
@jwilliams8210
@jwilliams8210 5 ай бұрын
Ohhhhhhhhh! I missed that the first time around! BTW: (Stat)Squatch and Norm are right: StatQuest is awesome!!
@meguellatiyounes8659
@meguellatiyounes8659 Жыл бұрын
My favourite topic its magic. Bam!!
@statquest
@statquest Жыл бұрын
:)
@user-ck3qk5ce9k
@user-ck3qk5ce9k 4 ай бұрын
Can you do GloVe? i really enjoyed Word2Vec it will be great to see how GloVe works...how factorization based method works. Thank you for this amazing content!
@statquest
@statquest 4 ай бұрын
I'll keep that in mind.
@NewMateo
@NewMateo Жыл бұрын
Great vid. So you're going to do a vid on transformer architectures? That would be incredible if so. Btw, bought your book. Finished it in like 2 weeks. Great work on it!
@statquest
@statquest Жыл бұрын
Thank you! My video on Encoder-Decoders will come out soon, then Attention, then Transformers.
@thomasstern6814
@thomasstern6814 Жыл бұрын
@@statquest When the universe needs you most, you provide
@user-bd2fm9lk5b
@user-bd2fm9lk5b 5 ай бұрын
Thank you Josh for this great video. I have a quick question about the Negative Sampling: If we only want to predict A, why do we need to keep the weights for "abandon" instead of just ignoring all the weights except for "A"?
@statquest
@statquest 5 ай бұрын
If we only focused on the weights for "A" and nothing else, then training would cause all of the weights to make every output = 1. In contrast, by adding some outputs that we want to be 0, training is forced to make sure that not every single output gets a 1.
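To make this reply concrete, here is a sketch (with made-up scores) of the negative-sampling loss: the output for the word we want ("A") has target 1, while the output for a randomly sampled word ("abandon") has target 0, so training cannot simply push every output toward 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up network outputs for illustration:
score_positive = 2.0   # output for "A"       (target = 1)
score_negative = -1.5  # output for "abandon" (target = 0)

# Negative-sampling loss for one positive word and one sampled negative:
# -log(sigmoid(positive)) - log(1 - sigmoid(negative))
loss = -math.log(sigmoid(score_positive)) - math.log(1 - sigmoid(score_negative))
print(round(loss, 3))
```

In practice word2vec sums the negative term over the 2-20 sampled words the video mentions; one negative is enough to show why the "everything outputs 1" solution no longer minimizes the loss.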
@CaHeoMapMap
@CaHeoMapMap Жыл бұрын
so goooood! Thank alot!
@statquest
@statquest Жыл бұрын
Glad you like it!
@jayachandrarameshkalakutag7329
@jayachandrarameshkalakutag7329 6 ай бұрын
Hi Josh, firstly thank you for all your videos. I had one doubt: in skip-gram, what is the loss function the network is optimized on? In CBOW I can see that cross entropy is enough
@statquest
@statquest 6 ай бұрын
I believe it's cross entropy in both.
@neemo8089
@neemo8089 8 ай бұрын
Thank you so much for the video! I have one question: at 15:09, why do we only need to optimize 300 weights? For one word with 100 * 2 weights? Not sure how to understand the '2' as well.
@statquest
@statquest 8 ай бұрын
At 15:09 there are 100 weights going from the word "aardvark" to the 100 activation functions in the hidden layer. There are then 100 weights going from the activation functions to the sum for the word "A" and 100 weights going from the activation functions to the sum for the word "abandon". Thus, 100 + 100 + 100 = 300.
@neemo8089
@neemo8089 8 ай бұрын
Thank you!@@statquest
@lancezhang892
@lancezhang892 6 ай бұрын
Hello Josh, thanks for your video.May I know if we could use 3 neuron network to predict the next words?
@statquest
@statquest 6 ай бұрын
Sure
@smooth7041
@smooth7041 Жыл бұрын
Hello. Thank you very much. Great, great video. I have a question. In the negative sampling procedure we never use A = 1 as input at any step in the training process. I am wondering about when the embeddings for A are trained. I can see how the weights for A to the right of the activation functions are trained, but not the weights on the left. I can see that, because we use a lot of training steps, at some moment A will be a word we don't want to predict at the input; therefore the embeddings for A will change, however, the prediction won't be A for those steps.
@statquest
@statquest Жыл бұрын
Why would we never use "A = 1" in training?
@MadeyeMoody492
@MadeyeMoody492 Жыл бұрын
Great video! Was just wondering why the outputs of the softmax activation at 10:10 are just 1s and 0s. Wouldn't that only be the case if we applied ArgMax here, not SoftMax?
@statquest
@statquest Жыл бұрын
In this example the data set is very small and, for example, the word "is" is always followed by "great", every single time. In contrast, if we had a much larger dataset, then the word "is" would be followed by a bunch of words (like "great", or "awesome" or "horrible", etc) and not followed by a bunch of other words (like "ate", or "stand", etc). In that case, the softmax would tell us which words had the highest probability of following "is", and we wouldn't just get 1.0 for a single word that could follow it.
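A small sketch of this point, with made-up scores: when one word always follows "is", its softmax output saturates near 1; with a richer corpus the probability spreads across several plausible next words.

```python
import math

def softmax(scores):
    # Convert raw network scores into probabilities that sum to 1.
    e = [math.exp(s) for s in scores]
    total = sum(e)
    return [x / total for x in e]

# Tiny corpus: "great" always follows "is" -> one score dominates.
tiny = softmax([9.0, -4.0, -4.0])
# Big corpus: several words can follow "is" -> probability spreads out.
big = softmax([2.0, 1.5, -1.0])

print([round(p, 3) for p in tiny])
print([round(p, 3) for p in big])
```

The scores here are invented purely to illustrate the saturated vs. spread-out cases; they are not values from the video's network.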
@MadeyeMoody492
@MadeyeMoody492 Жыл бұрын
@@statquest Ohh ok, that clears it up. Thanks!!
@exxzxxe
@exxzxxe 3 ай бұрын
You ARE the Batman and Superman of machine learning!
@statquest
@statquest 3 ай бұрын
:)
@SousanTarahomi-vh2jp
@SousanTarahomi-vh2jp 5 ай бұрын
Thanks!
@statquest
@statquest 5 ай бұрын
Hooray!!! Thank you so much for supporting StatQuest!!! TRIPLE BAM! :)
@guillaumebarreau
@guillaumebarreau 8 ай бұрын
Hi Josh, thank you for your excellent work! Just discovered your videos and I'm consuming them like a pack of crisps. I was wondering about the desired output when using the skip-gram model. When we have a word as input, the desired output is to have all the words found within the window size in any sentence of the corpus activate to 1 at the same time on the output layer, right? It is not said explicitly, but I guess it is the only way it can be.
@statquest
@statquest 8 ай бұрын
The outputs from a softmax function are all between 0 and 1 and add up to 1. In other words, softmax function does not allow more than one output to have a value of 1. See 12:16 for an example of outputs for the skipgram method.
@guillaumebarreau
@guillaumebarreau 8 ай бұрын
@@statquest, thanks for your prompt reply! You are right, I didn't look carefully enough. I guess I got confused because, after watching the video, I read other sources which seem to consider every skip-gram pair as a separate training example.
@S.A_1992
@S.A_1992 4 ай бұрын
Thank you so much for this video. Could you do something like this for audio embedding as well? or how could we merge (do fusion) audio and text embedding? I really appreciate it.
@statquest
@statquest 4 ай бұрын
Unfortunately, I'm not familiar with audio embedding.
@lancezhang892
@lancezhang892 6 ай бұрын
If we use softmax function as activation function, in the last step whether should we use entropy loss function with prediction value y_head and label value y=1 to get the loss function value ?And then use backpropagation to optimize the weights?
@statquest
@statquest 6 ай бұрын
We use the cross entropy loss with the softmax.
@shamshersingh9680
@shamshersingh9680 Ай бұрын
Hi Josh, again the best explanation of the concept. However, I have a doubt. As per the explanation, word embeddings are the weights associated with each word between the input and the activation function layer. These weights are obtained after training on a large text corpus like Wikipedia. When I train another model using these embeddings on another set of data, the weights (embeddings) will change during back-propagation while training. So the embeddings will not remain the same and will change with every model we train. Is this the correct interpretation, or am I missing something here?
@statquest
@statquest Ай бұрын
When you build a neural network, you can specify which weights are trainable and which should be left as is. This is the basis of "fine-tuning" a model - just training specific weights rather than all of them. So, you can do that. Or you, you can just start from scratch - don't pre-train the word embeddings, but train them when you train everything else. This is what most large language models, like ChatGPT, do.
@BalintHorvath-mz7rr
@BalintHorvath-mz7rr 2 ай бұрын
Awesome video! This time, I feel I'm missing one step, though. Namely, how do you train this network? I mean, I get that we want the network to be such that similar words have similar embeddings. But what is the 'Actual' we use in our loss function to measure the difference from and use backpropagation with?
@statquest
@statquest 2 ай бұрын
Yes
@balintnk
@balintnk 2 ай бұрын
@@statquest haha I feel like I didn't ask the question well :D How would the network know, without human input, that Troll 2 and Gymkata is very similar and so it should optimize itself so that ultimately they have similar embeddings? (What "Actual" value do we use in the loss function to calculate the residual?)
@statquest
@statquest 2 ай бұрын
@@balintnk We just use the context that the words are used in. Normal backpropagation plus the cross entropy loss function, where we use neighboring words to predict "troll 2" and "gymkata", is all you need to get similar embedding values for those words. That's what I used to create this video.
@user-rj6wc7bm8x
@user-rj6wc7bm8x Жыл бұрын
That's awesome! But how would a multilingual word2vec be trained? Would the training dataset simply include corpora of two (or more) languages? Or would additional NN infrastructure be required?
@statquest
@statquest Жыл бұрын
Are you asking about something that can translate one language to another? If so, then, yes, additional infrastructure is needed and I'll describe it in my next video in this series (it's called "sequence2sequence").
@user-rj6wc7bm8x
@user-rj6wc7bm8x Жыл бұрын
@@statquest not exactly, it's more like having similar words from multiple languages mapped into the same vector space. So, for example, "King" in English, French, German and Spanish would appear to be the same.
@statquest
@statquest Жыл бұрын
@@user-rj6wc7bm8x Hmmm.. I'm not sure how that would work because the English word "king" and the Spanish translation, "rey", would be in different contexts (for example, the English "king" would be in a phrase like "all hail the king", and the Spanish version would be in a sentence with completely different words, even if they meant the same thing).
@gabrielrochasantana
@gabrielrochasantana 2 ай бұрын
Amazing lecture, congrats. The audio was also made with NLP (Natural Language Processing), right?
@statquest
@statquest 2 ай бұрын
The translated overdubs were.
@kimsobota1324
@kimsobota1324 6 ай бұрын
I appreciate the knowledge you've just shared. It explains many things to me about neural networks. I have a question though: if you are randomly assigning a value to a word, why not try something easier? For example, in Hebrew, each of the letters of the Alef-Bet is assigned a value. These values are added together to form the sum of a word. It is the context of the word in a sentence that forms the block. Sabe? Take a look at Gematria; Hebrew has been doing this for thousands of years. Just a thought.
@statquest
@statquest 6 ай бұрын
Would that method result in words used in similar contexts to have similar numbers? Does it apply to other languages? Other symbols? And can we end up with multiple numbers per symbol to reflect how it can be used or modified in different contexts?
@kimsobota1324
@kimsobota1324 6 ай бұрын
I wish I could answer that question better than to tell you context is EVERYTHING in Hebrew, a language that has vowels but doesn't use them, since all who use the language understand the consonant-based word structures. Not only that, but in the late 1890s rabbis from Ukraine and Azerbaijan developed a mathematical code that was used to predict word structures from the Torah that were accurate to a value of 0.001%. Others have tried to apply it to other books like Alice in Wonderland and could not duplicate the result. You can find more information on the subject through a book called The Bible Code, which gives much more information as well as the formulae the Jewish mathematicians created. While it is a poor citation, I have included this Wikipedia link: en.wikipedia.org/wiki/Bible_code#:~:text=The%20Bible%20code%20(Hebrew%3A%20%D7%94%D7%A6%D7%95%D7%A4%D7%9F,has%20predicted%20significant%20historical%20events. The book is available on Amazon if you find it piques your interest. Please let me know if this helps. @@statquest
@kimsobota1324
@kimsobota1324 5 ай бұрын
@starquest, I had not heard from you about the Wiki?
@hasansoufan
@hasansoufan 11 ай бұрын
Thanks ❤
@statquest
@statquest 11 ай бұрын
:)
@robott12
@robott12 Жыл бұрын
Fantastic video! How do you apply the "pencil-written" box style in PowerPoint?
@statquest
@statquest Жыл бұрын
I use Keynote, and it's one of the default line types.
@robott12
@robott12 Жыл бұрын
@@statquest Thanks!
@lexxynubbers
@lexxynubbers Жыл бұрын
Machine learning explained like Sesame Street is exactly what I need right now.
@statquest
@statquest Жыл бұрын
bam!
@Rex389
@Rex389 Жыл бұрын
Hi Josh, great video. I have one question: how are the 2-20 words that we don't want to predict selected when doing negative sampling?
@statquest
@statquest Жыл бұрын
This is answered at 13:44. We can pick a random set because the assumption is that when the vocabulary is large, the chances of selecting a similar word are small. And I believe you select a different subset each iteration, so even if you do pick a similar word, the long term effects will not be huge.
@Rex389
@Rex389 Жыл бұрын
@@statquest Got it. Thanks
@user-pd1gy8xh4y
@user-pd1gy8xh4y 8 ай бұрын
funny and very nicely explained.
@statquest
@statquest 8 ай бұрын
Thanks! 😃
@fernandofa2001
@fernandofa2001 Жыл бұрын
I'm not sure if I understood correctly. Have those millions of word embeddings been preprocessed, and are they public? Or are they dependent on context? I need to do a project on word clustering of movie genres and I'm not sure if this is the way to go. Any help is appreciated!
@statquest
@statquest Жыл бұрын
I'm not sure this is the way to go either - these specific embeddings are usually used for processing natural language. However, you can download some publicly available embeddings here: fasttext.cc/docs/en/crawl-vectors.html
@kamashay1
@kamashay1 Жыл бұрын
Hi, in the NN we state that we use the 'identity' as an activation function. This means that the network is equivalent to a linear model? What is the justification for doing that? What would happen if we used different activation functions?
@statquest
@statquest Жыл бұрын
Yes, the identity function makes this just simple regression. As for the justification - I'm not sure. It's possible that there is no theoretical basis for it.
@balqeesmansour6692
@balqeesmansour6692 6 ай бұрын
Amazing video Dr. Josh, but I am confused about scGPT, and I hope you will help me understand why we convert the gene expression matrix to sentences and then use word embeddings and word2vec to convert them to numbers for deep learning models. I want a simplification like yours. Thanks in advance.
@statquest
@statquest 6 ай бұрын
I'll keep that in mind.
@balqeesmansour6692
@balqeesmansour6692 6 ай бұрын
@@statquest you are great. I wanna know how to connect each part in scGPT in your amazing video thanks