Original transformer paper "Attention is all you need" introduced by a layman | Shawn's ML Notes

6,524 views

Yuxiang "Shawn" Wang

1 month ago

Thank you for checking out my video notes on the original transformer paper "Attention is all you need", as introduced by a layman - me! I would love to share my ML learning journey with you.
Paper information:
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
Please let me know in the comment section if you have any questions, points of discussion, or anything you would like to see next. See you in the next video!

Comments: 20
@oo_wais · 1 month ago
One of the very few videos I found on YouTube that explains the architecture very well.
@yuxiangwang9624 · 1 month ago
Thank you so much for the recognition!
@tk-og4yk · 1 month ago
Another video! Looking forward to watching.
@yuxiangwang9624 · 1 month ago
Haha thank you for your support! It was an old deck I made a year ago, so I might as well record it :)
@matthewritter1117 · 1 month ago
Incredible content and your style is a perfect mix of confident and relatable. Keep it up!
@yuxiangwang9624 · 1 month ago
I appreciate the encouragement :)
@OEDzn · 1 month ago
amazing video!
@yuxiangwang9624 · 1 month ago
Thank you!
@420_gunna · 1 month ago
Seems like a great video, subbed! 🙂
@yuxiangwang9624 · 1 month ago
Thanks for the sub! Appreciate the recognition ❤️
@s8x. · 1 month ago
Please do more videos like this.
@yuxiangwang9624 · 1 month ago
Thank you! Will do :)
@aga5979 · 8 days ago
Thank you for the very valuable explanation. But in what f ucking world do laymen speak with dot product , cosine and e to the power of time and time prime? 😅😅😂😂.
@isiisorisiaint · 1 month ago
Pretty okay until Andrew's attention slide. Then, when it comes to your own explanations, things become murky, and when you get to "explain" the decoder, and then the full codec, you're sweeping everything under the rug in a few short seconds, when in fact this is exactly the section you should have spent most of the time on. All in all, a nice video until Andrew's slide, basically worthless afterwards.
@yuxiangwang9624 · 1 month ago
Thanks for the feedback! Will learn to improve :) Would you mind explaining in more detail which parts of the encoder I was missing? I can look into those and see if I can add some later!
@isiisorisiaint · 29 days ago
@@yuxiangwang9624 Darn, I got a notification that you responded to my comment, but only the first line of your reply was shown ("Thanks for the feedback! Will learn to improve :)"), and I didn't actually open it to see your full reply until now. I will get back to you with the details, sorry for the delay...
@nxlamik1245 · 6 days ago
Work on explaining things simply. It seems you have enough knowledge, but you made it difficult.
@yuxiangwang9624 · 3 days ago
Thank you for the feedback! I'd love to work on it. Could you kindly share an example? I'll take a shot in my next video.
@MDNQ-ud1ty · 1 day ago
@@yuxiangwang9624 Likely he is a beginner, and your explanations are for someone who already has some idea of how these architectures work. There will always be issues trying to match "impedance" between the teacher and the student, because one must match precisely what needs to be understood with what is already understood, with how to understand it, and with how to explain it, and this is context (student/teacher) dependent. You are just giving a basic overview of the process, and beginners generally need to have their hand held and be given many examples with many ideas (since they are blind). Examples and such are the best way for beginners to learn, since the words used to explain things are meaningless to them (they do not understand or know them yet, so they do not resonate). Basically, you teach a child by showing them rather than explaining to them. If you try to explain to them how something works, they will not understand it the way you think they will. So ultimately it depends on your goal. Ideally there would be some way for YouTube to have a setting in which the student can find videos that match exactly what he needs to learn optimally, but that doesn't happen under capitalism (capitalism profits off inefficiencies, and having an impedance mismatch is an inefficiency). So you have to accept those inefficiencies (as someone trying to teach, you have to understand such things exist, as the student likely won't) and realize that some people (maybe many) will not understand your explanations for a multitude of reasons, while a few will (because they have the right "impedance"). In general though, as someone trying to explain something to someone else, all I can say is "Know your audience" ("Knowing your audience is all you need"). This means that if you are targeting someone who is blind, you must go through every little detail and treat them as a kid. If you are targeting someone who knows X, then you can assume X and focus on Y. The more you assume, the less reach you have.
IMO, your explanation won't do much good for someone who hasn't ever done any actual NN training and used the basic models. Knowledge is also progressive: if you want to understand calculus well, you need to know algebra, and to understand algebra, you need to know arithmetic. Also, understanding topology requires calculus but also helps one understand calculus. One of the problems with education is that it's not really streamlined, so it is very inefficient. Everyone is at a different level, with very different abilities and very different lives, so it can be hard for things to "match up". The good news is that anyone can create videos to try to teach others... the bad news is that the student has to sift through it all (wasting time = inefficiency) to find what works. But you won't fix this problem, so you just have to do your best. I guess the "best way" currently to deal with this is to "state your assumptions" at the start. E.g., "I assume you understand neural networks, have done some basic work with them such as training RNNs, and are comfortable with linear algebra jargon. The more foreign these things are to you, the more you will struggle to grasp what I'm talking about." But it is also sometimes OK for someone to listen to others even when it is above their head, as familiarity brings awareness (which is all learning is: becoming aware).
@MrMusk-it5nz · 1 month ago
You definitely aren't a layman.