Comments
@yushuowang7820 4 hours ago
VAE: what about me?
@lialkalo4093 11 hours ago
very good explanation
@freerockneverdrop1236 18 hours ago
This is very nice but I think one thing is either wrong or unclear. At 11:40, he says that in order to minimize the number of steps to generate the image, we generate multiple pixels in one step that are scattered around and independent of each other, and this way we will not get a blurred image. This works in the later steps, when enough pixels have been generated and they guide the generation of the next batch of pixels. E.g., we know it is a car and we just try to add more details. But in the early steps, when we don't know what it is, if multiple pixels are generated we fall back to the averaging issue again, e.g. one pixel is for a car but another one is for a bridge. Therefore, I think in the early steps we can only generate one pixel per step. What do you think?
@algorithmicsimplicity 17 hours ago
You are correct that at the early steps there is a large amount of uncertainty over the value of each pixel. But what matters is how much this uncertainty is reduced by knowing the value of the previously generated pixels. Even at the first step, knowing the value of one pixel does not help that much in determining the value of far-away pixels. Let's say one pixel in the center is blue: does that make you significantly more confident about the color of the top-left pixel? Not really.
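To make this concrete, here is a minimal sketch of the scattered parallel sampling being discussed. Everything in it (the image size, the uniform `predict_pixel_distributions` stub) is illustrative only, standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32
image = np.full((H, W), np.nan)  # NaN marks pixels not yet generated

def predict_pixel_distributions(partial_image):
    # Stand-in for a trained network: for every pixel, return a categorical
    # distribution over 256 intensities conditioned on the pixels generated
    # so far. Uniform here, purely so the sketch runs.
    return np.full((H, W, 256), 1.0 / 256)

# Each step samples a scattered batch of pixels *independently* from the
# model's per-pixel conditionals, instead of one pixel at a time.
pixels_per_step = 64
order = rng.permutation(H * W)  # random order scatters each batch
for start in range(0, H * W, pixels_per_step):
    dists = predict_pixel_distributions(image)
    for flat in order[start:start + pixels_per_step]:
        r, c = divmod(int(flat), W)
        image[r, c] = rng.choice(256, p=dists[r, c])
```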
@Levy1111 1 day ago
I do hope you'll soon reach at least a six-figure subscriber count. The quality of your videos (both in terms of education and presentation) is top notch; people need you to become popular (at least within our small tech bubble).
@official_noself 2 days ago
480p? 2023? Are you kidding me?
@LeYuzer 2 days ago
Tip: turn on 1.5x speed
@pshirvalkar 2 days ago
A fantastic teacher!!! Thanks! Can you please cover Bayesian Markov chain Monte Carlo (MCMC)? It would be very helpful!
@algorithmicsimplicity 2 days ago
Thanks for the suggestion, I will put it on my TODO list.
@alexanderterry187 2 days ago
How do the models deal with having different numbers of inputs? E.g. the text label provided can be any length, or not provided at all. I'm sure this is a basic question, but whenever I've used NNs previously they've always had a constant number of inputs or been reapplied to a sequence of data that has the same dimension at each step.
@algorithmicsimplicity 2 days ago
For image input, the input is always the same size (same image size and same channels), and the output is always the same size (1 pixel). For text, you can also treat the inputs as all being the same size by padding smaller inputs up to a fixed max length, though transformers can also operate on sequences of different lengths. The output for text is always the same size (a probability distribution over tokens in the vocabulary).
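To illustrate the padding approach, here is a minimal sketch; the token IDs, `PAD_ID`, and `MAX_LEN` below are made-up values, not anything from the video:

```python
import numpy as np

PAD_ID = 0   # reserved token ID for padding
MAX_LEN = 8  # fixed maximum text length

def pad_batch(token_sequences):
    """Pad variable-length token ID lists to a fixed length, with a mask
    telling the model which positions are real and which are padding."""
    batch = np.full((len(token_sequences), MAX_LEN), PAD_ID, dtype=np.int64)
    mask = np.zeros((len(token_sequences), MAX_LEN), dtype=bool)
    for i, seq in enumerate(token_sequences):
        seq = seq[:MAX_LEN]  # truncate anything too long
        batch[i, :len(seq)] = seq
        mask[i, :len(seq)] = True
    return batch, mask

# A 3-caption batch; the empty list plays the role of "no label provided".
tokens, mask = pad_batch([[5, 17, 3], [9], []])
```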
@karigucio 3 days ago
So the transformation applied to the weights isn't purely about initialization? Instead, in the expression w = exp(-exp(a)*exp(ib)), the numbers a and b are the learned parameters, not w, right?
@algorithmicsimplicity 3 days ago
Yes, a and b are the learned parameters.
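For concreteness, a tiny sketch of that parametrization; the values of a and b below are arbitrary stand-ins for what would be trained tensors:

```python
import numpy as np

# Arbitrary example values; during training, gradient descent updates
# a and b directly, never w itself.
a = np.array([0.1, -0.3, 0.5])
b = np.array([0.2, 1.0, -0.7])

# w is re-derived from a and b on every forward pass:
# w = exp(-exp(a) * exp(i*b))
w = np.exp(-np.exp(a) * np.exp(1j * b))

print(np.abs(w))  # magnitudes of the derived complex weights
```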
@matthewfynn7635 3 days ago
I have been working with machine learning models for years and this is the first time I have truly understood, through visualisation, the use of ReLU activation functions! Great video.
@jaimeduncan6167 4 days ago
Out of nothing? No, it grabs people's work and creates a composite with variations.
@algorithmicsimplicity 3 days ago
It isn't correct to say it creates a 'composite' with variations: models can generalize outside of their training dataset in certain ways, and generative models are capable of creating entirely new things that aren't present in the training dataset.
@MichaelBrown-gt4qi 4 days ago
I've started binge watching all your videos. 😁
@fergalhennessy775 4 days ago
do u have a mewing routine bro love from north korea
@MichaelBrown-gt4qi 4 days ago
This is a great video. I have watched videos in the past (years ago) talking about auto-regression, and more lately about diffusion, but it's nice to see why and how there was such a jump between the two. Amazing! However, I feel this video is a little incomplete in that there was no mention of the enhancer model that "cleans up" the final generated image. This enhancing model is able to create a larger image while cleaning up the six fingers gen AI is so famous for. While not technically a part of the diffusion process (because it has no random noise), it is a valuable addition to image gen if anyone is trying to build their own model.
@capcadaverman 4 days ago
Not made from nothing. Made by training on real people’s intellectual property. 😂
@algorithmicsimplicity 4 days ago
My image generator was trained on data licensed to be used for training machine learning models.
@capcadaverman 4 days ago
@algorithmicsimplicity Not everyone is so ethical.
@telotawa 4 days ago
could diffusion work on text generation?
@algorithmicsimplicity 4 days ago
Yes, it absolutely can! Instead of adding normally distributed noise, you randomly mask tokens with some probability, see e.g. arxiv.org/abs/2406.04329 . That said, it tends to produce somewhat worse quality text than auto-regression (actually this is true for images as well, it's just that for images auto-regression takes too long to be viable).
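A minimal sketch of that masking-style forward process (the sentence and mask token below are illustrative; see the linked paper for the actual masking schedule):

```python
import random

MASK = "[MASK]"

def corrupt(tokens, p):
    """Discrete-diffusion analogue of adding noise: independently replace
    each token with a mask token with probability p."""
    return [MASK if random.random() < p else t for t in tokens]

# Training: the model learns to predict the original tokens at the masked
# positions; sampling runs this in reverse, from p near 1 down to p = 0.
print(corrupt("the cat sat on the mat".split(), p=0.5))
```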
@LordDoucheBags 4 days ago
What did you mean by causal architectures? Because when I search online I get stuff about causal inference, so I’m guessing there’s a different and more popular term for what you’re referring to?
@julioalmeida4645 4 days ago
Damn. Amazing piece
@dmitrii.zyrianov 5 days ago
Hey! Thanks for the video, it is very informative! I have a question. At 18:17 you say that an average of a bunch of noise is still valid noise. I'm not sure why that is true here. I'd expect the average of a bunch of noise to be just a flat 0.5 value (if we map RGB values to the 0..1 range).
@algorithmicsimplicity 4 days ago
Right, the average is just the center of the noise distribution which, if the color values are mapped from -1 to 1, is 0. This average doesn't look like noise (it is just a solid grey image), but if you ask what the probability of this image is under the noise distribution, it actually has the highest probability. The noise distribution is a normal distribution centered at 0, so the input which is all 0 has the highest probability density. So the average image still lies within the noise distribution, as opposed to natural images, where the average moves outside the data distribution.
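In symbols, for d-dimensional noise drawn from a standard normal, the all-zero (solid grey) image is exactly the density's maximizer:

```latex
p(x) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\lVert x \rVert^2\right),
\qquad
\arg\max_{x}\, p(x) = \mathbf{0}
```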
@dmitrii.zyrianov 4 days ago
Thank you for the reply, I think I got it now
@simonpenelle2574 5 days ago
Amazing content! I now want to implement this.
@dadogwitdabignose 5 days ago
Great video, man! Suggestion: can you create a video on how generative transformers work? This has been really bothering me, and hearing an in-depth explanation of them, like your video here, would be helpful!
@algorithmicsimplicity 5 days ago
Generative transformers work in exactly the same way as generative CNNs. It doesn't matter what backbone you use, the idea is the same: you use auto-regression or diffusion to train a transformer to undo the masking/noising process.
@dadogwitdabignose 5 days ago
@algorithmicsimplicity Which is more efficient to use, and how do they handle text data to map text into tensors?
@algorithmicsimplicity 5 days ago
@dadogwitdabignose I explain how transformer classifiers work in this video: kzfaq.info/get/bejne/ob18mMdp1JuxYo0.html . As for which is more efficient, it depends on the data. Usually for text data transformers will be more efficient (for reasons I explain in that video), and for images CNNs will be more efficient.
@Kavukamari 5 days ago
"i can do eleventy kajillion computations every second" "okay, what's your memory throughput"
@deep.space.12 6 days ago
If there is ever a longer version of this video, it might be worth mentioning VAEs as well.
@algorithmicsimplicity 5 days ago
Thanks for the suggestion.
@wormjuice7772 6 days ago
This has helped me so much in wrapping my head around this whole subject! Thank you, for now and for the future!
@codybarton2090 7 days ago
Crazy video
@gameboyplayer217 7 days ago
Nicely explained
@snippletrap 8 days ago
Fantastic explanation. Very intuitive
@ibrahimaba8966 8 days ago
Thank you for this beautiful work!
@algorithmicsimplicity 7 days ago
Thank you very much!
@boogati9221 8 days ago
Crazy how two separate ideas ended up converging into one nearly identical solution.
@mattshannon5111 8 days ago
Wow, it requires really deep understanding and a lot of work to make videos this clear that are also so correct and insightful. Very impressive!
@vibaj16 8 days ago
wait, can this be used as a ray tracing denoiser? That is, you'd plug your noisy ray traced image into one of the later steps of the diffusion model, so the model tries to make it clear?
@algorithmicsimplicity 8 days ago
Yep you could definitely do that, you would probably need to train a model on some examples of noisy ray traced images though.
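A rough sketch of that idea, assuming a trained reverse-diffusion network (`denoise_step` below is a placeholder, and the step numbers are arbitrary): instead of starting from pure noise at the last step, you inject the noisy render partway through and run only the tail of the reverse process.

```python
import numpy as np

TOTAL_STEPS = 1000
START_STEP = 200  # treat the noisy render as a step-200 diffusion sample

def denoise_step(x, t):
    # Placeholder for one reverse step of a trained diffusion model,
    # which would predict and remove a small amount of noise at step t.
    return x * 0.999

x = np.random.rand(64, 64, 3)  # stand-in for the noisy ray-traced image
for t in range(START_STEP, 0, -1):
    x = denoise_step(x, t)  # run only the tail of the reverse process
```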
@Maxawa0851 8 days ago
Yeah, but this is very slow, though.
@antongromek4180 9 days ago
Actually, there is no LLM, etc., just 500 million nerds sitting in basements all over the world.
@artkuts4792 9 days ago
I still didn't get how the scoring model works. So before, you were labeling the important pairs by hand, giving each a score based on the semantic value the pair has for a given context, but then it's done automatically by a CNN. How does it define the score, though (and it's context-free, isn't it)?
@algorithmicsimplicity 9 days ago
The entire model is trained end-to-end to minimize the training loss. To start off with, the scoring functions are completely random, but during training they will change to output scores which are useful, i.e. which cause the model's final prediction to better match the training labels. In practice it turns out that what these scoring functions learn while trying to be useful is very similar to the 'semantic scoring' that a human would do.
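As a toy illustration of "completely random at the start" (dimensions and data are arbitrary), here is a dot-product scoring function whose matrices begin as random noise and only become meaningful as training updates them:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# Random initial weights: the scores they produce mean nothing yet.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))

def scores(x):
    """Pairwise score for every pair of positions. No semantics are built
    in; useful scoring emerges only as Wq and Wk are trained to reduce
    the final prediction loss."""
    q, k = x @ Wq, x @ Wk
    return q @ k.T / np.sqrt(d)

x = rng.normal(size=(5, d))  # 5 token embeddings
print(scores(x).shape)       # a (5, 5) matrix of scores
```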
@lusayonyondo9111 9 days ago
wow, this is such an amazing resource. I'm glad I stuck around. This is literally the first time this is all making sense to me.
@istoleyourfridgecall911 10 days ago
Hands down the best video that explains how these models work. I love that you explain these topics in a way that resembles how the researchers created these models. Your video shows the thinking process behind them, and combined with great animated examples it is so easy to understand. You really went all out. If only YouTube promoted these kinds of videos instead of low-quality brainrot made by inexperienced teenagers.
@PotatoMan1491 10 days ago
Best video I found for explaining this topic
@TheTwober 10 days ago
The best explanation I have found on the internet so far. 👍
@yoloswaggins2161 10 days ago
A guy who actually understands this stuff
@poipoi300 10 days ago
This is refreshing to watch in a sea of people who don't know what they're talking about and decide to make "educational" videos on the subject anyways. The simplifications are often harmful.
@itsyaro1297 10 days ago
Hey man! Really appreciate the technical detail in your videos <3 Could you please cover MoEs next? I feel as though these will be more prominent than MAMBA in the near future! Cheers
@algorithmicsimplicity 10 days ago
Thanks for the suggestion, I will add them to the TODO list.
@riddhimanmoulick3407 11 days ago
Kudos for an incredibly intuitive explanation! Really loved the visual representations too!!
@vasil_astrov 11 days ago
Thank you! This is a great explanation ❤
@nias2631 11 days ago
I have no particular opinion on transformers or MAMBA since, for my work, I never use these. But as for peer review, I think that Open Review itself is a great "filter for the filter". The research community can actively review the reasoning for accept/reject decisions, as you did in this video. For most journals not using Open Review the process is fairly opaque.
@algorithmicsimplicity 10 days ago
Absolutely agree, the transparent review process is definitely a net benefit for the community as a whole.
@codybarton2090 11 days ago
So keep some aspects of privacy laws coherent but merge the different sides of the web in a quantum computer
@sebbbi2 11 days ago
Thanks
@algorithmicsimplicity 10 days ago
Thank you so much!
@neonelll 11 days ago
The best explanation I've seen. Great work.
@photamasan9661 11 days ago
You’re him 🙌🏽. Thank you so much. Getting this kind of information, or rather explanation, is not easy with all the "BREAKING AI NEWS!😮‼️" on YouTube now.
@tomashonzik1758 11 days ago
Thanks!
@algorithmicsimplicity 11 days ago
Thanks for your support!
@zlatanonkovic2424 11 days ago
What a great explanation!
@L0615T1C 11 days ago
boosting the algoooooo