This is very nice, but I think there is one thing that is either wrong or not clear. At 11:40, he says that in order to minimize the number of steps needed to generate the image, we generate multiple pixels in one step that are scattered around and independent of each other. This way we will not get a blurred image. This works in the later steps, when enough pixels have been generated and they guide the generation of the next batch of pixels. E.g., we know it is a car and we just try to add more details. But in the early steps, when we don't know what it is, if multiple pixels are generated, we fall back to the averaging issue again, e.g. one pixel is for a car but another one is for a bridge. Therefore, I think in the early steps we can only generate one pixel per step. What do you think?
@algorithmicsimplicity · 17 hours ago
You are correct that at the early steps there is a large amount of uncertainty over the value of each pixel. But what matters is how much this uncertainty is reduced by knowing the values of the previously generated pixels. Even at the first step, knowing the value of one pixel does not help that much in determining the values of far-away pixels. Let's say one pixel in the center is blue: does that make you significantly more confident about the color of the top-left pixel? Not really.
@Levy1111 · a day ago
I do hope you'll soon reach at least a six-figure subscriber count. The quality of your videos (both in terms of education and presentation) is top notch; people need you to become popular (at least within our small tech bubble).
@official_noself · 2 days ago
480p? 2023? Are you kidding me?
@LeYuzer · 2 days ago
Tip: turn on 1.5x speed
@pshirvalkar · 2 days ago
A fantastic teacher!!! Thanks! Can you please cover Bayesian Markov chain Monte Carlo (MCMC)? It would be very helpful!
@algorithmicsimplicity · 2 days ago
Thanks for the suggestion, I will put it on my TODO list.
@alexanderterry187 · 2 days ago
How do the models deal with having different numbers of inputs? E.g. the text label provided can be any length, or not provided at all. I'm sure this is a basic question, but whenever I've used NNs previously they've always had a constant number of inputs or been reapplied to a sequence of data that has the same dimension at each step.
@algorithmicsimplicity · 2 days ago
For image input, the input is always the same size (same image size and same channels), and the output is always the same size (1 pixel). For text, you can also treat the inputs as all being the same size by padding smaller inputs up to a fixed max length, though transformers can also operate on sequences of different lengths. The output for text is always the same size (a probability distribution over tokens in the vocabulary).
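For concreteness, here is a minimal sketch of the padding idea (the function name, pad id, and max length are made up for illustration, not taken from the video): every text input is truncated or right-padded to one fixed length, so the model always sees the same input size, including the "no label at all" case.

```python
PAD_ID = 0    # assumed id reserved for the padding token
MAX_LEN = 8   # assumed fixed maximum input length

def pad_to_max(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or right-pad a list of token ids to exactly max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

# Variable-length inputs (including an empty label) become uniform in size:
batch = [pad_to_max(seq) for seq in [[5, 9, 2], [7, 1, 4, 4, 3], []]]
print(batch)
```

An attention mask would normally accompany this so the model can ignore the pad positions, but the size-uniformity trick is just the padding itself.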
@karigucio · 3 days ago
So the transformation applied to the weights is not purely about initialization? Instead, in the expression w = exp(-exp(a)*exp(ib)), the numbers a and b are the learned parameters, not w, right?
@algorithmicsimplicity · 3 days ago
Yes a and b are the learned parameters.
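A tiny sketch of that parameterization (the function name is mine, and this is just the formula from the comment, not the video's actual code): the optimizer updates the real scalars a and b, and the complex weight w is recomputed from them on every forward pass.

```python
import cmath
import math

def weight_from_params(a: float, b: float) -> complex:
    """Compute w = exp(-exp(a) * exp(i*b)) from the learned scalars a and b."""
    return cmath.exp(-math.exp(a) * cmath.exp(1j * b))

# With a = b = 0 this reduces to exp(-1), a real number inside the unit disk.
w = weight_from_params(0.0, 0.0)
print(abs(w))
```

Because w is a smooth function of a and b, gradients flow back to the two real parameters even though w itself is never stored as a parameter.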
@matthewfynn7635 · 3 days ago
I have been working with machine learning models for years and this is the first time I have truly understood, through visualisation, the use of ReLU activation functions! Great video
@jaimeduncan6167 · 4 days ago
Out of nothing? No, it grabs people's work and creates a composite with variations.
@algorithmicsimplicity · 3 days ago
It isn't correct to say it creates a 'composite' with variations, models can generalize outside of their training dataset in certain ways, and generative models are capable of creating entirely new things that aren't present in the training dataset.
@MichaelBrown-gt4qi · 4 days ago
I've started binge watching all your videos. 😁
@fergalhennessy775 · 4 days ago
do u have a mewing routine bro love from north korea
@MichaelBrown-gt4qi · 4 days ago
This is a great video. I have watched videos in the past (years ago) talking about auto-regression and more recently talking about diffusion, but it's nice to see why and how there was such a jump between the two. Amazing! However, I feel this video is a little incomplete with no mention of the enhancer model that "cleans up" the final generated image. This enhancer model is able to create a larger image while cleaning up the six fingers gen AI is so famous for. While not technically part of the diffusion process (because it has no random noise), it is a valuable addition to image gen if anyone is trying to build their own model.
@capcadaverman · 4 days ago
Not made from nothing. Made by training on real people’s intellectual property. 😂
@algorithmicsimplicity · 4 days ago
My image generator was trained on data licensed to be used for training machine learning models.
@capcadaverman · 4 days ago
@algorithmicsimplicity not everyone is so ethical
@telotawa · 4 days ago
could diffusion work on text generation?
@algorithmicsimplicity · 4 days ago
Yes, it absolutely can! Instead of adding normally distributed noise, you randomly mask tokens with some probability, see e.g. arxiv.org/abs/2406.04329 . That said, it tends to produce slightly worse quality text than auto-regression (this is actually true for images as well; it's just that on images auto-regression takes too long to be viable).
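A sketch of that forward (noising) process for text, with made-up names (the MASK token string and the function are mine): each token is independently replaced by a mask token with some probability, and the model is then trained to predict the original tokens from the partially masked sequence.

```python
import random

MASK = "<mask>"  # assumed special mask token in the vocabulary

def add_noise(tokens, mask_prob, rng):
    """Independently replace each token with MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]

rng = random.Random(0)
sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(add_noise(sentence, 0.5, rng))
```

Sweeping mask_prob from 1.0 down to 0.0 plays the role that the noise schedule plays for images: full masking corresponds to pure noise, no masking to clean data.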
@LordDoucheBags · 4 days ago
What did you mean by causal architectures? Because when I search online I get stuff about causal inference, so I’m guessing there’s a different and more popular term for what you’re referring to?
@julioalmeida4645 · 4 days ago
Damn. Amazing piece
@dmitrii.zyrianov · 5 days ago
Hey! Thanks for the video, it is very informative! I have a question. At 18:17 you say that an average of a bunch of noise is still valid noise. I'm not sure why that is true here. I'd expect the average of a bunch of noise to be just the value 0.5 (if we map RGB values to the 0..1 range).
@algorithmicsimplicity · 4 days ago
Right, the average is just the center of the noise distribution which, if we say the color values are mapped from -1 to 1, is 0. This average doesn't look like noise (it is just a solid grey image), but if you ask what the probability of this image is under the noise distribution, it actually has the highest probability. The noise distribution is a normal distribution centered at 0, so the input which is all 0s has the highest probability. So the average image still lies within the noise distribution, as opposed to natural images, where the average moves outside the data distribution.
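A quick numerical illustration of that point (a standard-library sketch, nothing from the video): averaging many independent N(0, 1) noise samples drives the result toward 0, which is the mode, i.e. the highest-density point, of the noise distribution.

```python
import random
import statistics

rng = random.Random(0)

# Average 100,000 independent noise "pixels" drawn from N(0, 1).
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
avg = statistics.fmean(samples)

# The average collapses toward 0, the center (and mode) of the distribution,
# so as an "image" it is still a high-probability point under the noise model.
print(abs(avg) < 0.05)
```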
@dmitrii.zyrianov · 4 days ago
Thank you for the reply, I think I got it now
@simonpenelle2574 · 5 days ago
Amazing content. I now want to implement this!
@dadogwitdabignose · 5 days ago
Great video, man. Suggestion: can you create a video on how generative transformers work? This has been really bothering me, and hearing an in-depth explanation of them, like your video here, would be helpful!
@algorithmicsimplicity · 5 days ago
Generative transformers work in exactly the same way as generative CNNs. It doesn't matter what backbone you use; the idea is the same: you use auto-regression or diffusion to train a transformer to undo the masking/noising process.
@dadogwitdabignose · 5 days ago
@algorithmicsimplicity which is more efficient to use, and how do they handle text data to map text into tensors?
@algorithmicsimplicity · 5 days ago
@dadogwitdabignose I explain how transformer classifiers work in this video: kzfaq.info/get/bejne/ob18mMdp1JuxYo0.html . As for which is more efficient, it depends on the data. Usually for text data transformers will be more efficient (for reasons I explain in that video), and for images CNNs will be more efficient.
@Kavukamari · 5 days ago
"i can do eleventy kajillion computations every second" "okay, what's your memory throughput"
@deep.space.12 · 6 days ago
If there is ever a longer version of this video, it might be worth mentioning VAEs as well.
@algorithmicsimplicity · 5 days ago
Thanks for the suggestion.
@wormjuice7772 · 6 days ago
This has helped me so much in wrapping my head around this whole subject! Thank you, for now and for the future!
@codybarton2090 · 7 days ago
Crazy video
@gameboyplayer217 · 7 days ago
Nicely explained
@snippletrap · 8 days ago
Fantastic explanation. Very intuitive
@ibrahimaba8966 · 8 days ago
Thank you for this beautiful work!
@algorithmicsimplicity · 7 days ago
Thank you very much!
@boogati9221 · 8 days ago
Crazy how two separate ideas ended up converging into one nearly identical solution.
@mattshannon5111 · 8 days ago
Wow, it requires really deep understanding and a lot of work to make videos this clear that are also so correct and insightful. Very impressive!
@vibaj16 · 8 days ago
wait, can this be used as a ray tracing denoiser? That is, you'd plug your noisy ray traced image into one of the later steps of the diffusion model, so the model tries to make it clear?
@algorithmicsimplicity · 8 days ago
Yep, you could definitely do that; you would probably need to train the model on some examples of noisy ray-traced images though.
@Maxawa0851 · 8 days ago
Yeah, but this is very slow though
@antongromek4180 · 9 days ago
Actually, there is no LLM, etc. Just 500 million nerds sitting in basements all over the world.
@artkuts4792 · 9 days ago
I still didn't get how the scoring model works. So before, you were labeling the important pairs by hand, giving each pair a score based on the semantic value it has for a given context, but then it's done automatically by a CNN. How does it define the score, though (and it's context-free, isn't it)?
@algorithmicsimplicity · 9 days ago
The entire model is trained end-to-end to minimize the training loss. To start off with, the scoring functions are completely random, but during training they will change to output scores which are useful, i.e. which cause the model's final prediction to better match the training labels. In practice it turns out that what these scoring functions learn while trying to be useful is very similar to the 'semantic scoring' that a human would do.
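As a rough sketch of what a scoring function can look like (dot-product scoring is assumed here for illustration; the embeddings stand in for the outputs of trained projections): the score for a pair is just a differentiable function of learned parameters, so gradient descent can reshape it, and a softmax turns the raw scores into weights the rest of the model consumes.

```python
import math

def score(query, key):
    """Dot-product score between two embedding vectors."""
    return sum(q * k for q, k in zip(query, key))

def softmax(xs):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# At initialization these embeddings are effectively random; training adjusts
# the projections that produce them until the scores become useful.
query = [0.5, -0.2]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = softmax([score(query, k) for k in keys])
print([round(w, 3) for w in weights])
```

Nothing in this pipeline needs hand-labeled scores: the loss on the model's final prediction is what pushes the scoring parameters toward "semantic-looking" behavior.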
@lusayonyondo9111 · 9 days ago
Wow, this is such an amazing resource. I'm glad I stuck around. This is literally the first time this is all making sense to me.
@istoleyourfridgecall911 · 10 days ago
Hands down the best video that explains how these models work. I love that you explain these topics in a way that resembles how the researchers created these models. Your video shows the thinking process behind them, and combined with great animated examples it is so easy to understand. You really went all out. If only YouTube promoted these kinds of videos instead of brainrot low-quality videos made by inexperienced teenagers.
@PotatoMan1491 · 10 days ago
Best video I found for explaining this topic
@TheTwober · 10 days ago
The best explanation I have found on the internet so far. 👍
@yoloswaggins2161 · 10 days ago
A guy who actually understands this stuff
@poipoi300 · 10 days ago
This is refreshing to watch in a sea of people who don't know what they're talking about and decide to make "educational" videos on the subject anyway. The simplifications are often harmful.
@itsyaro1297 · 10 days ago
Hey man! Really appreciate the technical detail in your videos <3 Could you please cover MoEs next? I feel as though these will be more prominent than MAMBA in the near future! Cheers
@algorithmicsimplicity · 10 days ago
Thanks for the suggestion, I will add them to the TODO list.
@riddhimanmoulick3407 · 11 days ago
Kudos for an incredibly intuitive explanation! Really loved the visual representations too!!
@vasil_astrov · 11 days ago
Thank you! This is a great explanation ❤
@nias2631 · 11 days ago
@nias2631 I have no particular opinion on transformers or MAMBA since, for my work, I never use these. But as for peer review I think that Open Review itself is a great "filter for the filter". The research community can actively review the reasoning for accept/reject as you did in this video. For most journals not using Open Review the process is fairly opaque.
@algorithmicsimplicity · 10 days ago
Absolutely agree, the transparent review process is definitely a net benefit for the community as a whole.
@codybarton2090 · 11 days ago
So keep some aspects of privacy laws coherent but merge the different sides of the web in a quantum computer
@sebbbi2 · 11 days ago
Thanks
@algorithmicsimplicity · 10 days ago
Thank you so much!
@neonelll · 11 days ago
The best explanation I've seen. Great work.
@photamasan9661 · 11 days ago
You're him 🙌🏽. Thank you so much. Getting this kind of information, or, well, explanation is not easy with all the "BREAKING AI NEWS! 😮‼️" on YouTube now.