Why Does Diffusion Work Better than Auto-Regression?

107,204 views

Algorithmic Simplicity

1 day ago

Have you ever wondered how generative AI actually works? Well the short answer is: in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
The following generative models were featured as demos in this video:
Images: Adobe Firefly (www.adobe.com/products/firefl...)
Text: ChatGPT (chat.openai.com)
Audio: Suno.ai (suno.ai)
Code: Gemini (gemini.google.com/app)
Video: Lumiere (Lumiere-video.github.io)
Chapters:
00:00 Intro to Generative AI
02:40 Why Naïve Generation Doesn't Work
03:52 Auto-regression
08:32 Generalized Auto-regression
11:43 Denoising Diffusion
14:19 Optimizations
14:30 Re-using Models and Causal Architectures
16:35 Diffusion Models Predict the Noise Instead of the Image
18:19 Conditional Generation
19:08 Classifier-free Guidance

Comments: 154
@doku7335 4 days ago
At first I thought "oh, another random video explaining the same basics and not adding anything new", but I was so wrong. It's an incredibly clear explanation of diffusion, and starting with the basics makes the full picture much clearer. Thank you for the video!
@jupiterbjy 6 days ago
Kinda sorry to my professors and seniors, but this is the single best explanation of the logic behind each model. A dozen-minute video > 2 years of confusion in university.
@algorithmicsimplicity 3 months ago
Next video will be on Mamba/SSM/Linear RNNs!
@benjamindilorenzo 2 months ago
Great! Also maybe think about the tradeoff between scaling and incremental improvements, in case your perspective is that LLMs also always approximate the data set and therefore memorize rather than exhibit any "emergent capabilities" - so that ChatGPT also does "only" curve fitting.
@harshvardhanv3873 10 days ago
I am a student pursuing a degree in AI, and we want more of your videos on even the simplest concepts in AI. Trust me, this channel will be a huge deal in the near future. Good luck!!
@QuantenMagier 6 hours ago
Well take my subscription then!!1111
@user-my3dd4lu2k 1 month ago
Man, I love the fact that you present the fundamental idea with an intuitive approach, and then discuss the optimizations.
@pseudolimao 4 days ago
this is insane. I feel bad for getting this level of content for free
@user-fh7tg3gf5p 3 months ago
This genius only makes videos occasionally, but they are not to be missed.
@justanotherbee7777 3 months ago
absolutely true
@pw7225 5 days ago
The way you tell the story is fantastic! I am surprised that all AI/ML books are so terrible at didactics. We should always start at the intuition, the big picture, the motivation. The math comes later when the intuition is clear.
@yqisq6966 11 days ago
The clearest and most concise explanation of diffusion models I've seen so far. Well done.
@Veptis 3 days ago
This is a great explanation of how image decoders work. I haven't seen this approach and narrative direction before. This is now my go-to reference for explaining it to people who have no idea!
@jasdeepsinghgrover2470 15 days ago
This is a much better explanation than the diffusion paper itself. They just went all around variational inference to get the same result!
@rafa_br34 13 days ago
Such an underrated video, I love how you went from the basic concepts to complex ones and didn't just explain how it works but also the reason why other methods are not as good/efficient. I will definitely be looking forward to more of your content!
@Jack-gl2xw 9 days ago
I have trained my own diffusion models and it required me to do a deep dive of the literature. This is hands down the best video on the subject and covers so much helpful context that makes understanding diffusion models so much easier. I applaud your hard work, you have earned a subscriber!
@RicardoRamirez-dr6gc 10 days ago
This is seriously one of the best explainer videos I've ever seen. I've spent a long time trying to understand diffusion models, and not a single video has come close to this one.
@benjamindilorenzo 2 months ago
Very good job. My suggestion is that you explain more about how it actually works - how the model learns to understand complete scenes just from text prompts. This could fill its own video. It would also be very nice to have a video about Diffusion Transformers, which OpenAI's Sora probably is. It could also be great to have a video about the paper "Learning in High Dimension Always Amounts to Extrapolation". Best wishes
@algorithmicsimplicity 2 months ago
Thanks for the suggestions, I was planning to make a video about why neural networks generalize outside their training set from the perspective of algorithmic complexity. That paper "Learning in High Dimension Always Amounts to Extrapolation" essentially argues that the interpolation vs extrapolation distinction is meaningless for high dimensional data, and I agree, I don't think it is worth talking about interpolation/extrapolation at all when explaining neural network generalization.
@benjamindilorenzo 2 months ago
@@algorithmicsimplicity Yes, true. It would also be great because this links back to the LLM discussions: whether scaling up Transformers actually brings about "emergent capabilities", or whether this is more simply and less magically explainable by extrapolation. In other words: people tend to believe either that deep learning architectures like Transformers are only approximating their training data set, or that seemingly unexplainable or unexpected capabilities emerge while scaling. I believe that extrapolation alone explains really well why LLMs work so well, especially when scaled up, AND that LLMs "just" approximate their training data (curve fitting). This is why I brought this up ;)
@shivamkaushik6637 6 minutes ago
Never knew YouTube could randomly suggest videos like these. This was mind-blowing. The way you teach is a work of art.
@Frdyan 2 days ago
I have a graduate degree in this shit and this is by far the clearest explanation of diffusion I've seen. Have you thought about doing a video running over the NN Zoo? I've used that as a starting point for lectures on NN and people seem to really connect with that paradigm
@HD-Grand-Scheme-Unfolds 15 days ago
You truly understand how to simplify... to engage our imagination... to employ naive thoughts or ideas as comparisons to bring across deeper, more core principles and concepts, making the subject far easier to grasp and build an intuition for. Algorithmic Simplicity indeed... thank you for your style of presentation and teaching. Love it, love it... you make me realize what questions I want to ask but didn't know I wanted to ask. YouTube needs your contribution to ML education. Please don't forget that.
@justanotherbee7777 3 months ago
A person with very little background can understand what he describes here... commenting so YouTube recommends it to others... wonderful video! Really good one.
@karlnikolasalcala8208 8 days ago
This channel is gold, I'm glad I randomly stumbled across one of your vids
@CodeMonkeyNo42 7 days ago
Great video. Love the pacing and how you distilled the material into such an easy-to-watch video. Great job!
@MeriaDuck 1 day ago
This must be one of the best and most concise explanations I've seen!
@jcorey333 3 months ago
This is an amazing quality video! The best conceptual video on diffusion in AI I've ever seen. Thanks for making it! I'd love to see you cover RNNs.
@Matyanson 6 days ago
Thank you for the explanation. I already knew a little bit about diffusion, but this is exactly the way I'd hoped to learn: start from the simplest examples (usually historical) and progressively advance, explaining each optimisation!
@anthonybernstein1626 20 days ago
I had a good idea how diffusion models work but I still learned a lot from this video. Thanks!
@banana_lemon_melon 7 days ago
Bruh, I love your content. Other channels/videos usually explain general knowledge that can easily be found on the internet, but you go deeper into the intrinsic aspects of how the stuff works. This video, and your video about transformers, are really good.
@mrdr9534 5 days ago
Thanks for taking the time and effort to make and share these videos and your knowledge. Kudos and best regards
@JordanMetroidManiac 6 days ago
I finally understand how models like Stable Diffusion work now! I tried understanding them before but got lost at the equation (17:50), but this video describes that equation very simply. Thank you!
@ecla141 2 days ago
Awesome video! I would love to see a video about graph neural networks
@xaidopoulianou6577 10 days ago
Very nicely and simply explained! Keep it up
@iestynne 5 days ago
Wow, fantastic video. Such clear explanations. I learned a great deal from this. Thank you so much!
@abdelhakkhalil7684 7 days ago
This was a good watch, thank you :)
@tkimaginestudio 1 day ago
Great explanations, thank you!
@1.4142 3 months ago
SoME2 really brought out some good channels
@RobotProctor 10 days ago
I like to think of ML as a funky calculator. Instead of a calculator where you give it inputs and an operation and it gives you an output, you give it inputs and outputs and it gives you an operation. You said it's like curve fitting, which is the same thing, but I like thinking the words funky calculator because why not
@user-yj3mf1dk7b 8 days ago
Nice explanations. Although I already knew about diffusion, the examples from simplest to final diffusion were a really nice touch.
@sanjeev.rao3791 2 days ago
Wow, that was a fantastic explanation.
@iancallegariaragao 3 months ago
Great video and amazing content quality!
@akashmody9954 3 months ago
Great video... already waiting for your next one.
@ShubhamSinghYoutube 1 day ago
Love the conclusion
@anatolyr3589 1 month ago
Great explanation! 👍👍 I personally would like to see a video surveying all major types of neural nets with their distinctions, specifics, advantages, disadvantages, etc. The author explains very well 👏👏
@user-er9pw4qh6j 18 days ago
Soooo Good!!! Thanks for making it!!!!
7 days ago
I think it would help to mention that the auto-regressors may be viewing the image as a sequence of pixels (RGB vectors). Overall excellent video, extremely intuitive.
@algorithmicsimplicity 7 days ago
In general, auto-regressors do not view images as a sequence. For example, PixelCNN uses convolutional layers and treats inputs as 2d images. Only sequential models such as recurrent neural networks would view the image as a sequence.
7 days ago
@@algorithmicsimplicity of course, but I feel mentioning it may help with intuition as you’re walking through pixel by pixel image generation
@Mhrn.Bzrafkn 11 days ago
It was so easy to understand 👌🏻👌🏻
@paaabl0. 6 days ago
Great video! It focuses on the right elements.
@vijayaveluss9098 9 days ago
Great explanation
@RobotProctor 10 days ago
Thank you. This video is wonderful
@zephilde 12 days ago
Great visualisation! Good job! Maybe the next video could be on LoRA or ControlNet?
@algorithmicsimplicity 12 days ago
Great suggestions, I will put them on my TODO list.
@banana_lemon_melon 7 days ago
+1 for LoRA
@marcinstrzesak346 15 days ago
Very good video. Thank you
@khangvutien2538 9 days ago
Thank you very much. I enjoyed the first part, the first 10 seconds. After that, there are too many shortcuts in the explanations, and I struggled to understand and be able to explain it again to myself. Still, I subscribed. As for suggestions for other videos, I'll check whether you have explained the U-Net already. If not, I'd appreciate the same kind of explanation of it.
@psl_schaefer 1 day ago
Amazing video!
@joaosousapinto3614 14 days ago
Great video, congrats.
@ollie-d 2 days ago
Solid video!
@mojtabavalipour 7 days ago
Well done!
@demohub 8 days ago
Just subscribed. Great video
@meanderthalensis 9 days ago
Great video!
@AurL_69 10 days ago
thanks for explaining
@hmmmza 3 months ago
What great, rare content!
@johnbolt2686 6 days ago
I would recommend reading about active inference to possibly understand the role of generative models in intelligence.
@pon1 10 days ago
Still feels like magic to me 🙌🙌
@ArtOfTheProblem 13 days ago
great work
@oculuscat 10 days ago
Diffusion doesn't necessarily work better than auto-regression. The "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" paper introduces an architecture they call VAR that upscales noise using an AR model and this currently out-performs all diffusion models in terms of speed and accuracy.
@winstongraves8321 9 days ago
Great video
@ChristProg 15 days ago
Thank you so much, Sir. Really interesting video. But I would like you to create a video on how the generative model uses the text prompt during training. Thank you, Sir. I subscribed! 😊
@mallow610 9 days ago
Video is a banger
@infographie 8 days ago
Excellent.
@aydr5412 6 days ago
Thank you for the video. Imo curve fitting is an oversimplification; it distracts us from the real problem - what is being optimized and how. Also, there is a different perspective on cases where we prefer computational efficiency over training quality: with efficiency you can train a model on more data and for a longer time using the same amount of computational resources, which actually results in a better model.
@johnmorrell3187 5 days ago
Curve fitting is optimization so I'd say the two explanations are equivalent. While it's true that a more efficient method -> longer training -> better behavior, it's also true that if compute and time really were not a limiting factor then these less efficient methods would give better final performance.
@kubaissen 3 months ago
Nice vid thx
@craftydoeseverything9718 11 hours ago
This was genuinely such a great video. I honestly feel like I could come away from this video and implement an image generator myself :) /gen
@zacklee5787 5 days ago
Not sure I agree with some of your analysis here. The strength of diffusion models doesn't come from the lower dependence of objects/pixels the model generates at once. In fact, as you mention, the model actually predicts a whole image, in practice, at every step. Even when you use the trick of predicting the noise, the noise is unintuitively not random - that is, not randomly generated - but actually depends completely on the noise, or lack thereof, in the input. It is after all equivalent to predicting the whole image. The real strength comes from the incremental nature: a step of the model further down the line can "fix" a mistake it made previously by interpreting the previous generation as noise. In the space of all, say, 1024x1024 pixel value combinations, there is a manifold (essentially a subset of close-together images) of all target images we want to generate. The diffusion model learns to take incremental steps toward that subset of "reasonable" images from any random starting point.
@algorithmicsimplicity 4 days ago
The noise is absolutely randomly generated. The reason the model can predict the noise (or equivalently the image) is because it receives both the noise and the image as input. If it were the case that the incremental nature helped, then I would expect diffusion models to generate higher quality outputs than auto-regressors, but this isn't the case. Auto-regressors generate higher quality outputs (e.g. arxiv.org/abs/2205.13554 ), they just take longer to run. If NNs were unable to give correct predictions on the first go, we would see the opposite: that diffusion models could correct previous generations and thereby achieve higher quality. Also see LLMs, which have no difficulty generating perfect outputs in one pass. Diffusion models only learn to take steps toward the data distribution starting at the standard normal distribution (origin).
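The setup described in this reply - the model's input is a mix of the clean image and freshly sampled noise, and the regression target is that same noise - can be sketched as a toy training-pair constructor. This is a hypothetical NumPy sketch with an illustrative linear schedule; real DDPMs use a tuned variance schedule and a neural network in place of this bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(clean_image, t, T=1000):
    """Build one diffusion training example (toy linear schedule, assumption).

    The network input mixes the clean image with fresh Gaussian noise;
    the regression target is that same noise sample. The model can
    "predict" the noise only because the noise is part of its input.
    """
    alpha = 1.0 - t / T  # fraction of signal kept at step t (illustrative)
    noise = rng.standard_normal(clean_image.shape)
    noisy = np.sqrt(alpha) * clean_image + np.sqrt(1.0 - alpha) * noise
    return noisy, noise  # (model input, regression target)

clean = rng.standard_normal((8, 8))
noisy, target = make_training_pair(clean, t=500)

# Given the clean image, the target noise is exactly recoverable from the
# input (at t=500, alpha=0.5), which is why predicting the noise and
# predicting the image are equivalent:
recovered = (noisy - np.sqrt(0.5) * clean) / np.sqrt(0.5)
```

The recoverability check at the end is the point of the sketch: "noise prediction" and "image prediction" carry the same information once the input is fixed.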
@frommarkham424 10 days ago
That was exactly how I guessed they did it.
@HyperFocusMarshmallow 8 days ago
A funny thing about watching a video like this is that you see an artificial neural network produce an image, and then you have another layer of neural network in the brain that tries to figure out whether it was a good match or not. The so-called "blurry noise" could in principle look like a good match to one person and a bad match to another, depending on how their own categorization works. It could also be good for everyone or bad for everyone, of course, or some arbitrary mix along that scale. The point is that "looks like blurry noise" risks being a rather subjective statement. I mean, people see images in the clouds and so on.
@IceMetalPunk 9 days ago
And the newest/upcoming models seem to be tending more towards diffusion Transformers, which from my understanding is effectively a Transformer autoencoder with a diffusion model plugged in, applying diffusion directly to the latent space embeddings. Is that correct?
@Blooper1980 11 days ago
Finally I understand!
@MilesBellas 10 days ago
via Pi "Diffusion models and auto-regressive (AR) models are two popular approaches for generating images and other types of data. They differ in their fundamental techniques, generation time, and output quality. Here's a brief comparison: **Diffusion Models:** * Approach: Diffusion models are based on the idea of denoising images iteratively, starting from a noisy input and gradually refining it into a high-quality output. * Generation Time: Diffusion models are generally faster than AR models for image generation, especially when using optimizations like "asymmetric step" or Cascade models. * Output Quality: Diffusion models are known for generating high-quality and diverse images, especially when trained on large datasets like Stable Diffusion or DALL-E 2. They can capture various styles and generate coherent images with intricate details. **Auto-Regressive (AR) Models:** * Approach: AR models generate images pixel by pixel, conditioning each new pixel on previously generated pixels. This sequential approach makes AR models computationally expensive, especially for large images. * Generation Time: AR models tend to be slower than diffusion models due to their sequential nature. The generation time can be significantly longer for high-resolution images. * Output Quality: While AR models can produce high-quality images, they may struggle with capturing diverse styles or maintaining coherence across different image regions. They might require additional techniques, like classifier-free guidance or super-resolution, to achieve better results. In summary, diffusion models generally offer faster generation times and better output quality compared to AR models. However, both approaches have their strengths and limitations, and the choice between them depends on the specific use case, available computational resources, and desired generation speed and output quality."
@yk4r2 1 day ago
Hey, could you kindly recommend more on causal architectures?
@algorithmicsimplicity 1 day ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
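The masking trick described in this reply can be sketched as a toy single-head causal self-attention in NumPy. This is illustrative only: there are no learned query/key/value projections, which a real transformer layer would have.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head self-attention with a causal mask.

    x: (seq_len, d) array of word vectors. Scores for positions after
    the query are set to -inf before the softmax, so position i only
    aggregates information from positions <= i.
    """
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)                         # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores[mask] = -np.inf                                # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
out = causal_self_attention(x)

# Causality check: perturbing the LAST token must not change any
# earlier position's output vector.
x2 = x.copy()
x2[4] += 10.0
out2 = causal_self_attention(x2)
```

The check at the end is exactly why each position's vector is a valid basis for predicting the next token: it never saw anything after itself.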
@muhammadaneeqasif572 4 days ago
Can you please share the code that you used to generate the images in the demo? It would be very helpful.
@recklessroges 2 days ago
Could you explain why the YOLO image classifier is/was so effective? Thank you.
@hjups 3 months ago
Do you have a citation that supports your claim for eps vs x0 prediction? It's true that the first sampling step with x0 tends to produce a blurry / averaged result, but that's a result of the loss function used when training DDPMs. If you were to use something more complex or another NN, then you'd have a GAN, which doesn't produce blurry or averaged results in a single forward pass. Also, if you examine the output of x0 = noise - eps for the first step, it's both mathematically and visually equivalent to the first x0 prediction sample - a blurry / averaged result. The same thing is also true when predicting velocity, but velocity is arguably harder for a network to predict due to the phase transition.
@alex65432 3 months ago
Can you make a video about the loss landscape? Like, what effects do different weight inits, optimizers, or architectures like ResNet have?
@algorithmicsimplicity 3 months ago
Thanks for the interesting suggestion! I was already planning to do a video about why neural networks generalize outside of their training set, I should be able to talk about the loss landscape in that video.
@alirezaghazanfary 1 day ago
Thanks for the very good video. I have a question: can't we make a model that decreases the resolution of a picture (for example, a 4x4 picture to a 2x2 and then a 1x1 picture) and run it in reverse (generate a 2x2 from the 1x1 and a 4x4 from the 2x2)? Would this model work?
@algorithmicsimplicity 20 hours ago
Yes you absolutely could, and according to this paper: arxiv.org/abs/2404.02905v1 it works pretty well.
@IsaOzer-lx7sn 14 hours ago
I want to learn more about the causal architecture idea for auto regressors, but I can't seem to find anything about them anywhere. Do you know where I can read more about this topic?
@algorithmicsimplicity 14 hours ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
@iwaniw55 10 hours ago
Hi @algorithmicsimplicity, I am curious which papers/material you referenced for the generalized auto-regressor? I cannot seem to find any info on using randomly spaced-out pixels to predict the next batch of pixels. Any help would be appreciated. Also, great videos!!!
@algorithmicsimplicity 10 hours ago
It is more widely known as "any-order autoregression", see e.g. this paper arxiv.org/abs/2205.13554
@iwaniw55 10 hours ago
@@algorithmicsimplicity Thank you so much! This is exactly what I was missing.
@EricPham-gr8pg 8 days ago
Use lense projector and -zoom will save all the msthematical brain picking In video we use ccd cell in camera instantly illuminate LED pixel then zoom it down to tiny dot then send to ram and display on monitor by zoom factor corespond to resolutiom allow and zoom it back down when store it in time line of each coordinate and add all up with address and time then when unfold all we need is tiny dot first frame and last frame then start by last frame unfold into buffer subtract time but must adjust to phase angle of time at closest to last frame and just less tine drive with appropriate speed of each time axis so memory is so small
@quickdudley 4 days ago
My brain misinterpreted the title as "Why diffusers work better than autoencoders" (I believe because the noising process works rather like data augmentation)
@duytdl 9 days ago
So why isn't diffusion better for text? Also are you saying that auto-regression is only bad because it's expensive to do (serially)? Or is diffusion fundamentally better for images?
@algorithmicsimplicity 9 days ago
Auto-regression is only bad because it is slow; it produces better generations for both text and images. For text, there aren't that many tokens that you need to generate, so you can just use auto-regression: it gives better results. For images, you are forced to use something faster, and diffusion is much faster while producing nearly as good generations.
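A rough back-of-the-envelope comparison of the generation cost behind this reply (the numbers are illustrative assumptions, not figures from the video):

```python
# One network forward pass per generated element for auto-regression,
# versus a fixed number of denoising steps for diffusion.
image_pixels = 1024 * 1024   # pixel-by-pixel AR on a 1024x1024 image
diffusion_steps = 50         # a typical sampler step count (assumption)
text_tokens = 500            # a modest text reply: AR is affordable here

ar_image_calls = image_pixels        # ~1 million forward passes
ar_text_calls = text_tokens          # only a few hundred forward passes
ratio = ar_image_calls / diffusion_steps

print(ratio)  # roughly 21,000x more network calls for naive pixel-wise AR
```

This is why the trade-off lands differently for the two modalities: a few hundred serial steps for text is tolerable, a million for an image is not.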
@turhancan97 12 days ago
Is the idea at the beginning of the video (auto regression image generation) self supervised learning?
@algorithmicsimplicity 11 days ago
Technically yes, self-supervised learning just means that the labels used to train the model were created automatically from the data itself, instead of by a human. So yes, both auto-regression and diffusion are self-supervised learning, since they automatically create masked/noised inputs and use the clean image as labels. Though usually when people refer to self-supervised learning specifically, they mean self-supervised but not generative - things like SimCLR or contrastive learning.
@turhancan97 11 days ago
@@algorithmicsimplicity I understand. Thanks a lot :)
@hamzaumair7909 1 month ago
I love your explanations, especially transformers. Although this one imo could have been better; I think you are missing some ideas that should have been explained.
@algorithmicsimplicity 1 month ago
Thanks for the feedback, any ideas in particular that you think should have been explained?
@agustinbs 8 days ago
This video is better than going to MIT for a machine learning degree. Man, this is gold, thank you so much.
@akashmody9954 3 months ago
Can you recommend some sources that I can follow if I want to go deeper into diffusion models and transformers?
@akashmody9954 3 months ago
I tried to go through the research papers but the math is overwhelming
@algorithmicsimplicity 3 months ago
​@@akashmody9954 If you just want to learn how to train/use them, I'd highly recommend the fast.ai course by Jeremy Howard, it will give you practical experience using them. If you want to do research/develop new methods then I'm afraid there isn't any better option than just reading the papers. Although if code is available I sometimes find it easier to just read the code than the paper lol.
@akashmody9954 3 months ago
@@algorithmicsimplicity alright.....thanks a lot man, and loving your videos as always
@joshjohnson259 3 days ago
If this explanation is too advanced for me how would you recommend I learn enough to be able to grasp these concepts? Can you direct me to some content that is one level down in complexity so I can see if that would be my starting point in understanding how these models work? I don’t really have any CS background.
@algorithmicsimplicity 2 days ago
If you just want to learn how to train/use these models, I would highly recommend the fast.ai course by Jeremy Howard (course.fast.ai/ ). You can also look at 3blue1brown's videos on neural networks and transformers which are aimed at a general audience, and Andrej Karpathy's videos on implementing a transformer from scratch for a more detailed walkthrough of the models.
@klaushermann6760 7 days ago
Now we know they're not only predictors.
@sichengmao4038 6 days ago
can you explain why for diffusion model, there's no causal architecture? 16:26
@algorithmicsimplicity 6 days ago
Basically it's because NN layers accumulate information from multiple input features into one feature's vector. By making the layer only take in information from features before it in the AR order, you get a causal architecture with the same size as the original model. For diffusion, you could in principle make a causal architecture, but you would need to make a feature vector for every feature in every step of the noising process, i.e. the size of the model would need to be increased by a factor equal to the number of denoising steps, which isn't practical.
@sichengmao4038 6 days ago
@@algorithmicsimplicity don't quite understand why "the model size is increased by the number of denoising steps". What I imagine is, if we make an analogy to language model like Transformer, we now have a series of tokens (where each token is indeed a noisy image in the noising process), then we can still parallelize along the sequence dimension, isn't it?
@algorithmicsimplicity 5 days ago
@@sichengmao4038 You could do that, the problem is how you convert the entire image into a token. Usually in order to convert an image into a feature vector, you need to apply a full-sized neural network. So to get your noisy image tokens you need to apply a NN for each noising step.
@akashmody9954 3 months ago
Can you make a video on how SORA by OpenAI works, what kind of architecture does it follow
@algorithmicsimplicity 3 months ago
Unfortunately OpenAI does not publicly release details on their architectures, they only said it was a transformer based diffusion model. This thread had some speculation on the exact architecture though: threadreaderapp.com/thread/1758433676105310543.html
@assgoblin3981 2 months ago
Assgoblin approves of this content
@JoeJoeTater a day ago
18:10 This is wrong. The average of a bunch of noisy images is a less-noisy image. (See "regression towards the mean") You'd have to normalize that averaged image.
@algorithmicsimplicity 20 hours ago
Right, I should have been more careful with my usage of the word "noisy". If you average a bunch of samples from a normal distribution, the result is a sample with less variance (i.e. less noisy). What I meant to say was the probability of the average under the normal distribution is higher (i.e. the result is closer to the origin). So the average still lies within the data manifold (as opposed to images, where the average moves outside the data manifold).
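[Editor's note: a quick NumPy check of the point made in this reply — the array sizes and noise scale are illustrative assumptions, not values from the video.]

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 64 * 64))   # 1000 flattened "noise images"
avg = samples.mean(axis=0)

# The average has far smaller variance than any single sample, so it
# sits near the origin: a HIGH-probability point under the normal
# distribution, i.e. still within the noise distribution's data manifold,
# unlike the average of natural images, which falls off the image manifold.
assert avg.std() < 0.1
assert samples[0].std() > 0.8
```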
@fayezsalka 8 hours ago
Yes, that was very confusing to me too. The average of a bunch of random noise samples is 0.5, which is the mean. You would literally get a smooth grey image, not a "noise" image as shown in the video.
@craftydoeseverything9718 11 hours ago
17:58 btw, you wrote "nose", instead of "noise"
@algorithmicsimplicity 11 hours ago
So I did. Surprised no-one else mentioned it yet lol.
@dubfather521 6 days ago
So denoising models work by predicting the clean image, and then to get the next step you noise its already clean output??? That doesn't make any sense. If it predicts the final image already, why do you have to keep predicting?
@algorithmicsimplicity 6 days ago
The first time it predicts the clean image, it will not produce a good image, it will produce a blurry mess (because it will average over all of the training images). You then add noise to this blurry mess and you get an image that is almost pure noise, with a little bit of structure from the original blurry mess. Then you use that as input and predict a clean image again; this time the produced image will be slightly sharper, because now the model is only averaging over all inputs which are consistent with the blurry structure from the first step. You repeat this many times, and at each step the produced image gets sharper because more detail is left from the previous step.
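[Editor's note: a toy one-dimensional illustration of the predict-then-renoise loop described in this reply. The "training set" of two values and the closed-form `predict_clean` denoiser are illustrative assumptions standing in for a trained network; the fixed noise schedule is also made up for the sketch.]

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([-1.0, 1.0])   # toy "training set": two clean values

def predict_clean(x, sigma):
    # Ideal denoiser for this toy set: the average of the training points,
    # weighted by how consistent each is with the noisy input x.
    d2 = (x[:, None] - data) ** 2
    d2 = d2 - d2.min(axis=1, keepdims=True)   # stabilise the exp weights
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w * data).sum(axis=1) / w.sum(axis=1)

x = rng.normal(size=200)                        # 200 chains of pure noise
for sigma in [1.0, 0.5, 0.25, 0.1, 0.01]:       # decreasing noise schedule
    x0_hat = predict_clean(x, sigma)            # blurry guess at the clean value
    x = x0_hat + sigma * rng.normal(size=200)   # re-noise, with less noise each step

# nearly every chain has sharpened onto one of the two training points
near = np.min(np.abs(x[:, None] - data), axis=1) < 0.1
assert near.mean() > 0.9
```

At high noise the prediction is an average of both training points (a "blurry mess"); as the noise shrinks, each chain commits to whichever point its residual structure is consistent with.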
@dubfather521 6 days ago
@@algorithmicsimplicity ohhhhhh
@glaubherrocha2935 14 hours ago
a fixed pixel with random color wouldn't make it work?
@algorithmicsimplicity 14 hours ago
I'm not sure what you are asking, can you elaborate?
@cognitive-carpenter 10 days ago
Enjoyed I think is the wrong output
@chadarmstrong7458 12 days ago
I didn't understand why you would predict the noise rather than the clean image. Your explanation didn't seem to be related to the problem...
@chadarmstrong7458 12 days ago
"You get a blurry mess again" Why is that a problem in the early iterations?
@chadarmstrong7458 12 days ago
"The advantage of doing it this way is that now the model output is uncertain at the later stages of the generation process" Why is that valuable? Why is that relevant to this other problem with the early stages that you are supposedly trying to solve?
@chadarmstrong7458 12 days ago
"The average of a bunch of different noise samples which is still valid noise" Why does that matter?
@cakep4271 11 days ago
I think 🤔 the main points are: 1. Predicting a clean image directly is slow, not creative, and expensive. 2. So instead of predicting an image outright, just learn to "un-blur", and run it a bunch of times, because that's a fast process. So now, you tell it a pic of random noise is a cat, and to unblur the cat, thereby making the noise slightly more like a cat. Repeat over and over again. Eventually you have a clean image of a cat.
@banana_lemon_melon 7 days ago
noisy image = clean image + noise. Now the NN is given a noisy image as input, and outputs/predicts the pure noise. Then we can do: clean image = noisy image (input) - noise (prediction output). Predicting noise is easier than predicting the image directly, maybe because the noise has a Gaussian/normal distribution (not explained in this video, but we know regression can perform better if the target label has a Gaussian/normal distribution). I'm not sure about the distribution of pixel values in the images, though.