Comments
@hieuaovan7101
@hieuaovan7101 5 days ago
Would love to see more good explanations of other models; your explanation is so good.
@MrScorpianwarrior
@MrScorpianwarrior 6 days ago
Hey! I am starting my CompSci Masters program in the Fall, and just wanted to say that I love this video. I've never really had time to sit down and learn PyTorch, so the brevity of this video is greatly appreciated! It gives me a fantastic starting point that I can tinker around with, and I have an idea of how I can apply this in a non-conventional way that I haven't seen much research on... Thanks again!
@outliier
@outliier 6 days ago
Love to hear that! Good luck on your journey!
@bhavyaruparelia7431
@bhavyaruparelia7431 9 days ago
Your explanations are simply great! I do recommend you return to KZfaq to cover the latest papers in this field :)
@WendaoZhao
@WendaoZhao 10 days ago
One CRAZY thing to take from this code (and video): GREEK LETTERS CAN BE USED AS VARIABLE NAMES IN PYTHON.
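A quick sketch of that (my own toy example, not the video's code):

# Any Unicode letter is a valid Python identifier, so Greek symbols work fine:
α = 0.99            # alpha
β = 1 - α           # beta
σ = (1 - α) ** 0.5  # sigma
print(α, β, σ)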
@astrophage381
@astrophage381 10 days ago
These implementation videos are marvelous. You really should do more of them. Big fan of your channel!
@pratyanshvaibhav
@pratyanshvaibhav 12 days ago
The underrated OG channel
@freerockneverdrop1236
@freerockneverdrop1236 12 days ago
At 13:20, the formula is not exactly correct: it is an approximation, not a strict equality. That should have been made clear.
@iceinmylean3947
@iceinmylean3947 15 days ago
Great video! One question: at 22:40 you say "the authors decided to use a simple mean-squared error...". That part isn't clear to me. At this point we are already considering the loss, and we need to minimize the given KL divergence. Why is a new loss being introduced at this point, and how is that justified?
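For what it's worth, a sketch of the standard DDPM argument behind that step (the usual reasoning, not necessarily the video's exact notation): both distributions inside the KL term are Gaussians with the same fixed variance, so the KL collapses to a squared distance between their means; rewriting the means in terms of the noise gives a weighted noise MSE, and DDPM simply drops the weight:

D_{KL}( N(\tilde{\mu}_t, \sigma_t^2 I) \,\|\, N(\mu_\theta, \sigma_t^2 I) ) = \frac{1}{2\sigma_t^2} \|\tilde{\mu}_t - \mu_\theta\|^2 + const = w_t \,\|\epsilon - \epsilon_\theta(x_t, t)\|^2 + const

Setting w_t = 1 gives the simple mean-squared error on the predicted noise.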
@ShahNawazKhan-jz8wl
@ShahNawazKhan-jz8wl 15 days ago
Insane
@ParhamEftekhar
@ParhamEftekhar 16 days ago
Awesome video.
@ParhamEftekhar
@ParhamEftekhar 16 days ago
Great explanation. Thanks.
@erenenadream
@erenenadream 17 days ago
Nice explanation handsome dude, you just got a new subscriber.
@blancanthony9992
@blancanthony9992 18 days ago
So far the best model: the fastest, highest-quality image generator on my 3070 GPU. Very, very great!!! I used transformer encoders for "denoising": 95% noise on the first iteration, no pure noise, no signal in the denoiser's inputs. Tested on CIFAR-100! temperature=0.7, top k=40, only 4 steps for denoising!!! Very impressive! It is the first time I've felt so confident about the power of a generative model!!!
@NinadDaithankar5
@NinadDaithankar5 19 days ago
Amazing video; thanks a lot for going in depth on the math with simplified animations!
@utkarshujwal3286
@utkarshujwal3286 22 days ago
Great video buddy, if you can share some more resources to understand the underlying math, that would be great.
@outliier
@outliier 22 days ago
Most of the papers I linked have a good amount of the maths, though often without detailed explanations. There are some good blog posts as well that you can easily find on diffusion models. I will soon have another video on this topic that should explain things much better too.
@rma1563
@rma1563 23 days ago
Appreciate the effort you put into this. You definitely can teach. If only I had the brain to understand the math... I still got some bits here and there. Thanks
@JidongLi-lb3zt
@JidongLi-lb3zt 25 days ago
Thanks for your detailed introduction.
@khan.saqibsarwar
@khan.saqibsarwar 27 days ago
That's a really nice video. You summarised so much information in a concise video. And the explanations were crystal clear. Thanks a lot.
@fcw1310
@fcw1310 29 days ago
Usually the KL divergence is expressed as D_KL(q||p) = ∫ q*log(q/p), but on the slide at 16:48 it appears as D_KL(q||p) = log(q/p). Why is q dropped here?
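For reference (a guess about the slide, not a quote from it): the q is not gone, it just moves into the expectation; when the expectation over q is written outside the bracket, only the log-ratio is left inside:

D_{KL}(q\,\|\,p) = E_{x \sim q}\!\left[\log\frac{q(x)}{p(x)}\right] = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx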
@fcw1310
@fcw1310 1 month ago
Thanks for such an amazing illustration of diffusion. One question about the equation on the slide at 13:16: how do you get to t-2 and t-3?
x_t = sqrt(a_t)*x_{t-1} + sqrt(1-a_t)*e
x_{t-1} = sqrt(a_{t-1})*x_{t-2} + sqrt(1-a_{t-1})*e
x_t = sqrt(a_t)*[sqrt(a_{t-1})*x_{t-2} + sqrt(1-a_{t-1})*e] + sqrt(1-a_t)*e = sqrt(a_t*a_{t-1})*x_{t-2} + [sqrt(a_t - a_t*a_{t-1}) + sqrt(1-a_t)]*e
The rightmost term doesn't equal, or even come close to, sqrt(1 - a_t*a_{t-1})*e. Did I misunderstand something? Thanks again. @Outlier
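If it helps, the usual resolution (standard DDPM algebra, not specific to this video) is that the two noise terms come from independent Gaussian draws, so their variances add rather than their standard deviations:

sqrt(a_t*(1-a_{t-1}))*e_1 + sqrt(1-a_t)*e_2 ~ N(0, [a_t*(1-a_{t-1}) + (1-a_t)]*I) = N(0, (1 - a_t*a_{t-1})*I)

So the sum equals sqrt(1 - a_t*a_{t-1})*ē in distribution, for a fresh ē ~ N(0, I). Adding the coefficients directly (as if both terms used the same e) is the step that breaks.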
@subashchandrapakhrin3537
@subashchandrapakhrin3537 1 month ago
Very Bad Video
@outliier
@outliier 1 month ago
:(
@user-hm6sh6pl7r
@user-hm6sh6pl7r 1 month ago
Thanks for the explanation, it's awesome! But I have a question. In cross-attention, if we set the text as V, the final attention matrix can be viewed as a weighted sum of each word in V (the "weighted" part comes from the Q/K similarity). If I understand correctly, the final attention matrix should then contain values in the text domain, so why can we multiply by a W_out projection and get a result in the image domain (added to the original image)? Would it make more sense to set the text condition as Q, and the image as K, V?
@outliier
@outliier 1 month ago
If the text conditioning were Q, it would not have the same shape as your image. So Q needs to be the image.
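A minimal shape sketch of why that is (toy sizes and names of my own choosing, not the video's code): with the image as Q and the text as K/V, the output keeps the image's token count, so it can be projected by W_out and added back onto the image features.

import torch

b, n_img, n_txt, d = 2, 64, 77, 32
img = torch.randn(b, n_img, d)   # flattened image feature map (queries)
txt = torch.randn(b, n_txt, d)   # text encoder output (keys/values)

W_q, W_k, W_v, W_out = (torch.nn.Linear(d, d) for _ in range(4))

q = W_q(img)                                                     # (b, n_img, d)
k, v = W_k(txt), W_v(txt)                                        # (b, n_txt, d)
attn = torch.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)   # (b, n_img, n_txt)
out = W_out(attn @ v)                                            # (b, n_img, d): image-shaped again
print(out.shape)                                                 # torch.Size([2, 64, 32])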
@mousamustafa1042
@mousamustafa1042 1 month ago
I really liked that you showed the derivation in an understandable way.
@raphaelfeigl1209
@raphaelfeigl1209 1 month ago
Amazing explanation, thanks a lot! Minor improvement suggestion: add a pop filter to your microphone :)
@tomasjavurek1030
@tomasjavurek1030 1 month ago
I think the statement that N(mu, sigma) = mu + sigma*N(0, 1) is not exactly true. Just try that transformation: mu plays the role of a translation along the value axis. What is correct is that sampling from the left side acts the same as sampling from the right side. I am pointing this out because I got stuck on it for a while. But I still might have it completely wrong.
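For reference, the identity is usually meant in distribution rather than as an equation between density functions: drawing a sample from the left-hand side is equivalent to computing the right-hand side with a fresh standard-normal draw,

\epsilon \sim N(0, 1) \;\Rightarrow\; \mu + \sigma\epsilon \sim N(\mu, \sigma^2)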
@tomasjavurek1030
@tomasjavurek1030 1 month ago
Also later, when working with the alphas, there is probably just an approximate-equality step, restricted to the first order of the Taylor expansion.
@EvanSpades
@EvanSpades 1 month ago
Love this - what a fantastic achievement!
@mtolgacangoz
@mtolgacangoz 1 month ago
Brilliant work!
@shojintam4206
@shojintam4206 1 month ago
11:57
@jefersongallo8033
@jefersongallo8033 1 month ago
This is a really great video; thanks for the big effort you put into explaining!
@akkokagari7255
@akkokagari7255 1 month ago
Wonderful explanation! Not sure if this is in the original papers, but I find it very odd that there is no nonlinear function after V and before W_out. It seems like a waste to me, since Attention@V is itself a linear function, so W_out won't necessarily change the content of the data beyond what Attention@V already would have done through training.
@akkokagari7255
@akkokagari7255 1 month ago
Whoops, I mean the similarity matrix, not Attention.
@JeavanCooper
@JeavanCooper 1 month ago
The strange pattern in the reconstructed image and the generated image is likely caused by the perceptual loss. I have no idea why, but it disappears when I take the perceptual loss away.
@ChristProg
@ChristProg 1 month ago
Thank you so much. But please, I would prefer that you go through the maths and operations in more detail regarding the training of Würstchen 🎉🎉 Thank you
@RyanHelios
@RyanHelios 1 month ago
Really nice video, it helps me understand a lot❗
@mtolgacangoz
@mtolgacangoz 1 month ago
Great video!! At 13:34, is multiplying with a_0 correct?
@user-kx1nm3vw5s
@user-kx1nm3vw5s 1 month ago
best explanation
@siddharthshah9316
@siddharthshah9316 1 month ago
This is an amazing video 🔥
@gintonic6204
@gintonic6204 1 month ago
12:12 Does anyone understand why, when \beta is linear, \sqrt{1-\beta} is linear as well?
@sciencerz7460
@sciencerz7460 1 month ago
The statement at 15:33 isn't right... is it? Because I have a counter-example: f(x) = x^2, g(x) = -x^2. Here f(x) >= g(x), but their derivatives are negatives of each other. Please help, I don't really understand the concept of the ELBO.
@KienLe-md9yv
@KienLe-md9yv 1 month ago
At inference, the input of Stage A (the VQGAN decoder) is discrete latents. Continuous latents need to be quantized to discrete latents (the discrete latents are chosen from the codebook by mapping each vector in the continuous latents to its nearest codebook vector). But the output of Stage B is continuous latents, and the output of Stage B goes directly into the input of Stage A... is that right? How does Stage A (the VQGAN decoder) handle continuous latents? I checked the VQGAN paper and this Würstchen paper; it is not clear. Please help me with that. Thank you
@outliier
@outliier 1 month ago
The VQGAN decoder can also decode continuous latents. It‘s as easy as that.
@KienLe-md9yv
@KienLe-md9yv 1 month ago
So, apparently, it sounds like Würstchen is exactly Stage C. Am I right?
@outliier
@outliier 1 month ago
What do you mean exactly?
@readbyname
@readbyname 1 month ago
Hey, great video. Can you tell me why random sampling of codebook vectors doesn't generate meaningful images? In a VAE we randomly sample from a standard Gaussian; why doesn't the same work for VQ autoencoders?
@outliier
@outliier 1 month ago
Because in a VAE you only predict a mean and standard deviation; sampling that is easier. The codebook vectors are sampled independently of each other, and this is why the output isn't meaningful.
@Bhllllll
@Bhllllll 2 months ago
How did you manage to get 128 A100s for 3 weeks? I think the cost is about 100k USD for one run. Assuming you did multiple iterations, the overall cost could easily be 200k for this project.
@ashimdahal182
@ashimdahal182 2 months ago
Just completed writing a 24-page handwritten note based on this video and a few other sources.
@outliier
@outliier 2 months ago
Wanna share it? :D
@TheSlepBoi
@TheSlepBoi 2 months ago
Amazing explanation and thank you for taking the time to properly visualize everything
@Gruell
@Gruell 2 months ago
Sorry if I am misunderstanding, but at 19:10, shouldn't the code be "uncond_predicted_noise = model(x, t, None)" instead of "uncond_predicted_noise = model(x, labels, None)"? Also, according to the CFG paper's formula, shouldn't the next line be "predicted_noise = torch.lerp(predicted_noise, uncond_predicted_noise, -cfg_scale)" under the definition of lerp? One last question: have you tried using L1Loss instead of MSELoss? In my implementation, L1 loss performs much better (although my implementation is different from yours). I know the ELBO term expands to essentially an MSE term w.r.t. the predicted noise, so I am confused as to why L1 loss performs better for my model. Thank you for your time.
@Gruell
@Gruell 2 months ago
Great videos by the way
@Gruell
@Gruell 2 months ago
Ah, I see you already fixed the first question in the codebase
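For anyone comparing the two lerp conventions discussed above, here is a minimal sketch of classifier-free guidance at sampling time (variable names taken from the discussion; everything else is my own assumption, not the repo's exact code):

import torch

def guided_noise(model, x, t, labels, cfg_scale=3.0):
    # Conditional and unconditional predictions from the same model.
    predicted_noise = model(x, t, labels)
    uncond_predicted_noise = model(x, t, None)
    # uncond + s * (cond - uncond), i.e. torch.lerp(start=uncond, end=cond, weight=s).
    # With s > 1 this extrapolates past the conditional prediction.
    return torch.lerp(uncond_predicted_noise, predicted_noise, cfg_scale)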
@duduwe8071
@duduwe8071 2 months ago
Hey @Outlier, at 12:44 it looks like you mistakenly used "a" instead of the "alpha" symbol inside the product (Pi) notation, since the example multiplication below it uses the alpha notation, e.g. for t = 8: alpha_8 = alpha_1 x alpha_2 x alpha_3 x alpha_4 x alpha_5 x alpha_6 x alpha_7 x alpha_8. Is it intentional, though? Please let me know. Thanks
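For reference, in the usual DDPM notation the cumulative product carries a bar over the alpha:

\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \quad \text{e.g.}\;\; \bar{\alpha}_8 = \alpha_1 \alpha_2 \cdots \alpha_8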
@coy457
@coy457 2 months ago
This is dumb, but can anyone explain why when beta increases linearly, the square root of 1 - beta decreases linearly, at 12:13? Shouldn't it have some curve to it, given the square root?
@attilakun7850
@attilakun7850 14 days ago
Type these 2 formulas into Desmos: \beta=\frac{0.02-0.0001}{999}x and \sqrt{1-\beta}. You can see that \sqrt{1-\beta} is indeed non-linear, but it curves very, very slightly in the plotted domain. You have to zoom out the x axis a lot to see the curvature.
@coy457
@coy457 13 days ago
@@attilakun7850 ahhh tysm!
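A complementary way to see why the curvature is so slight (standard first-order Taylor expansion, not taken from the video): with \beta \le 0.02,

\sqrt{1-\beta} \approx 1 - \tfrac{\beta}{2}, \quad \text{with the dropped term } \tfrac{\beta^2}{8} \le 5\times10^{-5},

so a linear \beta schedule produces a \sqrt{1-\beta} curve that is visually indistinguishable from a straight line at plot scale.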
@antongolles8896
@antongolles8896 2 months ago
At 22:32 you're missing a bar over the alpha on the bottom line. Please correct me if I'm wrong.
@outliier
@outliier 2 months ago
You are probably right 🤔
@UnbelievableRam
@UnbelievableRam 2 months ago
Hi! Can you please explain why the output is getting two stitched images?
@outliier
@outliier 2 months ago
What do you mean by two stitched images?
@arka-h274
@arka-h274 2 months ago
How did the KL divergence expand to log(q/p)? You yourself mentioned it to be the integral of q*log(q/p) for D_KL(q||p). Perhaps too much of a simplification.