VQ-GAN | Paper Explanation

17,938 views

Outlier

A day ago

Vector Quantized Generative Adversarial Networks (VQGAN) is a generative model for image modeling. It was introduced in Taming Transformers for High-Resolution Image Synthesis. The concept is built upon two stages. The first stage trains in an autoencoder-like fashion: images are encoded into a low-dimensional latent space, which is then vector-quantized using a learned codebook. Afterwards, a decoder projects the quantized latent vectors back to the original image space. Both encoder and decoder are fully convolutional. The second stage trains a transformer on the latent space. Over the course of training, it learns which codebook vectors go together and which do not. This can then be used autoregressively to generate previously unseen images from the data distribution.
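The vector-quantization step described above — snapping each encoder output to its nearest codebook entry — can be sketched roughly like this. This is an illustrative NumPy sketch, not code from the linked repository; names and shapes are assumptions:

```python
import numpy as np

def quantize(z_e, codebook):
    """Replace each encoder output vector with its nearest codebook entry.

    z_e:      (N, D) array of encoder outputs (flattened spatial latents)
    codebook: (K, D) array of learned embedding vectors
    Returns the chosen codebook indices and the quantized latents z_q.
    """
    # Squared Euclidean distance from every latent to every codebook vector
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # index of the nearest code per latent
    z_q = codebook[indices]          # (N, D) quantized latents fed to the decoder
    return indices, z_q
```

In the second stage, the transformer then models exactly these index sequences autoregressively.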
#deeplearning #gan #generative #vqgan
0:00 Introduction
0:42 Idea & Theory
9:20 Implementation Details
13:37 Outro
Further Reading:
• VAE: towardsdatascience.com/unders...
• VQVAE: arxiv.org/pdf/1711.00937.pdf
• Why CNNs are invariant to input sizes: www.quora.com/How-are-variabl...
• NonLocal NN: arxiv.org/pdf/1711.07971.pdf
• PatchGAN: arxiv.org/pdf/1611.07004.pdf
PyTorch Code: github.com/dome272/VQGAN
Follow me on instagram lol: / dome271

Comments: 41
@AICoffeeBreak · 2 years ago
Really cool video! 😎Can't wait for the next one.
@NoahElRhandour · 2 years ago
omg u here??? i know u from your videos. thats so cool!
@AICoffeeBreak · 2 years ago
@@NoahElRhandour Haha, I can only reply with: omg, u recognize me??? That is so cool! Yes, I am here. I have to keep a close eye on the competition! 😆
@NoahElRhandour · 2 years ago
@@AICoffeeBreak i see :D
@khan.saqibsarwar · 29 days ago
That's a really nice video. You summarised so much information in a concise video. And the explanations were crystal clear. Thanks a lot.
@felixvgs.9840 · 2 years ago
What an amazing video. Please keep up the great work! :)
@reasoning9273 · 6 months ago
By far the best video on VQVAE. Great job, outlier!
@aratasaki · 1 year ago
Incredible video! Can't tell you how much clearer everything is now. Looking forward to the future of your channel!
@outliier · 1 year ago
That's so nice to hear and motivating. The next video, about cross-attention, is already in the making.
@code4AI · 1 year ago
Excellent visualization for this smooth transition from VQVAE -> VQGAN (focus on main idea first and details second). 10/10
@devashishprasad1509 · 1 year ago
This is such a great channel!!!! Why didn't I find it earlier? Thanks a lot for the great work...
@rezarawassizadeh4601 · 2 years ago
After three days of struggling with the paper, I found this amazing explanation of VQ-GAN.
@smbonilla · 11 months ago
Your videos are great! Super clearly explained :) Thanks!!
@NoahElRhandour · 2 years ago
Didactically, visually, and content-wise absolutely insane, big props
@joanrodriguez6212 · 2 years ago
That made a few things click in my understanding! Thanks a lot
@melisakilic726 · 2 years ago
So excited for the next one!
@mchahhou · 2 years ago
awesome!! More of this please.
@baothaiba7099 · 1 year ago
Great work !!!!
@igorvaz6055 · 1 year ago
Nice explanation and visualizations!
@filipequincas1485 · 1 year ago
Brilliantly explained
@alexandterfst6532 · 1 year ago
Incredible videos
@tiln8455 · 2 years ago
Thank you for this video, now I can be better
@Paul-wk7rp · 2 years ago
Very cool video
@AIwithAniket · 1 year ago
great video
@prabhavkaula9697 · 2 years ago
Thank you so much for the explanation! Hopefully one can now go ahead with CLIP and create a free version of DALL-E-like text-to-image models.
@maralzarvani8154 · 1 year ago
cool!
@yendar2806 · 2 years ago
I love you, math man ❤️
@sourabhpatil9406 · 2 years ago
Crisp explanation! I would request you to talk a little bit slower; it would be really helpful. Keep up the good work.
@raeeskhan9058 · 1 year ago
you are truly an outlier!
@saulcanoortiz7902 · 4 months ago
Hey! Really great video :) I have one question. Imagine you want to use a diffusion model to learn image-to-image translation, more specifically from segmentation masks to synthetic images. Then you would have a tool to create images from hand-painted segmentation masks, augment a dataset, and see whether state-of-the-art segmentation networks trained on the augmented dataset improve their performance. Do you know a diffusion model for this image-to-image translation task with some explanations and available repos?
@DollyNipples · 1 year ago
Those pictures that were generated with VQGAN are surprisingly coherent. How do you do that?
@TheAero · 9 months ago
I can't find the VQGAN paper!
@JeavanCooper · 1 month ago
The strange pattern in the reconstructed and generated images is likely caused by the perceptual loss. I have no idea why, but it disappears when I take the perceptual loss away.
@MrArtod · 1 year ago
How do we decide on what goes to the codebook? Is it filled with random vectors?
@rikki146 · 1 year ago
It seems to be the case, and they converge over the course of training.
@rikki146 · 1 year ago
Why make 2 loss functions with sg instead of optimizing ||E(x) - z_q||_2^2 directly?
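One likely answer to the question above: the split controls which module each stop-gradient term updates. The first term moves only the codebook toward the encoder outputs; the second, scaled by β, commits only the encoder to its chosen codes. Optimizing ||E(x) - z_q||² directly would let encoder and codebook chase each other with no fixed target. A minimal PyTorch-style sketch (illustrative, not taken from the linked repository), where `.detach()` plays the role of sg:

```python
import torch

def vq_losses(z_e, z_q, beta=0.25):
    """The two stop-gradient (sg) terms of the VQ-VAE objective.

    ||sg[z_e] - z_q||^2 updates only the codebook vectors, while
    beta * ||z_e - sg[z_q]||^2 updates only the encoder.
    """
    codebook_loss = ((z_e.detach() - z_q) ** 2).mean()           # gradient flows to codebook only
    commitment_loss = beta * ((z_e - z_q.detach()) ** 2).mean()  # gradient flows to encoder only
    return codebook_loss + commitment_loss
```

Setting β = 0 would stop the encoder from receiving any gradient from this term, which makes the gradient routing easy to verify.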
@readbyname · 1 month ago
Hey, great video. Can you tell me why random sampling of codebook vectors doesn't generate meaningful images? In a VAE we randomly sample from a standard Gaussian; why doesn't the same work for VQ autoencoders?
@outliier · 1 month ago
Because in a VAE you only predict a mean and standard deviation, sampling from it is easier. Sampling the codebook vectors happens independently per position, and this is why the output isn't meaningful.
@user-mh8pl5wd1s · 1 year ago
Awesome
@yassinesafraoui · 1 year ago
Hmm, isn't trying to train the whole network (encoder and decoder) with the discriminator just too complicated? Wouldn't it result in a loss function so complex that minimizing it with gradient descent would be inefficient, i.e., take longer to train? Hence the idea: why not use separate discriminators to train the decoder and the encoder separately? Yes, it would be quite a lot more complicated to design, but I guess it's worth a shot 😀 If someone knows whether something like this has already been done (because I have a feeling it probably has), please enlighten me, thanks.
@idealintelligence7009 · 1 year ago
Thanks boy :) Please speak louder in the video; your voice is low. :)