VQ-GAN | Paper Explanation

17,938 views

Outlier

A day ago

Vector Quantized Generative Adversarial Networks (VQGAN) is a generative model for image modeling. It was introduced in Taming Transformers for High-Resolution Image Synthesis. The concept is built upon two stages. The first stage trains in an autoencoder-like fashion: images are encoded into a low-dimensional latent space, which is then vector-quantized using a learned codebook. Afterwards, a decoder projects the quantized latent vectors back to the original image space. Both encoder and decoder are fully convolutional. The second stage trains a transformer on the latent space. Over the course of training, it learns which codebook vectors go together and which do not. This can then be used autoregressively to generate previously unseen images from the data distribution.
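The vector-quantization step described above — snapping each encoder output to its nearest codebook entry — can be sketched roughly like this. This is an illustrative NumPy sketch, not code from the linked repository; names and shapes are assumptions:

```python
import numpy as np

def quantize(z_e, codebook):
    """Replace each encoder output vector with its nearest codebook entry.

    z_e:      (N, D) array of encoder outputs (flattened spatial latents)
    codebook: (K, D) array of learned embedding vectors
    Returns the chosen codebook indices and the quantized latents z_q.
    """
    # Squared Euclidean distance from every latent to every codebook vector
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # index of the nearest code per latent
    z_q = codebook[indices]          # (N, D) quantized latents fed to the decoder
    return indices, z_q
```

In the second stage, the transformer then models exactly these index sequences autoregressively.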
#deeplearning #gan #generative #vqgan
0:00 Introduction
0:42 Idea & Theory
9:20 Implementation Details
13:37 Outro
Further Reading:
• VAE: towardsdatascience.com/unders...
• VQVAE: arxiv.org/pdf/1711.00937.pdf
• Why CNNs are invariant to input sizes: www.quora.com/How-are-variabl...
• NonLocal NN: arxiv.org/pdf/1711.07971.pdf
• PatchGAN: arxiv.org/pdf/1611.07004.pdf
PyTorch Code: github.com/dome272/VQGAN
Follow me on instagram lol: / dome271

Comments: 41
@AICoffeeBreak · 2 years ago
Really cool video! 😎Can't wait for the next one.
@NoahElRhandour · 2 years ago
omg u here??? i know u from your videos. thats so cool!
@AICoffeeBreak · 2 years ago
@@NoahElRhandour Haha, I can only reply with: omg, u recognize me??? That is so cool! Yes, I am here. I have to keep a close eye on the competition! 😆
@NoahElRhandour · 2 years ago
@@AICoffeeBreak i see :D
@khan.saqibsarwar · 29 days ago
That's a really nice video. You summarised so much information in a concise video. And the explanations were crystal clear. Thanks a lot.
@felixvgs.9840 · 2 years ago
What an amazing video. Please keep up the great work! :)
@reasoning9273 · 6 months ago
By far the best video on VQVAE. Great job, outlier!
@aratasaki · 1 year ago
Incredible video! Can't tell you how much clearer everything is now. Looking forward to the future of your channel!
@outliier · 1 year ago
That's so nice to hear and motivating. The next video, about cross-attention, is already in the making.
@code4AI · 1 year ago
Excellent visualization for this smooth transition from VQVAE -> VQGAN (focus on main idea first and details second). 10/10
@devashishprasad1509 · 1 year ago
This is such a great channel!!!! Why didn't I find it earlier? Thanks a lot for the great work...
@rezarawassizadeh4601 · 2 years ago
After three days of struggling with the paper, I found this amazing explanation of VQ-GAN.
@smbonilla · 11 months ago
Your videos are great! Super clearly explained :) Thanks!!
@NoahElRhandour · 2 years ago
Didactically, visually, and content-wise absolutely insane, big props
@joanrodriguez6212 · 2 years ago
That made a few things click in my understanding! Thanks a lot
@melisakilic726 · 2 years ago
So excited for the next one!
@mchahhou · 2 years ago
awesome!! More of this please.
@baothaiba7099 · 1 year ago
Great work !!!!
@igorvaz6055 · 1 year ago
Nice explanation and visualizations!
@filipequincas1485 · 1 year ago
Brilliantly explained
@alexandterfst6532 · 1 year ago
Incredible videos
@tiln8455 · 2 years ago
Thank you for this video, now I can be better
@Paul-wk7rp · 2 years ago
Very cool video
@AIwithAniket · 1 year ago
great video
@prabhavkaula9697 · 2 years ago
Thank you so much for the explanation! Hopefully one can now go ahead with CLIP and create a free version of DALL-E-like text-to-image models.
@maralzarvani8154 · 1 year ago
cool!
@yendar2806 · 2 years ago
I love you, math man ❤️
@sourabhpatil9406 · 2 years ago
Crisp explanation! I would request you to talk a little bit slower; it would be really helpful. Keep up the good work.
@raeeskhan9058 · 1 year ago
you are truly an outlier!
@saulcanoortiz7902 · 4 months ago
Hey! Really great video :) I have one question. Imagine you want to use a diffusion model to learn image-to-image translation, more specifically from segmentation masks to synthetic images. Then you would have a tool to create images from hand-painted segmentation masks, augment a dataset, and see whether state-of-the-art segmentation networks trained on the augmented dataset improve their performance. Do you know a diffusion model for this image-to-image translation task with some explanations and available repos?
@DollyNipples · 1 year ago
Those pictures that were generated with VQGAN are surprisingly coherent. How do you do that?
@TheAero · 9 months ago
I can't find the VQGAN paper!
@JeavanCooper · 1 month ago
The strange pattern in the reconstructed and generated images is likely caused by the perceptual loss. I have no idea why, but it disappears when I take the perceptual loss away.
@MrArtod · 1 year ago
How do we decide on what goes to the codebook? Is it filled with random vectors?
@rikki146 · 1 year ago
It seems to be the case, and they converge over the course of training.
@rikki146 · 1 year ago
Why make 2 loss functions with sg instead of optimizing ||E(x) - z_q||_2^2 directly?
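One likely answer to the question above: the split controls which module each stop-gradient term updates. The first term moves only the codebook toward the encoder outputs; the second, scaled by β, commits only the encoder to its chosen codes. Optimizing ||E(x) - z_q||² directly would let encoder and codebook chase each other with no fixed target. A minimal PyTorch-style sketch (illustrative, not taken from the linked repository), where `.detach()` plays the role of sg:

```python
import torch

def vq_losses(z_e, z_q, beta=0.25):
    """The two stop-gradient (sg) terms of the VQ-VAE objective.

    ||sg[z_e] - z_q||^2 updates only the codebook vectors, while
    beta * ||z_e - sg[z_q]||^2 updates only the encoder.
    """
    codebook_loss = ((z_e.detach() - z_q) ** 2).mean()           # gradient flows to codebook only
    commitment_loss = beta * ((z_e - z_q.detach()) ** 2).mean()  # gradient flows to encoder only
    return codebook_loss + commitment_loss
```

Setting β = 0 would stop the encoder from receiving any gradient from this term, which makes the gradient routing easy to verify.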
@readbyname · 1 month ago
Hey, great video. Can you tell me why random sampling of codebook vectors doesn't generate meaningful images? In a VAE we randomly sample from a standard Gaussian; why doesn't the same work for VQ autoencoders?
@outliier · 1 month ago
Because in a VAE you only predict a mean and standard deviation, sampling from it is easier. Sampling the codebook vectors happens independently per position, and this is why the output isn't meaningful.
@user-mh8pl5wd1s · 1 year ago
Awesome
@yassinesafraoui · 1 year ago
Hmm, isn't trying to train the whole network (encoder and decoder) with the discriminator just too complicated? Wouldn't it result in a loss function so complex that minimizing it with gradient descent would be inefficient, i.e., take longer to train? Hence the idea: why not use separate discriminators to train the decoder and the encoder separately? Yes, it would be quite a lot more complicated to design, but I guess it's worth a shot 😀 If someone knows whether something like this has already been done (because I have a feeling it probably has), please enlighten me, thanks.
@idealintelligence7009 · 1 year ago
Thanks boy :) Please speak louder in the video; your voice is low. :)