Stanford CS236: Deep Generative Models I 2023 I Lecture 12 - Energy Based Models

1,225 views

Stanford Online

1 month ago

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

Comments: 1
@CPTSMONSTER · 26 days ago
5:40 Contrastive divergence: the gradient of the log partition function w.r.t. theta is easy to estimate if samples from the model can be accessed
8:25 Training energy-based models by maximum likelihood is feasible to the extent that samples can be generated, e.g. with MCMC
14:00? MCMC methods, detailed balance condition
22:00? log x = x' term
23:25? Computing the log-likelihood is easy for EBMs
24:15 Very expensive to train EBMs: every training data point requires a sample to be generated from the model, and generating a sample involves Langevin MCMC with ~1000 steps (see the Langevin/contrastive-divergence sketch below)
37:30 The derivative of the KL divergence between the two densities convolved with Gaussian noise, taken w.r.t. the size of the noise, is the Fisher divergence
38:40 Score matching, theta is continuous
47:10 Score matching derivation, independent of p_data
51:15? Equivalent to Fisher divergence
52:35 Interpretation of the loss function: the first term makes data points stationary points (local minima or maxima) of the log-likelihood, so small perturbations of the data points should not increase the log-likelihood by a lot; the second term makes data points local maxima, not minima (see the score-matching sketch below)
55:30? Backprop n times to calculate the Hessian
56:20 Proved equivalence to the Fisher divergence; with infinite data this would yield the exact data distribution
57:45 Fitting an EBM, similar in flavor to GANs: instead of contrasting data with samples from the model, contrast it with noise
1:00:10 Instead of setting the discriminator to some neural network, define it with the same form as the optimal discriminator. x is not fed arbitrarily into a neural network; instead the likelihoods under the model p_theta and the noise distribution are evaluated. The optimal p_theta must match p_data, due to the pre-defined form of the discriminator. Parameterize p_theta with an EBM. (In a GAN setting, the discriminator itself would be parameterized by a neural network.) (see the NCE sketch below)
1:03:00? Classifiers in noise correction
1:11:30 The loss function is independent of sampling; however, sampling from the resulting EBM still requires Langevin MCMC steps
1:19:00 GAN vs. NCE: the generator is trained in a GAN, whereas the noise distribution in NCE is fixed but its likelihood must be evaluated
1:22:20 Noise contrastive estimation where the noise distribution is a flow that is learned adversarially
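
The notes at 5:40 and 24:15 describe the maximum-likelihood training loop for EBMs: negative samples are drawn with Langevin MCMC and used to estimate the gradient of the log partition function. Below is a minimal, illustrative PyTorch sketch, not the lecture's code; EnergyNet, langevin_sample, and cd_loss are hypothetical names, and the step sizes and step counts are placeholders.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Hypothetical scalar energy E_theta(x); p_theta(x) is proportional to exp(-E_theta(x))."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # lower energy = higher likelihood

def langevin_sample(energy, x_init, n_steps=1000, step_size=1e-2):
    """Unadjusted Langevin dynamics:
    x_{t+1} = x_t - (eps/2) * grad_x E(x_t) + sqrt(eps) * N(0, I)."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad + step_size ** 0.5 * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()

def cd_loss(energy, x_data, n_steps=1000, step_size=1e-2):
    """Contrastive-divergence-style surrogate: since
    grad_theta log p_theta(x) = -grad_theta E(x) + E_{x'~p_theta}[grad_theta E(x')],
    minimizing E(data) - E(model samples), with the samples held fixed, follows the
    maximum-likelihood gradient, the Langevin samples standing in for p_theta."""
    x_model = langevin_sample(energy, torch.randn_like(x_data), n_steps, step_size)
    return energy(x_data).mean() - energy(x_model).mean()
```

As the 24:15 note points out, the expense comes entirely from the sampler: each gradient step on the energy network requires a fresh (here ~1000-step) MCMC chain per batch.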
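
The score-matching objective interpreted at 52:35 can be written down directly for an EBM, because the score grad_x log p_theta(x) = -grad_x E_theta(x) does not involve the partition function. A rough sketch under the same PyTorch assumptions as above; the naive Hessian-trace loop is exactly the one-backprop-per-dimension cost mentioned at 55:30.

```python
import torch

def score_matching_loss(energy, x):
    """Exact score matching: E_data[ 1/2 * ||score(x)||^2 + tr(grad_x score(x)) ],
    where score(x) = grad_x log p_theta(x) = -grad_x E_theta(x). x is (batch, dim)."""
    x = x.clone().detach().requires_grad_(True)
    score = -torch.autograd.grad(energy(x).sum(), x, create_graph=True)[0]
    sq_norm = 0.5 * (score ** 2).sum(dim=-1)
    # Hessian trace via one extra backprop per input dimension (O(d) passes).
    trace = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[-1]):
        grad_i = torch.autograd.grad(score[:, i].sum(), x, create_graph=True)[0][:, i]
        trace = trace + grad_i
    return (sq_norm + trace).mean()
```

Minimizing the first term makes data points stationary points of log p_theta; the trace term, which is most negative at local maxima, makes them maxima rather than minima, matching the 52:35 interpretation.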
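
For the noise contrastive estimation discussion starting at 57:45 and 1:00:10, the classifier is not a free-form network: it is fixed to the optimal-discriminator form built from the model density and a tractable noise density. A hedged sketch only; unnorm_log_p_theta and log_p_noise are hypothetical callables, and the learned normalization constant is assumed to be absorbed into the energy.

```python
import torch
import torch.nn.functional as F

def nce_loss(unnorm_log_p_theta, log_p_noise, x_data, x_noise):
    """Noise contrastive estimation with the discriminator fixed to
    D(x) = p_theta(x) / (p_theta(x) + p_noise(x)), so that
    logit(D(x)) = log p_theta(x) - log p_noise(x)."""
    logit_data = unnorm_log_p_theta(x_data) - log_p_noise(x_data)
    logit_noise = unnorm_log_p_theta(x_noise) - log_p_noise(x_noise)
    # Binary cross-entropy: real data labeled 1, noise samples labeled 0.
    return (
        F.binary_cross_entropy_with_logits(logit_data, torch.ones_like(logit_data))
        + F.binary_cross_entropy_with_logits(logit_noise, torch.zeros_like(logit_noise))
    )
```

With equal numbers of data and noise samples, driving this loss to its optimum forces p_theta toward p_data, which is why the pre-defined discriminator form matters; and, as the 1:11:30 note says, the loss itself needs no MCMC, only sampling from the trained EBM does.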