Tom Goldstein: "What do neural loss surfaces look like?"

17,958 views

Institute for Pure & Applied Mathematics (IPAM)


New Deep Learning Techniques 2018
"What do neural loss surfaces look like?"
Tom Goldstein, University of Maryland
Abstract: Neural network training relies on our ability to find “good” minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple “filter normalization” method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Using this method, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
Institute for Pure and Applied Mathematics, UCLA
February 8, 2018
For more information: www.ipam.ucla.edu/programs/wor...
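
The abstract's "filter normalization" idea can be sketched in a few lines of PyTorch: draw a random direction in weight space, rescale each filter of the direction so its norm matches the norm of the corresponding filter in the trained weights, then evaluate the loss along that direction. The code below is a minimal illustrative sketch of that idea, not the released implementation; the function names, and the assumption that `data`, `target`, and `loss_fn` form a single evaluation batch and criterion, are illustrative.

```python
import torch

def filter_normalized_direction(model):
    """Random Gaussian direction with each filter rescaled to match
    the norm of the corresponding filter in the trained weights."""
    direction = []
    with torch.no_grad():
        for w in model.parameters():
            d = torch.randn_like(w)
            if w.dim() > 1:
                # Conv / linear weights: normalize per output filter (first dim).
                for d_f, w_f in zip(d, w):
                    d_f.mul_(w_f.norm() / (d_f.norm() + 1e-10))
            else:
                # Biases and norm-layer parameters: match the whole vector's norm.
                d.mul_(w.norm() / (d.norm() + 1e-10))
            direction.append(d)
    return direction

def loss_along_direction(model, loss_fn, data, target, direction, alphas):
    """Evaluate the loss at w + alpha * d for each alpha, then restore w."""
    base = [w.detach().clone() for w in model.parameters()]
    losses = []
    with torch.no_grad():
        for alpha in alphas:
            for w, w0, d in zip(model.parameters(), base, direction):
                w.copy_(w0 + alpha * d)
            losses.append(loss_fn(model(data), target).item())
        for w, w0 in zip(model.parameters(), base):
            w.copy_(w0)  # restore the trained weights
    return losses
```

A 2D surface plot repeats the same evaluation over a grid of coefficients for two such directions.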

Comments: 15
@hxhuang9306 5 years ago
As a noob I just wanted to see what loss functions in more complex networks look like. Was not disappointed.
@dshahrokhian 4 years ago
Great video summary of all the work in the Maryland lab!
@AoibhinnMcCarthy 3 years ago
Great lecture! Very clear explanation of how the network architecture influences the loss function.
@vtrandal 9 days ago
@18:00 the speaker, Tom Goldstein, is answering a question: is this the whole error surface? His answer contains good news and bad news. The good news is that the plotted region extends pretty far relative to the scale of the weights involved; the bad news is that adding skip connections does not convert training into a convex optimization problem. At least that's what I take from the question and the answer.
@ProfessionalTycoons 5 years ago
Amazing video
@XahhaTheCrimson 3 years ago
This helps me a lot
@nguyendinhchung9677 2 years ago
A very good and funny video that brings a great sense of entertainment!
@dimitermilushev575 4 years ago
Thanks, this is a great video. Do you see any issues/fundamental differences in applying these techniques to sequence models? Is there any research doing so?
@joshuafox1757 6 years ago
How much computational power does it cost to evaluate the loss landscape using this method, compared to a more naive method?
@user-ke5tu6ys7z 1 year ago
Thank you, professor!! I love this video. 38:45 Why do we find saddle points? How can we apply saddle points in research?
@aaAa-vq1bd 1 year ago
Saddle points are the critical points where the surface curves upward in some directions and downward in others. But why are they useful? Good question. I looked it up: “one of the reasons neural network research was abandoned (once again) in the late 90s was *because the optimization problem is non-convex*. The realization from the work in the 80s and 90s that neural networks have an exponential number of local minima, along with the breakout success of kernel machines, also led to this downfall, as did the fact that networks may get stuck on poor solutions. Recently we have evidence that the issue of non-convexity may be a non-issue, which changes its relationship vis-a-vis neural networks.”

What does this mean? Well, say we want to average the values in some neighborhood of an n-dimensional space. We can't just compute the Gaussian kernel, because it gets (potentially exponentially) worse as the dimension grows, so we need to unfold the manifold onto a flat 2D Euclidean coordinate system. What's the issue? Local minima (points that look like minima within a restricted region of the function) can get our averaging machine stuck as it runs a stochastic gradient descent algorithm. And a neural network in general has exponentially many local minima, so we might worry that there is no guarantee of optimization with neural networks at all. Well, shit.

The thing is, though, that the critical points of a high-dimensional surface along almost all of the trajectory are saddle points, not local minima, and saddle points pose no problem for stochastic gradient descent. Moreover, if there is any randomness in the data, it is exponentially likely that all the local minima are close to the global minimum. Therefore local minima are not a problem either.

Basically, saddle points are the highly prevalent critical points in parameter space, and they don't pose a problem for the algorithms and architectures we want to use. Local minima do pose a problem in principle, but in high dimensions they turn out to lie only in certain places (near global minima). So you can't use saddle points in your data for anything special; it's just that a lot of algorithms (like Newton, gradient descent, and quasi-Newton methods) treat saddle points as local minima and thus get stuck much more often than they should. (A side note: there is something called “saddle-free Newton”, written about in 2014, but it has since been seen that SGD works just as well without needing to compute a Hessian over a lot of parameters.) Hope that helps a bit.
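
A toy illustration of the comment's point that saddle points do not trap gradient descent (the example is illustrative, not from the talk): for f(x, y) = x^2 - y^2 the origin is a saddle point, and plain gradient descent slides off it as soon as the iterate has even a tiny component along the descending y direction.

```python
import numpy as np

# Toy saddle: f(x, y) = x^2 - y^2 has a saddle point at the origin.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])  # gradient of f

p = np.array([0.5, 1e-6])  # start near the "ridge", with a tiny y component
lr = 0.1
for _ in range(100):
    p = p - lr * grad(p)   # plain gradient descent

print(p)  # x has shrunk toward 0 while y has grown large: the iterate escaped the saddle
```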
@DonghuiSun 5 years ago
Interesting research. Has the code been shared?
@onetonfoot 5 years ago
github.com/tomgoldstein/loss-landscape
@user-xo9on2of4k 6 years ago
Can I get the PDF file? Thanks.