The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

19,969 views

Yannic Kilcher

Stunning evidence for the hypothesis that neural networks work so well because their random initialization almost certainly contains a nearly optimal sub-network that is responsible for most of the final performance.
arxiv.org/abs/1803.03635
Abstract:
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.
We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective.
We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
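As a rough illustration of the identification algorithm the abstract refers to, here is a minimal sketch of iterative magnitude pruning, assuming PyTorch; the train_fn helper and all other names are placeholders, not the authors' code. The loop is: train the (masked) network, prune the smallest-magnitude surviving weights, rewind the survivors to their original initialization, and repeat.

```python
# Minimal sketch of iterative magnitude pruning ("winning ticket" search).
# Assumes PyTorch and a user-supplied train_fn(model, masks) that trains the
# network while keeping pruned weights at zero. Illustrative only.
import copy
import torch

def find_winning_ticket(model, train_fn, rounds=5, prune_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())          # remember theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                                 # prune weight matrices only

    for _ in range(rounds):
        train_fn(model, masks)                               # train with the current mask

        # Prune the smallest-magnitude weights that are still alive, per layer.
        for name, param in model.named_parameters():
            if name not in masks:
                continue
            alive = param.detach().abs()[masks[name].bool()]
            threshold = torch.quantile(alive, prune_frac)
            masks[name] = masks[name] * (param.detach().abs() > threshold).float()

        # Rewind the surviving weights to their original initialization.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name])

    return model, masks
```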
Authors: Jonathan Frankle, Michael Carbin
Links:
KZfaq: / yannickilcher
Twitter: / ykilcher
BitChute: www.bitchute.com/channel/yann...
Minds: www.minds.com/ykilcher

Comments: 41
@JackofSome 4 years ago
Yannic you're spoiling us. I hope you're able to keep your pace once (if???) this virus dies down a bit.
@nbrpwng 4 years ago
This is actually reminiscent of how human brains develop from childhood to adulthood. At birth, humans have far more connections between their neurons and connections primarily die off as they learn and mature, much more than new neurons and connections are formed. And yet humans can still learn despite connection removal, and possibly because of it.
@gorgolyt 4 years ago
Great observation. That could simply be pruning, which doesn't decrease performance, and improves energy efficiency for the organism. But it could be something deeper and more important.
@sayakpaul3152 4 years ago
Thanks for the wonderfully detailed walkthrough :) It might be worth mentioning that while training neural nets it's also possible to train them in a pruning-aware fashion, with all the good stuff like pruning schedules, a maximum achievable sparsity, etc.
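As a side note on the pruning-aware training mentioned above, one common way to express a pruning schedule is a cubic ramp of the target sparsity from an initial to a final value over a pruning window, in the spirit of gradual magnitude pruning. A minimal sketch; the function and parameter names are illustrative, not any particular library's API:

```python
# Illustrative sparsity schedule for pruning-aware training: the target
# sparsity ramps up cubically between begin_step and end_step.
def target_sparsity(step, begin_step, end_step, s_init=0.0, s_final=0.9):
    if step < begin_step:
        return s_init
    if step >= end_step:
        return s_final
    progress = (step - begin_step) / (end_step - begin_step)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3
```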
@jivan476 2 years ago
Could it be that the "winning tickets" can be identified after only a handful of training epochs instead of after full training (e.g. 50 epochs or more)? If yes, it would mean that we could train for 3-4 epochs, prune 50% of the weights, then restart the training on these weights only (with the same initialisation as before), rinse and repeat. In theory it could allow faster training.
@wenhanzhou5826 A year ago
I think yes, because there is a paper that discusses how different weight initializations will create different local minima in the loss landscape for the same data. What you can do is start with a really big network and a large learning rate. The network will find one of the local minima quickly, and then you just start pruning to get to the lowest point of that minimum.
@MrNightLifeLover 4 years ago
Very well explained, thanks! Please keep reviewing papers!
@milkteamx7183 A year ago
Amazing explanation! Thank you so much! I just looked through your channel and am excited to find that you have many of these videos. Just subscribed!
@wolfgangmitterbaur3942 2 years ago
Thanks a lot for this video. It explains the essentials of the paper very well, and it is easy to follow for a non-native speaker, which is important as well!
@TimScarfe 4 years ago
Great video! Looking forward to having a discussion on our street talk podcast!
@jrkirby93 4 years ago
I love the idea of sparse neural nets. It feels kinda icky looking at these grossly overparameterized models that are often SOTA and thinking: "Right now, this is the best way of doing this." Pruning is a good technique for finding sparse neural nets, and I thought this was a great paper when I first read it. But I've been working on my own research that approaches sparse NNs from the other direction. Instead of starting with fully connected layers and pruning, I start with extremely sparse layers and build them up, one edge at a time. It requires quite a different training procedure though. Instead of back-propagation and gradient descent, I take advantage of the piecewise-linear properties of ReLU to guarantee a fully piecewise-linear neural net. This allows me to explicitly find the optimal next edge - and its optimal value - in a single optimization step. I hope to finish implementing my research in the coming weeks, and would be happy to show you in more detail if you're interested.
@jepkofficial 3 years ago
What happened with this research?
@jrkirby93 3 years ago
@jepkofficial Wow, was that really 6 months ago? I still haven't finished implementing it. Hard to focus when working alone on independent research. Thanks for the reminder, I should return to that project and get it done.
@Leibniz_28 3 years ago
How's the research going?
@laurenpinschannels 2 years ago
checking in on this again, on the off chance you didn't get distracted from this one :)
@Poof57 2 years ago
@jrkirby93 Woohoo, another reminder here :P
@freemind.d2714 3 years ago
A very good hypothesis, it makes a lot of sense.
@user-sh5hn2gn1k 10 months ago
Hi @Yannic Kilcher! Can't we control the Random Initialization to keep almost every weight in the network (to get the most out of the original network)? Can't every weight win the lottery?
@user-sh5hn2gn1k 10 months ago
Hi @Yannic Kilcher! Isn't there any possibility that the weights that are not close to zero (i.e., not very small in magnitude) are the weights that should be pruned? Wouldn't it be a better idea to monitor the weights during the initial training (with the complete network) and prune based on which weights have traveled much further during that initial training? 🤔 Kindly enlighten me on this!
@thejll 4 months ago
Very interesting. Does anyone know of software that allows doing this pruning?
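For what it's worth, recent versions of PyTorch ship unstructured magnitude pruning in torch.nn.utils.prune. A minimal sketch, with a toy two-layer model purely for illustration:

```python
# Minimal sketch of unstructured magnitude pruning with PyTorch's built-in
# torch.nn.utils.prune utilities; the model is a toy example.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

# Globally prune 80% of the smallest-magnitude weights across both layers.
prune.global_unstructured(
    [(model[0], "weight"), (model[2], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.8,
)

# Fold the masks back into the weight tensors once pruning is final.
for layer in (model[0], model[2]):
    prune.remove(layer, "weight")
```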
@user-sh5hn2gn1k 10 months ago
Hi @Yannic Kilcher! It seems that the random initialization is very important before pruning, right? Because only the lucky (in terms of random initialization) weights are kept after pruning. If the random initialization is bad and there are no (or very few) lucky candidate weights, what do we do in that case? Is there any particular random initialization recommended by the paper or in practice? There are some recommended random initialization methods like Glorot or He.
@araldjean-charles3924 10 months ago
For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?
@kevalan1042 3 years ago
Did they check if those initial weights already tend to be relatively large?
@chesstanay 3 months ago
Where can I read more about the related finding at 17:16?
@joirnpettersen 4 years ago
What if instead of pruning the weights, you assume the low-magnitude weights were initialized incorrectly, and re-train the dense network where the high-magnitude weights are kept at their initial initialization, and the low-magnitude weights get new values?
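A minimal sketch of this idea, assuming PyTorch; the helper name and the keep_frac parameter are made up for illustration. Weights that end training with large magnitude are rewound to their original initialization, the rest are re-drawn, and the network stays dense for retraining:

```python
# Illustrative sketch: keep the high-magnitude (after training) weights at
# their original initialization and re-randomize the low-magnitude ones.
import torch

def rerandomize_small_weights(model, init_state, keep_frac=0.2):
    # model holds the *trained* weights; init_state is the original state_dict.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() < 2:                               # leave biases alone
                continue
            threshold = torch.quantile(param.abs(), 1.0 - keep_frac)
            keep = param.abs() >= threshold                   # "winning" connections
            init_w = init_state[name]
            fresh = torch.randn_like(init_w) * init_w.std()   # new random draws
            param.copy_(torch.where(keep, init_w, fresh))
```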
@YannicKilcher 4 years ago
I've never heard this idea. Nice, might be worth a try. I doubt you're gonna get a massive improvement, but it might be interesting to analyze whether you could find an even smaller winning hypothesis.
@HappyManStudiosTV 4 years ago
Hey! Have you seen Uber's follow-up work? They basically say that the trick is just to prune weights that are going *towards* 0, not weights that are near 0.
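A minimal sketch of that criterion, assuming PyTorch tensors holding one layer's initial and final weights; the helper is hypothetical, not the follow-up paper's code:

```python
# Illustrative mask: prune weights whose magnitude shrank toward zero during
# training, keep those that moved away from zero.
import torch

def movement_mask(w_init: torch.Tensor, w_final: torch.Tensor) -> torch.Tensor:
    return (w_final.abs() > w_init.abs()).float()
```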
@TimScarfe 4 years ago
HappyManStudiosTV Interesting
@jordyvanlandeghem3457 3 years ago
can you link the paper? :) thanks!
@MrSb192 3 years ago
Question: suppose we have a network N that we train up to a certain accuracy on some data, prune p% of the weights using some algorithm (one-shot, IMP, etc.), and revert the remaining weights to their initial values. Now, is there any way to ensure that the resulting pruned network will always perform better than the original when trained for the same number of iterations? I mean, is there any pruning algorithm that can guarantee finding a lottery ticket within the network every time we use it? Or is it just trial and error (which is why, I guess, the term lottery ticket is used)?
@vishwajitkumarvishnu3878 4 years ago
How do you read and understand any paper so fast? Does it come with practice, or is there a way to read the different sections? I want to do that. Uploading a video on how to read a paper might help :)
@YannicKilcher 4 years ago
After you've read a bunch, the structure, the methods, and the ideas become repetitive across the entire field, which speeds up the reading process a lot. I guess I can do a video on that, but it will be pretty straightforward and obvious.
@vishwajitkumarvishnu3878 4 years ago
@YannicKilcher It'll be helpful if you make a video. Thanks a lot!
@eugening 4 years ago
Good discussion. The sound is a bit too soft.
@herp_derpingson 4 years ago
Reminds me of dropout for some reason. Except we are throwing away the dropped out neurons.
@JungleEd17 4 years ago
I watched it twice, but I think the connections are thrown out, not the neurons. What's interesting here though: 1. The weights are what's important. 2. Pruning involves throwing out both weights AND structure. Why not keep the structure but choose new weights? Perhaps it just randomly started at a plateau of a local minimum, or the randomization ended up creating redundancies. Jump the weights really far away and try again.
@fsxaircanada01 4 years ago
I think the motivation is that activations are not the biggest source of memory access and energy loss. If we can get rid of 90% of the weights, then it could mean speed and energy improvements.
@Blooper1980 3 years ago
Interesting... Just need to take the stick out of your mouth next time.