Knowledge Distillation: A Good Teacher is Patient and Consistent

19,790 views

Connor Shorten

The optimal training recipe for knowledge distillation comes down to consistency and patience. Consistency means showing the teacher and the student the exact same augmented view of each image, and additionally broadening the support of the input distribution with MixUp augmentation. Patience means enduring very long training schedules. It's exciting to see advances in model compression that make stronger models more widely usable!
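For intuition, here is a minimal PyTorch sketch of that recipe. The `teacher`, `student`, and `optimizer` objects are placeholders (not from the paper's code); the key detail is that both networks see the same MixUp-mixed view while the student matches the teacher's soft predictions:

```python
# Minimal sketch of the "consistent teaching" recipe described above
# (assumed objects: `teacher`, `student`, `optimizer`, a batch of `images`).
import torch
import torch.nn.functional as F

def mixup(images, alpha=0.2):
    # Blend each image with another image drawn from the same batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(images.size(0), device=images.device)
    return lam * images + (1.0 - lam) * images[index]

def distillation_step(teacher, student, optimizer, images, temperature=1.0):
    mixed = mixup(images)              # teacher and student get the SAME view
    with torch.no_grad():              # teacher is frozen
        teacher_logits = teacher(mixed)
    student_logits = student(mixed)
    # Student matches the teacher's softened predictions (KL divergence).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The "patience" part is simply running this step for a far longer schedule than typical supervised training.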
Paper Links:
Knowledge Distillation: A Good Teacher is Patient and Consistent: arxiv.org/abs/2106.05237
Does Knowledge Distillation Really Work? arxiv.org/pdf/2106.05945.pdf
Meta Pseudo Labels: arxiv.org/pdf/2003.10580.pdf
MixUp Augmentation: keras.io/examples/vision/mixup/
Scaling Vision Transformers: arxiv.org/pdf/2106.04560.pdf
Well-Read Students Learn Better: arxiv.org/pdf/1908.08962.pdf
Chapters
0:00 Paper Title
0:05 Model Compression
1:11 Limitations of Pruning
2:13 Consistency in Distillation
4:08 Comparison with Meta Pseudo Labels
5:10 MixUp Augmentation
6:52 Patience in Distillation
8:53 Results
10:37 Exploring Knowledge Distillation
Thanks for watching! Please Subscribe!

Comments: 6
@connorshorten6311 3 years ago
This paper was quickly previewed in the AI Weekly Update series! Check it out here - kzfaq.info/get/bejne/j8WYp6ucmNnSiaM.html
@irfanrahadi7487 2 years ago
Great video!
@juanmanuelcirotorres6155 3 years ago
You're the best, I'm your biggest fan
@connorshorten6311 3 years ago
Thank you, appreciate it!
@pawelkubik 2 years ago
Fixed teacher caching was never very practical anyway, because it greatly complicates the pipeline, and the teacher's inference time is usually dwarfed by the student's backpropagation. The backward pass is way more expensive than the forward pass.
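To make that comparison concrete, here is a rough benchmark sketch (hypothetical model choices, not tied to the paper) that times a frozen teacher's forward pass against the student's forward-plus-backward pass:

```python
# Rough benchmark sketch illustrating the point above (arbitrary model choices,
# not from the video): a frozen teacher forward pass vs. the student's
# forward + backward pass, which usually dominates the step time.
import time
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
teacher = models.resnet50().to(device).eval()    # stand-in "big" teacher
student = models.resnet18().to(device).train()   # stand-in "small" student
batch = torch.randn(32, 3, 224, 224, device=device)

def timed(fn, warmup=2, iters=10):
    for _ in range(warmup):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

def teacher_forward():
    with torch.no_grad():
        teacher(batch)

def student_forward_backward():
    student(batch).sum().backward()
    student.zero_grad()

print(f"teacher forward (no_grad):  {timed(teacher_forward) * 1e3:.1f} ms")
print(f"student forward + backward: {timed(student_forward_backward) * 1e3:.1f} ms")
```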