The optimal training recipe for knowledge distillation is consistency and patience. Consistency means showing the teacher and the student the exact same augmented view of each image, while MixUp augmentation expands the support of the input distribution. Patience means enduring long training schedules. A rough sketch of the recipe is below. Exciting to see advances in model compression that make stronger models more widely usable!
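Here is a minimal PyTorch sketch of this idea (my own illustration, not the paper's code): the teacher and student score the identical MixUp-blended batch, and the student matches the teacher's soft predictions with a KL loss; "patience" just means running this loop for a very long schedule. Names like distillation_step, mixup_alpha, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, images, optimizer,
                      mixup_alpha=0.2, temperature=1.0):
    """One consistent-distillation step: same MixUp view for teacher and student."""
    # MixUp: blend each image with a shuffled copy of the batch.
    lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]

    # Consistency: teacher and student both see the exact same mixed batch.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(mixed) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(mixed) / temperature, dim=-1)

    # Student is trained to match the teacher's predictive distribution.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice this step would be repeated over many epochs (the "patience" part), with the teacher kept frozen throughout.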
Paper Links:
Knowledge Distillation: A Good Teacher is Patient and Consistent: arxiv.org/abs/2106.05237
Does Knowledge Distillation Really Work? arxiv.org/pdf/2106.05945.pdf
Meta Pseudo Labels: arxiv.org/pdf/2003.10580.pdf
MixUp Augmentation: keras.io/examples/vision/mixup/
Scaling Vision Transformers: arxiv.org/pdf/2106.04560.pdf
Well-Read Students Learn Better: arxiv.org/pdf/1908.08962.pdf
Chapters
0:00 Paper Title
0:05 Model Compression
1:11 Limitations of Pruning
2:13 Consistency in Distillation
4:08 Comparison with Meta Pseudo Labels
5:10 MixUp Augmentation
6:52 Patience in Distillation
8:53 Results
10:37 Exploring Knowledge Distillation
Thanks for watching! Please Subscribe!