Knowledge Distillation | Machine Learning

7,185 views

TwinEd Productions

A day ago

We all know that ensembles outperform individual models. However, the increase in the number of models does mean inference (evaluating new data) is more costly. This is where knowledge distillation comes to the rescue... do watch to find out how!
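For readers who want to see the idea in code before watching, here is a minimal PyTorch sketch (synthetic data and illustrative model sizes, not the video's exact setup): train a small ensemble of teachers on the hard labels, average their predictions into soft labels, and train a single smaller student to match those soft labels.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic 3-way classification data (illustrative only).
X = torch.randn(512, 20)
y = torch.randint(0, 3, (512,))

def make_mlp(hidden=64):
    return nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 3))

# 1) Train an ensemble of teacher models on the hard labels.
teachers = [make_mlp() for _ in range(5)]
for teacher in teachers:
    opt = torch.optim.Adam(teacher.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        F.cross_entropy(teacher(X), y).backward()
        opt.step()

# 2) Average the teachers' predicted distributions to get soft labels.
with torch.no_grad():
    soft_labels = torch.stack([F.softmax(t(X), dim=-1) for t in teachers]).mean(dim=0)

# 3) Train one smaller student to match the soft labels (knowledge distillation).
student = make_mlp(hidden=16)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    log_probs = F.log_softmax(student(X), dim=-1)
    loss = -(soft_labels * log_probs).sum(dim=-1).mean()  # cross entropy with soft targets
    loss.backward()
    opt.step()

# At deployment only the student runs: one forward pass instead of five.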

Comments: 28
@CadenFtw 3 months ago
Great video! Short and sweet
@TwinEdProductions 3 months ago
Thanks!
@softerseltzer 2 years ago
Very clear and concise, with a proper introduction!
@TwinEdProductions 2 years ago
Thank you!
@prasundatta6590 a year ago
Precise and to the point. Thank you for this awesome video.
@TwinEdProductions a year ago
Thanks for your support :)
@vladimir_egay 2 years ago
Very clearly explained! Thank you!
@TwinEdProductions 2 years ago
Thanks!
@wolfisraging 2 years ago
Awesome explanation mate, waiting for more videos!!!
@TwinEdProductions 2 years ago
Cheers! Many videos to come :)
@victorsuciu3794 2 years ago
Thank you! I was recently reading about this topic but was having trouble understanding. Your explanation was fantastic. Is knowledge distillation really just replacing the ground truth "hard" labels in the dataset with the teacher's soft labels?
@TwinEdProductions 2 years ago
Ahh thanks, glad to know this was of help to someone! Well, knowledge distillation is about having a single model that works as well as an ensemble of models. The soft-label, teacher-student training shown here is just one common way people achieve knowledge distillation; there are actually many methods in the literature!
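For anyone curious about one of those other formulations: the classic recipe from Hinton et al. softens both the teacher and student outputs with a temperature T and matches the resulting distributions. A rough PyTorch sketch (function name and defaults are illustrative, not taken from the video):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=3.0):
    # Soften both distributions with temperature T, then match them.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * (T ** 2)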
@sergeyzaitsev3319 a year ago
Great job, man!
@TwinEdProductions a year ago
Thanks :)
@MeshRoun 2 years ago
I am not sure I understand where the gain in training time is if the student has to learn from the teacher's predictions. Wouldn't it mean that we still have to train the large N x K model?
@TwinEdProductions 2 years ago
Hi, great question. So yes, we still have to train an entire ensemble of models. The aim is not to save training time but inference time: at deployment we keep only a single model, so each query is much faster to evaluate.
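In code, the contrast at inference time looks roughly like this (a hedged sketch; `models` is an already-trained ensemble and `student` the distilled model):

import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    # N forward passes per query, then average the predicted distributions.
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

def student_predict(student, x):
    # A single forward pass per query - this is all that ships to deployment.
    return F.softmax(student(x), dim=-1)

The hope is that student_predict stays close to ensemble_predict in accuracy, while the latency win comes simply from doing one forward pass instead of N.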
@MeshRoun 2 years ago
@TwinEdProductions gotcha, thank you!
@aashishrana9356 a year ago
Crisp and clear! Why have we used the cross entropy function here?
@TwinEdProductions a year ago
Hi! Thanks for your comment. The cross entropy function is generally used because it is theoretically motivated: minimising it is equivalent to minimising the KL divergence between the predicted and target distributions. I can link you to a video if you need more information.
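For reference, the standard identity behind that statement (not derived in the video itself): for a fixed target distribution p and predicted distribution q,

H(p, q) = -\sum_i p_i \log q_i
        = -\sum_i p_i \log p_i + \sum_i p_i \log \frac{p_i}{q_i}
        = H(p) + \mathrm{KL}(p \,\|\, q),

so minimising the cross entropy H(p, q) over q is the same as minimising KL(p || q), because the entropy H(p) does not depend on q.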
@ooooo8265 a year ago
@TwinEdProductions I would absolutely love that!
@TwinEdProductions a year ago
@ooooo8265 this is a great explanation: kzfaq.info/get/bejne/ht2Xo89q0rHFoqc.html
@kavyagupta4345 a year ago
You said in a reply that inference time reduces while training time remains the same. How is that possible? Could you explain?
@vyasraina3930 a year ago
Hi, thanks for the question. During training we train an ensemble of models, so we are training many models. At inference time we use only a single model (trained using the ensemble), so a query now passes through one model instead of many - i.e. faster inference! Hopefully this is useful.
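A rough, purely illustrative way to see that difference on your own machine (untrained stand-in models; absolute timings will vary by hardware):

import time
import torch
import torch.nn as nn

x = torch.randn(1, 20)
ensemble = [nn.Linear(20, 3) for _ in range(10)]  # stand-ins for ten trained members
student = nn.Linear(20, 3)                        # stand-in for the distilled student

with torch.no_grad():
    start = time.perf_counter()
    for _ in range(1000):
        torch.stack([m(x) for m in ensemble]).mean(dim=0)
    print("ensemble:", time.perf_counter() - start)

    start = time.perf_counter()
    for _ in range(1000):
        student(x)
    print("student :", time.perf_counter() - start)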
@user-xd4cl3qd8x a year ago
Can you explain it a little more? How do you select the parameters for the student architecture?
@devstuff2576 2 years ago
So the student optimizes 2 loss functions?
@TwinEdProductions 2 years ago
Hi! The student only optimizes 1 loss function, i.e. a cross entropy in which the hard labels have been replaced by the teacher's predictions.
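To make the distinction concrete, here is a sketch of the single soft-target loss described in that reply, next to the two-term blend the question is probably thinking of, which is common in the literature (helper names are illustrative):

import torch
import torch.nn.functional as F

def soft_target_ce(student_logits, teacher_probs):
    # The single loss described above: cross entropy against the teacher's probabilities.
    return -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()

def blended_loss(student_logits, teacher_probs, hard_labels, alpha=0.5):
    # A common alternative in the literature: mix the soft and hard terms.
    return alpha * soft_target_ce(student_logits, teacher_probs) \
        + (1 - alpha) * F.cross_entropy(student_logits, hard_labels)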
@xiangyangfrank6286 2 years ago
"Imagine we have a three-way..." what????
@TwinEdProductions 2 years ago
"Classification task"