Knowledge Distillation | Machine Learning

7,185 views

TwinEd Productions

A day ago

We all know that ensembles outperform individual models. However, the increase in the number of models does mean inference (evaluating new data) is more costly. This is where knowledge distillation comes to the rescue... do watch to find out how!
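For readers who want to see the idea in code before watching, here is a minimal PyTorch sketch (synthetic data and illustrative model sizes, not the video's exact setup): train a small ensemble of teachers on the hard labels, average their predictions into soft labels, and train a single smaller student to match those soft labels.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic 3-way classification data (illustrative only).
X = torch.randn(512, 20)
y = torch.randint(0, 3, (512,))

def make_mlp(hidden=64):
    return nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 3))

# 1) Train an ensemble of teacher models on the hard labels.
teachers = [make_mlp() for _ in range(5)]
for teacher in teachers:
    opt = torch.optim.Adam(teacher.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        F.cross_entropy(teacher(X), y).backward()
        opt.step()

# 2) Average the teachers' predicted distributions to get soft labels.
with torch.no_grad():
    soft_labels = torch.stack([F.softmax(t(X), dim=-1) for t in teachers]).mean(dim=0)

# 3) Train one smaller student to match the soft labels (knowledge distillation).
student = make_mlp(hidden=16)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    log_probs = F.log_softmax(student(X), dim=-1)
    loss = -(soft_labels * log_probs).sum(dim=-1).mean()  # cross entropy with soft targets
    loss.backward()
    opt.step()

# At deployment only the student runs: one forward pass instead of five.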

Comments: 28
@CadenFtw 3 months ago
Great video! Short and sweet
@TwinEdProductions 3 months ago
Thanks!
@softerseltzer 2 years ago
Very clear and concise, with a proper introduction!
@TwinEdProductions 2 years ago
Thank you!
@prasundatta6590 a year ago
Precise and to the point. Thank you for this awesome video.
@TwinEdProductions a year ago
Thanks for your support :)
@vladimir_egay 2 years ago
Very clearly explained! Thank you!
@TwinEdProductions 2 years ago
Thanks!
@wolfisraging 2 years ago
Awesome explanation mate, waiting for more videos!!!
@TwinEdProductions 2 years ago
Cheers! Many videos to come :)
@victorsuciu3794 2 years ago
Thank you! I was recently reading about this topic but was having trouble understanding. Your explanation was fantastic. Is knowledge distillation really just replacing the ground truth "hard" labels in the dataset with the teacher's soft labels?
@TwinEdProductions 2 years ago
Ahh thanks, glad to know this was of help to someone! Well, knowledge distillation is about having a single model that works as well as an ensemble of models. The soft-label, teacher-student training shown here is just one common way people achieve knowledge distillation; there are actually many methods in the literature!
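For anyone curious about one of those other formulations: the classic recipe from Hinton et al. softens both the teacher and student outputs with a temperature T and matches the resulting distributions. A rough PyTorch sketch (function name and defaults are illustrative, not taken from the video):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=3.0):
    # Soften both distributions with temperature T, then match them.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * (T ** 2)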
@sergeyzaitsev3319 a year ago
Great job, man!
@TwinEdProductions a year ago
Thanks :)
@MeshRoun 2 years ago
I am not sure I understand where the gain in training time is if the student has to learn from the teacher's predictions. Wouldn't it mean that we still have to train the large N x K model?
@TwinEdProductions 2 years ago
Hi, great question. So yes, we still have to train an entire ensemble of models. The aim is not to save training time but inference time: at deployment we keep only a single model, so each query is much faster to evaluate.
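In code, the contrast at inference time looks roughly like this (a hedged sketch; `models` is an already-trained ensemble and `student` the distilled model):

import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    # N forward passes per query, then average the predicted distributions.
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

def student_predict(student, x):
    # A single forward pass per query - this is all that ships to deployment.
    return F.softmax(student(x), dim=-1)

The hope is that student_predict stays close to ensemble_predict in accuracy, while the latency win comes simply from doing one forward pass instead of N.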
@MeshRoun 2 years ago
@TwinEdProductions gotcha, thank you!
@aashishrana9356 a year ago
Crisp and clear! Why have we used the cross entropy function here?
@TwinEdProductions a year ago
Hi! Thanks for your comment. The cross entropy function is generally used because it is theoretically motivated: minimising it is equivalent to minimising the KL divergence between the predicted and target distributions. I can link you to a video if you need more information.
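For reference, the standard identity behind that statement (not derived in the video itself): for a fixed target distribution p and predicted distribution q,

H(p, q) = -\sum_i p_i \log q_i
        = -\sum_i p_i \log p_i + \sum_i p_i \log \frac{p_i}{q_i}
        = H(p) + \mathrm{KL}(p \,\|\, q),

so minimising the cross entropy H(p, q) over q is the same as minimising KL(p || q), because the entropy H(p) does not depend on q.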
@ooooo8265 a year ago
@TwinEdProductions I would absolutely love that!
@TwinEdProductions a year ago
@ooooo8265 this is a great explanation: kzfaq.info/get/bejne/ht2Xo89q0rHFoqc.html
@kavyagupta4345 a year ago
You said in a reply that inference time reduces while training time remains the same. How is that possible? Could you explain?
@vyasraina3930 a year ago
Hi, thanks for the question. During training we train an ensemble of models, so we are training many models. At inference time we use only a single model (trained using the ensemble), so a query now passes through one model instead of many - i.e. faster inference! Hopefully this is useful.
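A rough, purely illustrative way to see that difference on your own machine (untrained stand-in models; absolute timings will vary by hardware):

import time
import torch
import torch.nn as nn

x = torch.randn(1, 20)
ensemble = [nn.Linear(20, 3) for _ in range(10)]  # stand-ins for ten trained members
student = nn.Linear(20, 3)                        # stand-in for the distilled student

with torch.no_grad():
    start = time.perf_counter()
    for _ in range(1000):
        torch.stack([m(x) for m in ensemble]).mean(dim=0)
    print("ensemble:", time.perf_counter() - start)

    start = time.perf_counter()
    for _ in range(1000):
        student(x)
    print("student :", time.perf_counter() - start)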
@user-xd4cl3qd8x a year ago
Can you explain it a little more? How do you select the parameters for the student architecture?
@devstuff2576 2 years ago
So the student optimizes 2 loss functions?
@TwinEdProductions 2 years ago
Hi! The student only optimizes 1 loss function, i.e. a cross entropy in which the hard labels have been replaced by the teacher's predictions.
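To make the distinction concrete, here is a sketch of the single soft-target loss described in that reply, next to the two-term blend the question is probably thinking of, which is common in the literature (helper names are illustrative):

import torch
import torch.nn.functional as F

def soft_target_ce(student_logits, teacher_probs):
    # The single loss described above: cross entropy against the teacher's probabilities.
    return -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()

def blended_loss(student_logits, teacher_probs, hard_labels, alpha=0.5):
    # A common alternative in the literature: mix the soft and hard terms.
    return alpha * soft_target_ce(student_logits, teacher_probs) \
        + (1 - alpha) * F.cross_entropy(student_logits, hard_labels)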
@xiangyangfrank6286 2 years ago
"Imagine we have a three-way..." what????
@TwinEdProductions 2 years ago
"Classification task"