Distilling the Knowledge in a Neural Network

19,056 views

Kapil Sachdeva

4 years ago

This is the first and foundational paper that started the research area of Knowledge Distillation.
Knowledge Distillation is the study of methods and techniques for extracting the information learned by a cumbersome model (also called the Teacher model) and transferring it to a simpler model (also called the Student model). Student models are the ones used for inference (especially on resource-constrained devices) and are expected to excel at both accuracy and speed of prediction.
Link to the paper:
arxiv.org/abs/1503.02531
Link to the summary of the paper:
towardsdatascience.com/paper-summary-distilling-the-knowledge-in-a-neural-network-dc8efd9813cc
#KnowledgeDistillation
#deeplearning
#softmax
#machinelearning
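
Since the paper (and the video's walkthrough) centers on soft targets produced by a temperature-scaled softmax, here is a minimal sketch of that distillation loss. It uses PyTorch purely for illustration; the function name, temperature T=4.0, and weight alpha=0.7 are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    """Weighted sum of a soft-target term and a hard-label term.

    The soft term compares the student's and teacher's temperature-scaled
    softmax distributions; the T*T factor keeps its gradient magnitude
    comparable to the hard term, as discussed in the paper.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_term = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * (T * T)
    hard_term = F.cross_entropy(student_logits, labels)
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Example usage with random tensors standing in for a batch of 8 examples
# over 10 classes (teacher logits would normally come from the big model).
if __name__ == "__main__":
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student_logits, teacher_logits, labels))
```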

Comments: 50
@Dannyboi91
@Dannyboi91 3 years ago
This was a great explanation! The paper is fairly short and clear cut but the additional graph made the presentation way easier to understand!
@skauddy755
@skauddy755 4 months ago
Very insightful walkthrough. Thank You!
@KapilSachdeva
@KapilSachdeva 4 months ago
🙏
@SP-db6sh
@SP-db6sh 3 years ago
Best candid explanation of this cumbersome topic is distilled here!
@KapilSachdeva
@KapilSachdeva 3 years ago
Thanks 😀
@InquilineKea
@InquilineKea 1 year ago
THIS IS ACTUALLY SO GOOD
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏
@tranquangkhai8329
@tranquangkhai8329 3 years ago
Excellent explanation! I learned many things from your video. Thank you!
@syedkhureshi1879
@syedkhureshi1879 3 years ago
Great work at explaining the concepts!
@leonwitt6830
@leonwitt6830 3 years ago
Amazing Job Kapil! Please continue with your work :)
@kushalneo
@kushalneo 6 months ago
Great Explanation
@KapilSachdeva
@KapilSachdeva 6 months ago
🙏
@shruti9457
@shruti9457 2 years ago
Thank you for the very clear explanation!
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏
@CarlosFajardoA
@CarlosFajardoA 3 years ago
Great explanation. It helps me to understand better. Thank you for sharing.
@furkatsultonov9976
@furkatsultonov9976 3 years ago
Excellent explanation! Thank you
@goldfishjy95
@goldfishjy95 3 years ago
This is high quality education...thank you so much!
@KapilSachdeva
@KapilSachdeva 3 years ago
🙏
@aishaal-harbi1929
@aishaal-harbi1929 1 year ago
Thank you so much, sir!
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏
@mhozaifakhan
@mhozaifakhan 3 years ago
Great explanation. Thanks.
@mohammedkassahun3899
@mohammedkassahun3899 1 year ago
How is it that you only have 2k followers? I have no words. Even if this was the only video you made, you would deserve a million likes! Thanks a lot, man!
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏 I am happy it was of some help.
@anirudhthatipelli8765
@anirudhthatipelli8765 1 year ago
Thanks a lot, this was wonderfully explained!
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏
@lidiyanorman8521
@lidiyanorman8521 3 years ago
Thank you, great explanations!
@KapilSachdeva
@KapilSachdeva 3 years ago
🙏
@sqliu9489
@sqliu9489 3 years ago
Nice video👍
@kumarteerath6916
@kumarteerath6916 2 years ago
Impressive work! Keep it up.
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏
@ghazalehserati1831
@ghazalehserati1831 1 year ago
Great and helpful explanations. Thanks a lot.
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏
@youzheng9546
@youzheng9546 1 year ago
It's amazing! Thank you for explaining Knowledge Distillation so clearly!
@KapilSachdeva
@KapilSachdeva 1 year ago
🙏
@ogsconnect1312
@ogsconnect1312 2 years ago
Well done! Good job!
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏
@anasnb2022
@anasnb2022 4 years ago
Thank you.. :)
@ishansharma4900
@ishansharma4900 3 years ago
Thanks a lot!
@KapilSachdeva
@KapilSachdeva 3 years ago
🙏
@abdolahfrootan2127
@abdolahfrootan2127 2 years ago
You explained it amazingly. There is another new paper, "Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge", that uses this concept to propose a model called FedGKT in the federated learning field. I suggest you check it out and maybe make a video to clarify their work. Thank you
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏
@squarehead6c1
@squarehead6c1 10 months ago
Dear Mr. Sachdeva! Excellent presentation of this paper! A question: I am currently interested in knowledge engineering and the concepts of explicit knowledge (e.g., logical sentences) and implicit knowledge (such as the knowledge encoded in ANNs). Does distillation extract explicit knowledge, for instance for the purpose of explainable AI, or is the knowledge still just encoded in the ANN?
@KapilSachdeva
@KapilSachdeva 10 months ago
Very good question. Unfortunately, I do not have enough insight into, nor have I come across experiments on, explicit vs. implicit knowledge in relation to knowledge distillation. I would appreciate it if you comment here when you get some answers from your research. Thanks.
@psychicmario
@psychicmario 3 years ago
Excellent explanation
@KapilSachdeva
@KapilSachdeva 3 years ago
🙏
@furqanmalik1425
@furqanmalik1425 2 years ago
Very good and adorable demonstration. Do you have any video explaining the paper "Model Compression via Distillation and Quantization" (ICLR 2018) by Antonio Polino, Dan Alistarh, and Razvan Pascanu (Google DeepMind)?
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏 Thanks for the kind words. At present, I do not have any video on my channel for this particular paper. I just finished reading it, and it is indeed an interesting paper. Thanks for providing the reference.
@KapilSachdeva
@KapilSachdeva 2 years ago
Here is what I have understood by reading this paper.

Background: The end goal is to have a smaller, simpler, shallower network that can be used on resource-constrained devices and/or perform predictions faster. The three main ideas that exist to achieve this goal are Transfer Learning, Quantization, and Distillation. Quantization => you reduce the weight size, e.g., float to int or even binary => the math operations become very fast. Premise behind distillation: during training, the network explores various directions to learn, and often those are not required during prediction. I talk about this in my paper reading as well.

The Big Idea: Could we combine Quantization & Distillation?

How to combine them (Algorithm 1, Page 5):
- Before you compute the distillation loss, you create the quantized weights. Note: do not update the weights, and do not train such that the weights are forced to be quantized.
- Compute the gradients (i.e., the backward pass).
- Use the gradients to update the "original" (full-precision) weights.
Only when the training ends, at the last step, do you replace the weights with their quantized version.

They have another version of the algorithm, Algorithm 2 (Page 6). This second version is based on how one does the quantization. For example, quantization is done by rounding a number either up or down, and to round up or down you add a small number. But what should that small number be? This is what this version of the algorithm does: it "learns" the appropriate small number to add. They call these quantization points. I am not sure if I have completely understood this aspect, so make sure to verify it!

Results/Conclusions: They strongly suggest that the distillation loss is better than the normal loss, and this is shown by experiments. Hope this helps!
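A rough sketch, in PyTorch for illustration, of the Algorithm 1 procedure summarized above: quantize the weights for the forward pass and the distillation loss, backpropagate, apply the update to the full-precision weights, and only commit to the quantized weights at the very end. The `quantize` helper, bit width, temperature, and use of a plain soft-target KL term in place of the full distillation loss are illustrative assumptions, not the paper's exact choices; verify against the paper before relying on it.

```python
import torch
import torch.nn.functional as F

def quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Toy uniform quantizer: snap weights onto a small grid (illustrative only)."""
    scale = w.abs().max() / (2 ** (num_bits - 1) - 1) + 1e-12
    return torch.round(w / scale) * scale

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    # Temperature-scaled KL term standing in for the full distillation loss.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def train_quantized_distillation(student, teacher, loader, epochs=1, lr=1e-3):
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:
            # Keep a copy of the full-precision weights, then quantize in place
            # so the forward pass (and hence the loss) sees quantized weights.
            full_precision = [p.detach().clone() for p in student.parameters()]
            with torch.no_grad():
                for p in student.parameters():
                    p.copy_(quantize(p))
                teacher_logits = teacher(x)

            loss = soft_target_loss(student(x), teacher_logits)

            # Backward pass: gradients are computed at the quantized weights...
            opt.zero_grad()
            loss.backward()

            # ...but the update is applied to the ORIGINAL full-precision weights.
            with torch.no_grad():
                for p, fp in zip(student.parameters(), full_precision):
                    p.copy_(fp)
            opt.step()

    # Only at the end of training are the weights replaced by their quantized
    # version, which is what gets deployed.
    with torch.no_grad():
        for p in student.parameters():
            p.copy_(quantize(p))
    return student
```

The point this sketch tries to mirror is the split between where the gradient is evaluated (the quantized weights) and where the update lands (the full-precision weights); the learned quantization points of Algorithm 2 are not sketched here.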
@pallaviprakash1090
@pallaviprakash1090 2 years ago
Loved it
@KapilSachdeva
@KapilSachdeva 2 years ago
🙏