Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

14,711 views

Efficient NLP

1 day ago

Four techniques to optimize the speed of your model's inference process:
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
References:
LLM Inference Optimization blog post: lilianweng.github.io/posts/20...
How to deploy your deep learning project on a budget: luckytoilet.wordpress.com/202...
Efficient deep learning survey paper: arxiv.org/abs/2106.08962
SparseDNN: arxiv.org/abs/2101.07948
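Of the techniques indexed above, magnitude pruning is the simplest to show in code. As a minimal illustrative sketch (the function name and `sparsity` parameter are mine, not the video's code): keep the largest-magnitude weights and zero out the rest.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)
```

In practice the zeroed weights only speed things up if the runtime exploits sparsity (as in SparseDNN, linked above); dense hardware ignores the zeros.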

Comments: 24
@thomasschmitt9669 3 months ago
This was one of the best explanation videos I have ever seen! Well structured and at the right level of complexity to follow without getting a headache. 👌
@bonob0123 21 days ago
That was really nicely done. As a non-expert, I feel like I now have a good general idea of what a quantized model is. Thank you!
@kevon217 10 months ago
Thanks for this!
@lucaskeller656 2 months ago
Great format, succinctness, and diagrams. Thank you!
@unclecode 4 months ago
Great content, well done. Please make a video on ONNX, and another on Flash Attention. Appreciated.
@muhannadobeidat 3 months ago
Excellent video. Well spoken. Nice visualizations.
@DurgaNagababuMolleti 10 days ago
Superb
@heteromodal 5 months ago
What a great video! Thank you!
@jokmenen_ 4 months ago
Awesome video!
@user-qo7vr3ml4c 1 month ago
Great summary, thank you.
@huiwencheng4585 5 months ago
Fantastic introduction and explanation !
@vineetkumarmishra2989 3 months ago
Wonderfully explained!! Thanks for the video.
@jeremyuzan1169 2 months ago
Great video
@420_gunna 5 months ago
This felt very nicely taught -- I loved that you pulled back for a summary/review at the end of the video - great practice. Please continue, thank you!
@user-bd7eq6vx1t 1 year ago
Your teaching is excellent. We'd welcome many more videos from you to understand the fundamentals of NLP.
@kevon217 10 months ago
^
@MuhammadAli-dw7mv 2 months ago
nicely done
@hrsight 2 months ago
nice video
@yunlu4657 5 months ago
Excellent video, learnt a lot! However, the definition of zero-point quantization is off. What you're showing in the video is abs-max quantization instead.
@EfficientNLP 5 months ago
The example I showed is zero-point quantization because 0 in the original domain is mapped to 0 in the quantized domain (before transforming to unsigned). In abs-max (not covered in this video), the maximum in the original domain would be mapped to 127, and the minimum would be mapped to -128.
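For anyone following this thread, here is a minimal NumPy sketch of both schemes under their commonly used definitions (an illustration, not the video's code): abs-max is symmetric, scaling by the largest magnitude; zero-point is asymmetric, mapping [min, max] onto [-128, 127] with an offset chosen so that real 0 is represented exactly.

```python
import numpy as np

def absmax_quantize(x):
    # Symmetric: ±absmax(x) maps to ±127; real 0 always maps to integer 0.
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale  # dequantize with q / scale

def zeropoint_quantize(x):
    # Asymmetric: stretch [min, max] over the full int8 range [-128, 127].
    scale = 255.0 / (np.max(x) - np.min(x))
    # The zero-point is the integer image of 0.0, so 0 is exactly representable.
    zero_point = np.round(-np.min(x) * scale) - 128
    q = np.clip(np.round(x * scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point  # dequantize with (q - zero_point) / scale
```

For example, `zeropoint_quantize(np.array([-1.0, 0.0, 1.0, 2.0]))` sends the endpoints -1.0 and 2.0 to -128 and 127, and 0.0 to the zero-point itself, which dequantizes back to exactly 0.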
@ricardokullock2535 1 month ago
And if one were to quantize a distilled model? Is the outcome any good?
@EfficientNLP 1 month ago
Yes, these two techniques are often used together to improve efficiency.
@andrea-mj9ce 3 months ago
The explanation of distillation stays at the surface; it's not enough to fully understand it.
@EfficientNLP 3 months ago
If you have any specific questions I’ll try to answer them!
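In case it helps other readers of this thread: the core of Hinton-style knowledge distillation is training the student on a mix of the teacher's temperature-softened output distribution and the true labels. A minimal NumPy sketch of the loss (the function names and the `T`, `alpha` hyperparameters are illustrative, not from the video):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature T > 1 flattens the distribution, exposing the teacher's
    # relative confidence across wrong classes ("dark knowledge").
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft loss: cross-entropy against the teacher's softened distribution,
    # rescaled by T^2 to keep gradient magnitudes comparable across T.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft_loss = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    # Hard loss: ordinary cross-entropy against the true labels.
    log_p = np.log(softmax(student_logits))
    hard_loss = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

The student minimizes this instead of plain cross-entropy; everything else in training stays the same.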