Deep Dive on PyTorch Quantization - Chris Gottbrath

21,795 views

PyTorch

A day ago

Learn more: pytorch.org/docs/stable/quant...
It’s important to make efficient use of both server-side and on-device compute resources when developing machine learning applications. To support more efficient deployment on servers and edge devices, PyTorch added support for model quantization using the familiar eager-mode Python API.
Quantization leverages 8-bit integer (int8) instructions to reduce model size and speed up inference (lower latency), and can be the difference between a model meeting its quality-of-service goals, or even fitting at all into the resources available on a mobile device. Even when resources aren’t quite so constrained, it may let you deploy a larger and more accurate model. Quantization is available in PyTorch starting with version 1.3, and with the release of PyTorch 1.4 we published quantized models for ResNet, ResNeXt, MobileNetV2, GoogLeNet, InceptionV3, and ShuffleNetV2 in the torchvision 0.5 library.
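As a quick illustration of that last point, here is a minimal sketch (not from the talk itself) of loading one of the pre-trained quantized torchvision models and running int8 inference on CPU. The pretrained/quantize argument names follow the torchvision 0.5-era API and may differ in newer releases.

```python
import torch
import torchvision

# quantize=True loads int8 weights plus the pre-computed scales/zero points
model = torchvision.models.quantization.resnet18(pretrained=True, quantize=True)
model.eval()

x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image batch
with torch.no_grad():
    logits = model(x)             # runs with int8 kernels on CPU
print(logits.shape)               # torch.Size([1, 1000])
```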

Comments: 25
@leixun 4 years ago
*My takeaways:*
0. Outline of this talk 0:51
1. Motivation 1:42
- DNNs are very computationally intensive
- Datacenter power consumption is doubling every year
- The number of edge devices is growing fast, and lots of these devices are resource-constrained
2. Quantization basics 5:27
3. PyTorch quantization 10:54
3.1 Workflows 17:21
3.2 Post-training dynamic quantization 21:31
- Quantize weights at design time
- Quantize activations (and choose their scaling factors) at runtime
- No extra data are required
- Suitable for LSTMs/transformers, and for MLPs with small batch sizes
- 2x faster compute, 4x less memory
- Easy to do: a 1-line API (see the sketch below)
3.3 Post-training static quantization 23:57
- Quantize both weights and activations at design time
- Extra data are needed for calibration (i.e. to find the scaling factors)
- Suitable for CNNs
- 1.5-2x faster compute, 4x less memory
- Steps: 1. Modify model 25:55, 2. Prepare and calibrate 27:45, 3. Convert 31:34, 4. Deploy 32:59 (see the sketch below)
3.4 Quantization-aware training 34:00
- Make the weights "more quantizable" through training and fine-tuning
- Steps: 1. Modify model 36:43, 2. Prepare and train 37:28
3.5 Example models 39:26
4. New in PyTorch 1.6
4.1 Graph mode quantization 45:14
4.2 Numeric suite 48:17: tools to help debug accuracy drops due to quantization, layer by layer
5. Framework support, CPU (x86, Arm) backend support 49:46
6. Resources to learn more 50:52
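A minimal eager-mode sketch of the dynamic (3.2) and static (3.3) post-training workflows summarized above, assuming the torch.quantization API names introduced around PyTorch 1.3 (newer releases also expose them under torch.ao.quantization). The toy LSTM/CNN modules and the random calibration batches are illustrative placeholders only.

```python
import torch
import torch.nn as nn

# --- 3.2 Post-training dynamic quantization: the 1-line API ---------------
fp32_lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2)
int8_lstm = torch.quantization.quantize_dynamic(
    fp32_lstm, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)  # weights stored as int8; activation scales chosen on the fly at runtime

# --- 3.3 Post-training static quantization: modify / prepare / convert ----
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = SmallCNN().eval()                                 # eval mode before fusion
torch.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
prepared = torch.quantization.prepare(model)              # insert observers

# Calibration: run a few representative batches so observers can pick scales
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 32, 32))

quantized = torch.quantization.convert(prepared)          # swap in int8 modules
print(quantized)
```

The dynamic path only rewrites the weight-bearing modules, which is why a single call suffices, while the static path needs the QuantStub/DeQuantStub boundaries and a calibration pass so the observers can pick scales and zero points for the activations.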
@lorenzodemarinis2603 3 years ago
This is gold, thank you!
@leixun 3 years ago
@@lorenzodemarinis2603 You are welcome!
@harshr1831 3 years ago
Thank you very much!
@leixun 3 years ago
@@harshr1831 You are welcome!
@prudvi01 3 years ago
MVP!
@aayushsingh9622 3 years ago
How do you test the model after quantization? I am using post-training static quantization. How should I prepare the input to feed into this model?
@ankitkumar-kg5ue 11 months ago
What if I want to fuse multiple conv and relu modules?
@rednas195 1 month ago
In the accuracy results, how come there is a difference in inference speedup between QAT and PTQ? Is this because of the different models used? I would expect no difference in speedup if the same model was used.
@parcfelixer 3 years ago
Awesome talk, thank you so much.
@MrGHJK1 4 years ago
Awesome talk, thanks! It might be too much to ask, but it would be nice if PyTorch had a tool to convert quantized tensor parameters to TensorRT calibration tables.
@user-bi3ox6kf4j 4 years ago
Sorry, can you share the example code? Thank you.
@raghuramank1 4 years ago
Please take a look at the PyTorch tutorials page for example code: pytorch.org/tutorials/advanced/static_quantization_tutorial.html
@jetjodh 4 years ago
Why not go lower than 8-bit integers for quantization? Wouldn't that be much faster?
@raghuramank1 4 years ago
Currently, kernels on processors do not provide any speedup for lower-bit precision.
@user-bi3ox6kf4j 4 years ago
Trade-off between accuracy and speed.
@dsagman 5 months ago
Great info, but please buy a pop filter.
@user-bi3ox6kf4j 4 years ago
And then I am the third
@jonathansum9084 4 years ago
Then I am the second.
@ramanabotta6285 4 years ago
First view
@motelejesuolamilekan1950 4 years ago
Lol
@briancase6180 2 years ago
OMG. We already have a term of art for "zero point." It's called bias. We have a term, please use it. Otherwise, thanks for the great talk.
@ashermai2962 2 years ago
The reason it's called a zero_point is that when the pre-quantized weights bring the output to zero (e.g. for a ReLU activation), you want the zero_point to ensure the quantized weights also bring the output to exactly zero. Also, the names scale and zero_point distinguish these parameters from each module's weights and bias, which are different concepts.
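To make the scale/zero_point idea concrete, here is a tiny sketch using torch.quantize_per_tensor; the particular scale and zero_point values are made-up examples of what an observer might select.

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
scale, zero_point = 0.0078, 128   # example values an observer might choose

# Affine quantization: q = clamp(round(x / scale) + zero_point, 0, 255)
q = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.quint8)
print(q.int_repr())    # stored uint8 values; fp32 0.0 maps exactly to zero_point (128)
print(q.dequantize())  # approximate reconstruction of the original fp32 values
```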