ImperativeQuantAware¶
- class paddle.fluid.contrib.slim.quantization.imperative.qat.ImperativeQuantAware ( quantizable_layer_type=['Conv2D', 'Linear', 'Conv2DTranspose'], weight_quantize_type='abs_max', activation_quantize_type='moving_average_abs_max', weight_bits=8, activation_bits=8, moving_rate=0.9, weight_preprocess_layer=None, act_preprocess_layer=None, weight_quantize_layer=None, act_quantize_layer=None ) [source]
-
Applying quantization aware training (QAT) to the dygraph model.
-
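As a rough orientation, the sketch below shows how this class typically fits into a QAT workflow in dygraph mode: build a model, call quantize, fine-tune, then export for inference. The resnet18 model, the input shape, and the exact argument names passed to save_quantized_model are illustrative assumptions and may differ between Paddle releases.

.. code-block:: python

    import paddle
    from paddle.fluid.contrib.slim.quantization import ImperativeQuantAware
    from paddle.vision.models import resnet18

    # Build a dygraph model that will be fine-tuned with QAT.
    model = resnet18()

    # Configure the weight and activation quantizers.
    imperative_qat = ImperativeQuantAware(
        weight_quantize_type='abs_max',
        activation_quantize_type='moving_average_abs_max')

    # Rewrite the model in place, inserting fake quant/dequant ops.
    imperative_qat.quantize(model)

    # ... fine-tune `model` with a normal dygraph training loop ...

    # Export the quantized model for inference (the argument names shown
    # here are assumptions and may vary across Paddle versions).
    imperative_qat.save_quantized_model(
        layer=model,
        path="./resnet18_qat",
        input_spec=[
            paddle.static.InputSpec(
                shape=[None, 3, 224, 224], dtype='float32')
        ])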
quantize ( model )¶
-
According to the weights' and activations' quantization types, fake quant ops such as fake_quantize_dequantize_moving_average_abs_max and fake_quantize_dequantize_abs_max are inserted into the model. At the same time, the out_scale values of the outputs are calculated.
- Parameters
-
model (paddle.nn.Layer) – the model to be quantized.
- Returns
-
None
Examples:

.. code-block:: python

    import paddle
    from paddle.fluid.contrib.slim.quantization import ImperativeQuantAware

    class ImperativeModel(paddle.nn.Layer):
        def __init__(self):
            super(ImperativeModel, self).__init__()
            # self.linear_0 skips the quantization.
            self.linear_0 = paddle.nn.Linear(784, 400)
            self.linear_0.skip_quant = True

            # self.linear_1 does not skip the quantization.
            self.linear_1 = paddle.nn.Linear(400, 10)
            self.linear_1.skip_quant = False

        def forward(self, inputs):
            x = self.linear_0(inputs)
            x = self.linear_1(x)
            return x

    model = ImperativeModel()
    imperative_qat = ImperativeQuantAware(
        weight_quantize_type='abs_max',
        activation_quantize_type='moving_average_abs_max')

    # Add the fake quant logic. The original model is rewritten in place.
    # Only self.linear_1 has the fake quant logic added, because
    # self.linear_0 sets skip_quant = True.
    imperative_qat.quantize(model)
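After quantize has been called, a quick way to confirm which layers were rewritten is to list the model's sublayers; this is only an illustrative check, and the class names of the inserted quantized wrappers depend on the Paddle version.

.. code-block:: python

    # Inspect the rewritten model: layers that were not skipped are
    # replaced by quantized wrapper layers, while self.linear_0
    # (skip_quant = True) keeps its original Linear type.
    for name, sublayer in model.named_sublayers():
        print(name, type(sublayer).__name__)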
-