QAT

class paddle.quantization.QAT(config: paddle.quantization.config.QuantConfig) [source]

Tools used to prepare a model for quantization-aware training.

Parameters
  • config (QuantConfig) – Quantization configuration.

Examples

>>> from paddle.quantization import QAT, QuantConfig
>>> from paddle.quantization.quanters import FakeQuanterWithAbsMaxObserver
>>> quanter = FakeQuanterWithAbsMaxObserver(moving_rate=0.9)
>>> q_config = QuantConfig(activation=quanter, weight=quanter)
>>> qat = QAT(q_config)
quantize(model: paddle.nn.layer.layers.Layer, inplace=False) [source]

Create a model for quantization-aware training.

The quantization configuration will be propagated through the model, and fake quanters will be inserted into the model to simulate quantization during training.

Parameters
  • model (Layer) – The model to be quantized.

  • inplace (bool) – Whether to modify the model in-place, default is False.

Returns: The prepared model for quantization-aware training.

Examples

>>> from paddle.quantization import QAT, QuantConfig
>>> from paddle.quantization.quanters import FakeQuanterWithAbsMaxObserver
>>> from paddle.vision.models import LeNet

>>> quanter = FakeQuanterWithAbsMaxObserver(moving_rate=0.9)
>>> q_config = QuantConfig(activation=quanter, weight=quanter)
>>> qat = QAT(q_config)
>>> model = LeNet()
>>> quant_model = qat.quantize(model)
>>> print(quant_model)
LeNet(
  (features): Sequential(
    (0): QuantedConv2D(
      (weight_quanter): FakeQuanterWithAbsMaxObserverLayer()
      (activation_quanter): FakeQuanterWithAbsMaxObserverLayer()
    )
    (1): ObserveWrapper(
      (_observer): FakeQuanterWithAbsMaxObserverLayer()
      (_observed): ReLU()
    )
    (2): ObserveWrapper(
      (_observer): FakeQuanterWithAbsMaxObserverLayer()
      (_observed): MaxPool2D(kernel_size=2, stride=2, padding=0)
    )
    (3): QuantedConv2D(
      (weight_quanter): FakeQuanterWithAbsMaxObserverLayer()
      (activation_quanter): FakeQuanterWithAbsMaxObserverLayer()
    )
    (4): ObserveWrapper(
      (_observer): FakeQuanterWithAbsMaxObserverLayer()
      (_observed): ReLU()
    )
    (5): ObserveWrapper(
      (_observer): FakeQuanterWithAbsMaxObserverLayer()
      (_observed): MaxPool2D(kernel_size=2, stride=2, padding=0)
    )
  )
  (fc): Sequential(
    (0): QuantedLinear(
      (weight_quanter): FakeQuanterWithAbsMaxObserverLayer()
      (activation_quanter): FakeQuanterWithAbsMaxObserverLayer()
    )
    (1): QuantedLinear(
      (weight_quanter): FakeQuanterWithAbsMaxObserverLayer()
      (activation_quanter): FakeQuanterWithAbsMaxObserverLayer()
    )
    (2): QuantedLinear(
      (weight_quanter): FakeQuanterWithAbsMaxObserverLayer()
      (activation_quanter): FakeQuanterWithAbsMaxObserverLayer()
    )
  )
)
convert(model: paddle.nn.layer.layers.Layer, inplace=False, remain_weight=False)

Convert the quantized model to ONNX style, so that the converted model can be saved as an inference model by calling paddle.jit.save.

Parameters
  • model (Layer) – The quantized model to be converted.

  • inplace (bool, optional) – Whether to modify the model in-place, default is False.

  • remain_weight (bool, optional) – Whether to keep the weights in floating point, default is False.

Returns: The converted model.

Examples

>>> import paddle
>>> from paddle.quantization import QAT, QuantConfig
>>> from paddle.quantization.quanters import FakeQuanterWithAbsMaxObserver
>>> from paddle.vision.models import LeNet

>>> quanter = FakeQuanterWithAbsMaxObserver(moving_rate=0.9)
>>> q_config = QuantConfig(activation=quanter, weight=quanter)
>>> qat = QAT(q_config)
>>> model = LeNet()
>>> quantized_model = qat.quantize(model)
>>> converted_model = qat.convert(quantized_model)
>>> dummy_data = paddle.rand([1, 1, 28, 28], dtype="float32")
>>> paddle.jit.save(converted_model, "./quant_deploy", [dummy_data])