Fake Quantization Operator

class horizon_plugin_pytorch.quantization.FakeQuantize(observer: type, channel_len: int = 1, **kwargs)

Simulates the quantize and dequantize operations at training time.

The output of this module is given by

fake_quant_x = clamp(round(x / scale), quant_min, quant_max) * scale
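The formula above can be traced by hand with a small scalar sketch. This is plain Python rather than the module itself, and the helper name fake_quantize is hypothetical. Python's built-in round resolves ties to the nearest even integer.

```python
def fake_quantize(x: float, scale: float, quant_min: int, quant_max: int) -> float:
    """Quantize x to an integer grid, clamp it, then map it back to float."""
    q = round(x / scale)                    # nearest integer step
    q = max(quant_min, min(quant_max, q))   # clamp to the allowed integer range
    return q * scale                        # dequantize


# scale = 0.5 with a signed 8-bit range
print(fake_quantize(1.3, 0.5, -128, 127))     # 1.3 / 0.5 = 2.6 -> 3 -> 1.5
print(fake_quantize(100.0, 0.5, -128, 127))   # 200 is clamped to 127 -> 63.5
print(fake_quantize(-100.0, 0.5, -128, 127))  # -200 is clamped to -128 -> -64.0
```

Values inside the representable range snap to the nearest multiple of scale, while values outside it saturate at quant_min * scale or quant_max * scale.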

scale defines the scale factor used for quantization.

zero_point specifies the quantized value to which 0 in floating point maps.

quant_min specifies the minimum allowable quantized value.

quant_max specifies the maximum allowable quantized value.

fake_quant_enabled controls whether fake quantization is applied to tensors. Statistics can still be updated while it is disabled.

observer_enabled controls whether statistics are collected on tensors.

dtype specifies the quantized dtype that is being emulated with fake quantization. The allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
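As a rough illustration of what "consistent with the dtype" can mean, the sketch below derives the integer range from the bit width. It assumes that qint8 and qint16 are signed two's-complement types; the helper names signed_range and is_consistent are hypothetical and not part of the plugin.

```python
from typing import Tuple

# Bit widths for the two dtypes named above; signed ranges are an assumption.
BIT_WIDTHS = {"qint8": 8, "qint16": 16}


def signed_range(dtype_name: str) -> Tuple[int, int]:
    """Return (min, max) of a signed integer with the dtype's bit width."""
    bits = BIT_WIDTHS[dtype_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def is_consistent(dtype_name: str, quant_min: int, quant_max: int) -> bool:
    """Check that [quant_min, quant_max] fits inside the dtype's range."""
    lo, hi = signed_range(dtype_name)
    return lo <= quant_min < quant_max <= hi


print(signed_range("qint8"))                    # (-128, 127)
print(signed_range("qint16"))                   # (-32768, 32767)
print(is_consistent("qint8", -32768, 32767))    # False: range needs 16 bits
```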

Parameters:

observer (type) – Module for observing statistics on input tensors and calculating scale and zero-point.

channel_len (int) – Size of the data along the channel dimension.

kwargs – Arguments for the observer module.
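To show how an observer and the two enable flags interact, here is a toy stand-in written in plain Python. It is not the horizon_plugin_pytorch implementation: the class ToyFakeQuantize, its max_abs statistic, and the symmetric rule scale = max_abs / quant_max are all assumptions chosen to keep the sketch short.

```python
class ToyFakeQuantize:
    """Illustrative stand-in, not the horizon_plugin_pytorch implementation."""

    def __init__(self, quant_min=-128, quant_max=127):
        self.quant_min = quant_min
        self.quant_max = quant_max
        self.observer_enabled = True
        self.fake_quant_enabled = True
        self.max_abs = 0.0  # statistic tracked by the toy observer
        self.scale = 1.0

    def __call__(self, values):
        if self.observer_enabled:
            # Observer step: update the running max and derive a symmetric scale.
            self.max_abs = max(self.max_abs, max(abs(v) for v in values))
            if self.max_abs > 0:
                self.scale = self.max_abs / self.quant_max
        if not self.fake_quant_enabled:
            return list(values)  # pass through unchanged
        return [
            max(self.quant_min, min(self.quant_max, round(v / self.scale))) * self.scale
            for v in values
        ]


fq = ToyFakeQuantize()

# Fake quantization off, observer on: statistics update, values pass through.
fq.fake_quant_enabled = False
passthrough = fq([1.0, -2.54])

# Observer off, fake quantization on: the scale is frozen and large
# inputs saturate at quant_max * scale.
fq.observer_enabled = False
fq.fake_quant_enabled = True
clamped = fq([100.0])
```

The first call mirrors the note that statistics can still be updated while fake quantization is disabled. The second call shows the opposite setting, where the scale stays frozen and out-of-range inputs saturate.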