Simulate the quantize and dequantize operations at training time.
The output of this module is given by:
fake_quant_x = clamp(round(x / scale), quant_min, quant_max) * scale
scale defines the scale factor used for quantization.
zero_point specifies the quantized value to which 0 in floating point maps.
quant_min specifies the minimum allowable quantized value.
quant_max specifies the maximum allowable quantized value.
fake_quant_enabled controls whether fake quantization is applied to tensors. Statistics can still be updated while it is disabled.
observer_enabled controls whether statistics are collected on tensors.
dtype specifies the quantized dtype being emulated by fake quantization. The allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
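The formula above can be illustrated with a minimal plain-Python sketch for a single scalar value. This is not the module's implementation: the function name fake_quantize and the QUANT_RANGES table are hypothetical, the ranges assume signed integers of the usual widths for qint8 and qint16, and Python's built-in round (which rounds ties to even) stands in for whatever rounding mode the module uses.

```python
# Illustrative sketch only; names and ranges below are assumptions, not library API.

# Assumed signed integer ranges for the two emulated dtypes.
QUANT_RANGES = {"qint8": (-128, 127), "qint16": (-32768, 32767)}


def fake_quantize(x, scale, quant_min, quant_max):
    """Apply clamp(round(x / scale), quant_min, quant_max) * scale to a scalar."""
    q = round(x / scale)                   # snap to the nearest integer step
    q = max(quant_min, min(quant_max, q))  # clamp to the allowed quantized range
    return q * scale                       # map back to floating point


quant_min, quant_max = QUANT_RANGES["qint8"]

# A value inside the representable range is snapped to a multiple of scale.
in_range = fake_quantize(0.30, 0.25, quant_min, quant_max)

# A value far outside the range saturates at quant_max * scale.
saturated = fake_quantize(100.0, 0.25, quant_min, quant_max)
```

With scale = 0.25 and the assumed qint8 range, an input of 0.30 is snapped to 0.25, while an input of 100.0 saturates at 127 * 0.25 = 31.75. This shows why quant_min and quant_max need to match the emulated dtype: they set the range of values that survive fake quantization.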
Parameters:
observer (type) – Module for observing statistics on input tensors and calculating scale and zero-point.
channel_len (int) – Size of data at channel dim.
kwargs – Arguments for the observer module.