Fake Quantization Operator

class horizon_plugin_pytorch.quantization.FakeQuantize(observer: type, channel_len: int = 1, **kwargs)

Simulates the quantize and dequantize operations at training time.

The output of this module is given by:

fake_quant_x = clamp(round(x / scale), quant_min, quant_max) * scale

scale defines the scale factor used for quantization.

zero_point specifies the quantized value to which 0 in floating point maps.

quant_min specifies the minimum allowable quantized value.

quant_max specifies the maximum allowable quantized value.

fake_quant_enabled controls whether fake quantization is applied to tensors. Note that statistics can still be updated when fake quantization is disabled.

observer_enabled controls whether statistics are collected on tensors.

dtype specifies the quantized dtype that is being emulated with fake quantization. The allowable values are qint8 and qint16. The values of quant_min and quant_max should be chosen to be consistent with the dtype.
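
The formula above can be sketched in plain Python for a single scalar value. This is only an illustration, not the plugin's implementation. The helper name `fake_quantize`, the integer ranges assumed for qint8 and qint16 (-128..127 and -32768..32767), and the pass-through behaviour when `fake_quant_enabled` is false are all assumptions. Python's built-in `round` (round half to even) stands in for whatever rounding the module applies.

```python
# Illustrative sketch only; this is not horizon_plugin_pytorch code.
# Assumed integer ranges for the two dtypes named in the text.
QUANT_RANGES = {
    "qint8": (-128, 127),
    "qint16": (-32768, 32767),
}


def fake_quantize(x, scale, dtype="qint8", fake_quant_enabled=True):
    """Apply clamp(round(x / scale), quant_min, quant_max) * scale to a float."""
    if not fake_quant_enabled:
        # Assumption: with fake quantization disabled, the value passes through.
        return x
    quant_min, quant_max = QUANT_RANGES[dtype]
    q = round(x / scale)                   # quantize to the nearest integer step
    q = max(quant_min, min(quant_max, q))  # clamp to the representable range
    return q * scale                       # dequantize back to floating point


if __name__ == "__main__":
    print(fake_quantize(0.30, scale=0.25))   # snapped to a multiple of the scale
    print(fake_quantize(100.0, scale=0.25))  # saturates at quant_max * scale
    print(fake_quantize(100.0, scale=0.25, dtype="qint16"))  # fits in the wider range
```

With the same scale, a value that saturates under the qint8 range can still be represented under the qint16 range, which is why quant_min and quant_max need to match the chosen dtype.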

Parameters:

observer (type) – Module for observing statistics on input tensors and calculating scale and zero-point.

channel_len (int) – Size of the data along the channel dimension.

kwargs – Arguments for the observer module.
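
To make the role of the observer and channel_len more concrete, the following is a minimal sketch of a hypothetical min-max observer that keeps one statistic per channel and derives one scale per channel. The class `MinMaxObserverSketch` and its methods are invented for illustration and are not part of horizon_plugin_pytorch. The sketch also assumes symmetric quantization (zero-point fixed at 0), consistent with the output formula, which has no zero-point term.

```python
# Hypothetical observer for illustration; not the plugin's observer API.
class MinMaxObserverSketch:
    def __init__(self, channel_len=1, quant_max=127):
        self.quant_max = quant_max
        # One running "largest magnitude seen" statistic per channel.
        self.max_abs = [0.0] * channel_len

    def observe(self, channels):
        """Update statistics. `channels` holds one list of floats per channel."""
        for c, values in enumerate(channels):
            peak = max(abs(v) for v in values)
            self.max_abs[c] = max(self.max_abs[c], peak)

    def calculate_scale(self):
        """Map the largest observed magnitude of each channel to quant_max."""
        # Fall back to 1.0 for channels that have only seen zeros.
        return [m / self.quant_max if m > 0 else 1.0 for m in self.max_abs]


if __name__ == "__main__":
    obs = MinMaxObserverSketch(channel_len=2)
    obs.observe([[-127.0, 3.0], [63.5, -1.0]])
    print(obs.calculate_scale())  # one scale per channel
```

With channel_len = 1 the same sketch reduces to a single per-tensor scale.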
