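Common Failures

Import errors

Error 1: Cannot find the extension library(_C.so)

Solution:

Make sure the horizon_plugin_pytorch version matches the CUDA version.
In python3, find the path that horizon_plugin_pytorch is loaded from and check whether that directory contains a .so file. Several versions of horizon_plugin_pytorch may be installed at the same time; uninstall the extras so that only the required version remains.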
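The directory check described above can be sketched in Python. This is a minimal sketch rather than part of the plugin: `find_extension_libs` is a hypothetical helper, and it assumes the package is installed under the import name `horizon_plugin_pytorch`. It locates the package without importing it, so it can still be used when the import itself fails.

```python
import importlib.util
from pathlib import Path


def find_extension_libs(package_name):
    """Return (package_dir, sorted .so paths) without importing the package."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return None, []
    package_dir = Path(list(spec.submodule_search_locations)[0])
    # Search recursively, since the library may sit in a subdirectory.
    return package_dir, sorted(package_dir.rglob("*.so"))


if __name__ == "__main__":
    package_dir, libs = find_extension_libs("horizon_plugin_pytorch")
    if package_dir is None:
        print("package not found on sys.path")
    else:
        print(f"package directory: {package_dir}")
        for lib in libs:
            print(f"  found: {lib.relative_to(package_dir)}")
        if not libs:
            print("  no .so file found in this directory")
```

If the reported directory is not the one you expected, that usually points to a second installation taking precedence on `sys.path`.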

Error 2: RuntimeError: Cannot load custom ops. Please rebuild the horizon_plugin_pytorch

Solution: Check that the local CUDA environment is working properly, for example its path and version.
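One way to look at the CUDA path and version from Python is sketched below. The helper `cuda_env_report` is hypothetical, and the choice of `CUDA_HOME`, `LD_LIBRARY_PATH`, and `nvcc` reflects common CUDA conventions rather than documented requirements of the plugin. The PyTorch fields are filled in only when PyTorch can be imported.

```python
import os
import shutil


def cuda_env_report():
    """Collect the CUDA-related settings visible to the current process."""
    report = {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "nvcc": shutil.which("nvcc"),
        "cuda_lib_dirs": [
            p
            for p in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
            if "cuda" in p.lower()
        ],
        "torch_cuda_version": None,
        "torch_cuda_available": None,
    }
    try:
        import torch  # optional: skipped when PyTorch is missing or broken

        report["torch_cuda_version"] = torch.version.cuda
        report["torch_cuda_available"] = torch.cuda.is_available()
    except Exception:
        pass
    return report


for key, value in cuda_env_report().items():
    print(f"{key}: {value}")
```

Compare the reported CUDA version with the one your horizon_plugin_pytorch build targets.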

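Abnormal quantization accuracy

The QAT/Quantized accuracy does not meet expectations, NaN values appear, or the initial QAT loss is clearly abnormal compared with the float model.

Solution: Refer to the accuracy tuning tool guide.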

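Module does not support deepcopy

Some frameworks (for example, PyTorch Lightning) add a second layer of wrapping around native torch modules, and the wrapped modules do not support deepcopy.

Solution: Implement the __deepcopy__ method for the model.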

from torch.nn import Module

class Model(Module):
    ...
    def __deepcopy__(self, memo):
        # Build a new instance, then copy over each attribute it needs.
        new_model = Model()
        new_model.xxx = self.xxx
        return new_model
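The same pattern can be tried without PyTorch. In the sketch below, `Wrapper` is a hypothetical stand-in for a framework wrapper: it holds a `threading.Lock`, which `copy.deepcopy` cannot copy by default, and a custom `__deepcopy__` sidesteps that by rebuilding the object and copying only the state that matters.

```python
import copy
import threading


class Wrapper:
    """Toy stand-in for a wrapper that holds a member deepcopy cannot handle."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.Lock()  # lock objects cannot be deep-copied

    def __deepcopy__(self, memo):
        # Rebuild the object: copy the state, create a fresh lock.
        new = Wrapper(copy.deepcopy(self.weights, memo))
        memo[id(self)] = new
        return new


original = Wrapper([1.0, 2.0])
clone = copy.deepcopy(original)
clone.weights[0] = 9.0  # changing the clone leaves the original untouched
print(original.weights, clone.weights)
```

Registering the new object in `memo` keeps shared references consistent when the wrapper is copied as part of a larger object graph.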

Cannot find the extension library(_C.so)

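Solution: This error mainly occurs when horizon_plugin_pytorch is installed successfully but fails to import. To resolve it:

Make sure the horizon_plugin_pytorch version matches the CUDA version.
In python3, find the install path of horizon-plugin-pytorch and check whether that directory contains a .so file. Several versions of horizon-plugin-pytorch may be installed at the same time; uninstall the extras so that only the required version remains.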
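To see whether more than one copy is installed, you can list every matching distribution visible to the interpreter. This is a sketch using only the standard library; `installed_versions` is a hypothetical helper, and the distribution name `horizon-plugin-pytorch` is taken from the text above.

```python
import re
from importlib import metadata


def installed_versions(dist_name):
    """List every installed version whose normalized name matches dist_name."""

    def normalize(name):
        # Treat '-', '_' and '.' as equivalent, as package indexes do.
        return re.sub(r"[-_.]+", "-", name).lower()

    target = normalize(dist_name)
    versions = []
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            continue  # skip distributions with unreadable metadata
        if name and normalize(name) == target:
            versions.append(dist.version)
    return versions


found = installed_versions("horizon-plugin-pytorch")
print(f"installed versions: {found or 'none'}")
```

More than one entry in the result means several copies are present on `sys.path`, which matches the situation described above.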

RuntimeError: Cannot load custom ops. Please rebuild the horizon_plugin_pytorch.

Solution: Check that the local CUDA environment is working properly, for example that its path and version are as expected.

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Solution: This error mainly occurs when the prepare stage cannot run normally, usually because the model contains non-leaf tensors. Set the inplace option of prepare to True.
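The underlying PyTorch behavior can be reproduced in isolation. The sketch below uses plain PyTorch rather than the plugin's prepare step: a tensor produced by an operation on a tensor that requires grad is a non-leaf tensor, and `copy.deepcopy` refuses to copy it.

```python
import copy

import torch

x = torch.ones(2, requires_grad=True)  # leaf: created directly by the user
y = x * 2                              # non-leaf: result of an operation

try:
    copy.deepcopy(y)
    deepcopy_failed = False
except RuntimeError as err:
    deepcopy_failed = True
    print(err)

# A detached tensor is a leaf again, so it can be deep-copied.
y_copy = copy.deepcopy(y.detach())
print(y.is_leaf, y_copy.is_leaf)
```

A model that stores such a tensor as an attribute will hit the same error whenever the model is deep-copied, which is why avoiding the copy with inplace set to True resolves it.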

torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL

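Solution: A likely cause is that Python processes from an earlier multi-threaded run were not fully killed. Check for leftover Python processes and terminate them before running again.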

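AttributeError: 'NoneType' object has no attribute 'numel'

Solution: This error mainly occurs while fake-quantization nodes are being inserted, and is caused by an operator whose input scale is None. One possible cause is that another op follows the Dequant inserted after the output conv layer, producing a structure such as conv+dequant+conv. Another is that other operators follow a conv configured for high-precision output. In either case, check that the dequant operator or the high-precision output configuration is used correctly.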
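The conv+dequant+conv structure mentioned above looks roughly like the first model below; the second moves the dequant to the end. This is only a structural sketch: it uses the stubs from `torch.quantization` as stand-ins (an assumption, since the plugin may provide its own QuantStub), the class names are hypothetical, and the models run here in float mode, where both stubs act as identity.

```python
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub


class DequantInMiddle(nn.Module):
    """Problematic layout: conv + dequant + conv."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv1 = nn.Conv2d(3, 8, 1)
        self.dequant = DeQuantStub()
        self.conv2 = nn.Conv2d(8, 8, 1)

    def forward(self, x):
        x = self.conv1(self.quant(x))
        x = self.dequant(x)   # leaves the quantized domain too early
        return self.conv2(x)  # conv2 no longer receives a quantized input


class DequantAtEnd(nn.Module):
    """Dequant is the last step, after every quantized operator."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv1 = nn.Conv2d(3, 8, 1)
        self.conv2 = nn.Conv2d(8, 8, 1)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.conv2(self.conv1(self.quant(x)))
        return self.dequant(x)


sample = torch.randn(1, 3, 4, 4)
print(DequantInMiddle()(sample).shape, DequantAtEnd()(sample).shape)
```

Both models produce the same output shape in float mode; the difference only matters once fake-quantization nodes are inserted.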

symbolically traced variables cannot be used as inputs to control flow

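Solution: This error is caused by using dynamic control flow, such as if statements or loops, in fx mode. fx mode currently supports only static control flow, so avoid dynamic statements such as if, for, and assert in forward.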
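The error can be reproduced with plain `torch.fx`, independent of the plugin. In the sketch below the module names are hypothetical, and replacing the branch with `torch.where` is one possible rewrite; whether a given replacement operator is supported by the plugin is not covered here and should be checked separately.

```python
import torch
from torch import fx, nn


class DynamicBranch(nn.Module):
    def forward(self, x):
        if x.sum() > 0:  # Python branch on a tensor value: not traceable
            return x + 1
        return x * 2


class StaticBranch(nn.Module):
    def forward(self, x):
        # Both results are computed; torch.where selects by the condition.
        return torch.where(x.sum() > 0, x + 1, x * 2)


try:
    fx.symbolic_trace(DynamicBranch())
    trace_failed = False
except fx.proxy.TraceError as err:
    trace_failed = True
    print(err)

traced = fx.symbolic_trace(StaticBranch())
print(traced(torch.tensor([1.0, -2.0])))
```

The traced version keeps the condition inside the graph as a tensor operation, so tracing no longer needs to evaluate it as a Python bool.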

NotImplementedError: function <method 'neg' of 'torch._C._TensorBase' objects> is not implemented for QTensor.

Solution: This error may occur during the Calibration stage in fx mode, because fx mode does not support computations of the form (-x). Change (-x) to (-1)*(x).

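NotImplementedError: function <function Tensor.rsub at 0x7f5a7cdiee50> is not implemented for QTensor.

Solution: This error may occur during the Calibration stage in fx mode. The operator replacement logic in fx mode skips automatic replacement when the minuend of a subtraction is a constant, so the subtraction must be rewritten as an addition, for example by changing (1-x) to (x+(-1))*(-1).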
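Both rewrites rely on how Python dispatches operators: `-x` calls `x.__neg__`, and `1 - x` falls back to `x.__rsub__`, whereas the rewritten forms need only addition and multiplication. The toy class below is hypothetical and is not QTensor; it simply implements `+` and `*` and nothing else, to show that the rewritten expressions work and give the same values.

```python
class AddMulOnly:
    """Toy value type (not QTensor) that implements only + and *."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return AddMulOnly(self.value + other)

    def __mul__(self, other):
        return AddMulOnly(self.value * other)

    __rmul__ = __mul__  # lets (-1) * x work with the constant on the left


x = AddMulOnly(3.0)

# The original forms fail because __neg__ and __rsub__ are missing.
failures = []
for label, expr in [("-x", lambda: -x), ("1 - x", lambda: 1 - x)]:
    try:
        expr()
    except TypeError as err:
        failures.append(label)
        print(f"{label} fails: {err}")

negated = (-1) * x               # replaces -x
one_minus_x = (x + (-1)) * (-1)  # replaces 1 - x
print(negated.value, one_minus_x.value)
```

The second rewrite is algebraically identical to the original: (x + (-1)) * (-1) = -x + 1 = 1 - x.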
