English

Accuracy Tuning Tool Guide

Due to the error in the floating point to fixed point process, you will inevitably encounter the problem of quantization model accuracy dropout when using quantization training tools. Typically, there are several reasons for this:

The original floating-point model is not favorable for quantization, such as the existence of shared ops or shared structures.
The QAT network structure or configuration is abnormal, such as there is a pattern without fuse in the model, and the output is not set with high accuracy.
Some operators are sensitive to quantization, and the quantization error of the operator accumulates layer by layer in the forward propagation process, which ultimately leads to a large error in the model output.

For the above cases, we provides the accuracy tuning tool to help you quickly locate and solve the accuracy problems, which mainly includes the following:

Model Structure Checking: check if there are shared ops, patterns without fuse, or quantization configurations in the model that do not meet expectations.
Use the QuantAnalysis Class to compare and analyze the two models automatically: by comparing, locate the abnormal operators or quantization-sensitive ops in the quantized model.
Use the ModelProfiler Class and the HbirModelProfiler Class to get information about the numerical characteristics of each op in the model: the obtained information includes the maximum and minimum values of inputs and outputs, etc. The functionality of these two classes is identical, the difference is that HbirModelProfiler accepts as input only the qat hbir model. Usually you don't need to call this module manually, you can get the numerical information of both models directly from QuantAnalysis.run.

Quickstart

When encountering quantization model accuracy dropout problems, we recommend using the accuracy tuning tool according to the following process.

Check if there are any unfavorable structures or abnormal configurations in the model.
Use QuantAnalysis module to analyze the model as follows:

1). Find the input (i.e., the bad case) that exhibits the maximum discrepancy between the outputs of a baseline model and the model under analysis, and use this input as the input for the model.

2). Perform quantization sensitivity analysis, the current experience is that the first n L1 sensitivities are usually the quantitative sensitivity ops (the value of n varies from model to model, so there is no automatic method to determine it, so we need to try it manually, e.g., the first 10, 20...). Set the quantization sensitive op to high accuracy quantization (e.g., int16 quantization), and redo the quantization process.

3). Or compare the inputs and outputs of the two models layer by layer to check whether there is an abnormal quantization op such as the data range is too large or the scale is unreasonable, e.g., certain ops with physical meanings should be set to a fixed scale.

The overall flow chart is as follows:

A whole example is as follows:

from horizon_plugin_profiler import QuantAnalysis

import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub

from horizon_plugin_pytorch import set_march, qint8
from horizon_plugin_pytorch.quantization import FakeQuantState
from horizon_plugin_pytorch.quantization import hbdk4 as hb4
from horizon_plugin_pytorch.quantization import (
    prepare,
    set_fake_quantize,
    get_qconfig,
)
from horizon_plugin_pytorch.quantization.qconfig_setter import *


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(3, 3, 1)
        self.relu = nn.ReLU()
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
    
    def forward(self, x):
        x = self.quant(x)
        x = self.conv(x)
        x = self.relu(x)
        x = torch.nn.functional.interpolate(
            x, scale_factor=1.3, mode="bilinear", align_corners=False
        )
        x = self.dequant(x)
        return x


data = torch.rand((1, 3, 32, 32))
float_net = Net()
float_net(data)

set_march("nash-e")


############################### Model Structure Checking ##############################
# verify that the prompted exception layer is as expected
calib_net = prepare(
    float_net,
    data,
    qconfig_setter=QconfigSetter(
        get_qconfig(),
        [
            ModuleNameTemplate({"": qint8}),
            ConvDtypeTemplate(),
            MatmulDtypeTemplate(),
        ],
    ),
)
##########################################################################

# do calibration
set_fake_quantize(calib_net.eval(), FakeQuantState.CALIBRATION)
calib_net(data)

float_net.eval()
# Important!!! Set the calibration model into validation state
set_fake_quantize(calib_net.eval(), FakeQuantState.VALIDATION)

# export hbir model
qat_hbir = hb4.export(calib_net, (data,))

############################### quant analysis ############################

# 1. initialization
qa = QuantAnalysis(
    baseline_model=float_net,
    analysis_model=calib_net,
    analysis_model_type="fake_quant",
    device_ids=0,  # GPU index. Analysis will run on cpu if `device_ids` not set.
    out_dir="./floatvsqat",
)

# compare qat and qat hbir is also supported
# qa = QuantAnalysis(
#     baseline_model=calib_net,
#     analysis_model=qat_hbir,
#     analysis_model_type="export",
#     device_ids=0,  # GPU index. Analysis will run on cpu if `device_ids` not set.
#     out_dir="./qatvshbir",
# )

# 2. If there is a known badcase, the badcase input can be set directly through this interface.
qa.set_bad_case(data)

# in practice, it is recommended to use auto_find_bad_case to search for bad cases across the dataloader
# setting the num_steps parameter to control the search range is also supported
# qa.auto_find_bad_case(your_dataloader, num_steps=100)

# 3. run two model
qa.run()

# 4. compare the two model layer-by-layer. Verify that the abnormal layer indicated by abnormal_layer_advisor.txt is as expected
qa.compare_per_layer()

# 5. calculate sensitivity nodes. You can set the topk sorted sensitivity nodes to high accuracy to try to improve the quantization model accuracy
qa.sensitivity()

Analyzing Statistical Information

After the run() interface of QuantAnalysis is called, both of baseline_model and analysis_model will perform inference on the badcase, and the input / output of all layers will be saved in the op_infos folder of output path.

import torch
import matplotlib.pyplot as plt

# Load opinfo
opinfo = torch.load("xxx.opinfo")

input = opinfo.input
output = opinfo.output

# Plot value distribution
plt.hist(output[0].as_subclass(torch.Tensor).flatten().cpu().numpy(), bins=30, color='blue')
plt.title("Output Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.grid(True)
plt.savefig("output.png")

By loading the opinfo file using torch.load, you can access the input and output of an operator. Then, with matplotlib, you can visualize the corresponding value distribution.

When analyzing the error of a single operator, you can build a model which only contains one operator. Feed the inputs from opinfo into the model and compare the difference between the model output and the opinfo output by visualizing with scatter plots or histograms.

Multi-machine Multi-GPU

QuantAnalysis tool now supports multiple machines and multiple GPUs. If you already have a training environment built with torch.nn.parallel.DistributedDataParallel, you can simply replace the train function with the quant_analysis function in the example.

import os
import sys
import time
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

from horizon_plugin_pytorch.quantization import (
    QuantStub,
    prepare,
    QconfigSetter,
    set_fake_quantize,
    FakeQuantState,
    get_qconfig,
    observer_v2,
)
from horizon_plugin_pytorch.quantization.qconfig_setter import ModuleNameTemplate
from horizon_plugin_pytorch.march import set_march
from horizon_plugin_pytorch.dtype import qint8
from torch.ao.quantization import DeQuantStub
from horizon_plugin_profiler import QuantAnalysis
from horizon_plugin_profiler.utils.entities import Metric

# Example dataset. Please use the evaluation dataset when debugging
class ExampleDataset(Dataset):
    def __init__(self, size=10000, input_size=32):
        self.size = size
        self.data = torch.randn(size, 3, input_size, input_size)

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        return self.data[idx]


# Model definition
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc = nn.Linear(8 * 8 * 32, num_classes)
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        x = self.fc(x)
        x = self.dequant(x)
        return x


def setup(rank, world_size, master_addr, master_port):
    """Initialize the distributed environment"""
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)

    # Initialize process group
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)  # set current GPU


def cleanup():
    """Cleaning up the distributed training environment"""
    dist.destroy_process_group()


def quant_analysis(rank, world_size, config):
    setup(rank, world_size, config["master_addr"], config["master_port"])

    # Set the target hardware platform code, pay attention to the march distinction here nash-e/m/p
    set_march("nash-e")
    float_net = ConvNet()

    # prepare, must use DistributedSampler
    dataset = ExampleDataset(size=10000)
    sampler = DistributedSampler(
        dataset,
        num_replicas=world_size,
        rank=rank,
        shuffle=False,
    )
    dataloader = DataLoader(
        dataset,
        batch_size=1,  # It is recommended to set batch = 1 when debugging
        sampler=sampler,
        num_workers=config["num_workers"],
        pin_memory=True,
    )

    # Load the trained float model checkpoint. Note that the parameter here needs to be replaced with your actual model checkpoint path.
    float_ckpt = torch.load("/your_float_ckpt_path")
    float_net.load_state_dict(float_ckpt)

    # Convert floating point model to calib model
    example_inputs = torch.rand((1, 3, 32, 32))
    calib_net = prepare(
        float_net,
        example_inputs,
        QconfigSetter(
            reference_qconfig=get_qconfig(observer=observer_v2.MSEObserver),
            templates=[
                ModuleNameTemplate({"": qint8}),
            ],
        ),
    )

    # Load the calibration checkpoint. Note that the parameter here needs to be replaced with your actual model checkpoint path.
    calib_ckpt = torch.load("/your_calib_ckpt_path")
    calib_net.load_state_dict(calib_ckpt)

    # Important!!! Set the calibration model into validation state
    float_net.eval()
    set_fake_quantize(calib_net.eval(), FakeQuantState.VALIDATION)

    # debug
    qa = QuantAnalysis(
        baseline_model=float_net,
        analysis_model=calib_net,
        analysis_model_type="fake_quant",
        device_ids=rank,  # Set device_ids to the current rank
        out_dir=config["output_dir"],
    )
    qa.auto_find_bad_case(
        data_generator=dataloader,
        num_steps=100,
        metric=Metric.ATOL,
    )
    qa.run()
    qa.compare_per_layer()
    qa.sensitivity(metric=Metric.ATOL)

    # Cleaning up the distributed training environment
    cleanup()


def main():
    import argparse

    parser = argparse.ArgumentParser(description="Multi-Node QuantAnalysis")
    parser.add_argument(
        "--nnodes", type=int, default=1, help="number of nodes in the cluster"
    )
    parser.add_argument(
        "--nproc_per_node",
        type=int,
        default=torch.cuda.device_count(),
        help="number of GPUs per node",
    )
    parser.add_argument(
        "--node_rank",
        type=int,
        default=0,
        help="rank of this node in the cluster (0-indexed)",
    )
    parser.add_argument(
        "--master_addr",
        type=str,
        default="localhost",
        help="IP address of the master node (node with rank 0)",
    )
    parser.add_argument(
        "--master_port",
        type=int,
        default=29500,
        help="port on master node for communication",
    )
    parser.add_argument(
        "--num_workers",
        type=int,
        default=4,
        help="number of data loader workers per GPU",
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default="./quant_analysis_output",
        help="directory to save debug outputs",
    )

    args = parser.parse_args()

    os.makedirs(args.output_dir, exist_ok=True)

    config = {
        "master_addr": args.master_addr,
        "master_port": args.master_port,
        "num_workers": args.num_workers,
        "output_dir": args.output_dir,
    }

    # Calculating the global world size
    world_size = args.nnodes * args.nproc_per_node

    # Print configuration information
    if args.node_rank == 0:
        print(f"Starting multi-node QuantAnalysis with:")
        print(f"  Nodes: {args.nnodes}")
        print(f"  GPUs per node: {args.nproc_per_node}")
        print(f"  Total GPUs: {world_size}")
        print(f"  Master address: {args.master_addr}:{args.master_port}")

    # Start distributed debugging
    mp.spawn(
        quant_analysis, args=(world_size, config), nprocs=args.nproc_per_node, join=True
    )


if __name__ == "__main__":
    main()

API Reference

Model Structure Checking

# from horizon_plugin_pytorch.utils.check_model import check_qat_model

def check_qat_model(
    model: torch.nn.Module,
    example_inputs: Any,
    save_results: bool = False,
    out_dir: Optional[str] = None,
):

Check if there are structures in the calibration/qat model that are not favorable for quantization and if the quantization qconfig configuration is as expected.

Parameters

model: model to be checked.
example_inputs: model inputs.
save_results: whether to save the check results to a txt file. Default is False.
out_dir: save path of the result file 'model_check_result.txt'. Default is empty, save to .model_check_results/.

Output

screen output: abnormal layers that check out.
model_check_result.txt: generated when save_results = True. It consists of 5 main parts:

1). Unfused pattern.

2). The calling times of each module. Normally each op is called only 1 time, 0 means it is not called, more than 1 time means it is shared many times. If it is not called or shared multiple times, an exception prompt will be displayed.

3). The qconfig configuration for each op output.

4). The qconfig configuration for each op weight (if any).

5). Exception qconfig hints (if any).

Fusable modules are listed below:
name    type
------  -----------------------------------------------------
conv    <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>
relu    <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>

All modules in the model run exactly once.

Each layer out qconfig:
+---------------+-----------------------------------------------------------+---------------+---------------+----------------+-----------------------------+
| Module Name   | Module Type                                               | Input dtype   | out dtype     | ch_axis        | observer                    |
|---------------+-----------------------------------------------------------+---------------+---------------+----------------+-----------------------------|
| quant         | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>   | torch.float32 | qint8         | -1             | MovingAverageMinMaxObserver |
| conv          | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>     | qint8         | qint8         | -1             | MovingAverageMinMaxObserver |
| relu          | <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>         | qint8         | qint8         | qconfig = None |                             |
| dequant       | <class 'horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub'> | qint8         | torch.float32 | qconfig = None |                             |
+---------------+-----------------------------------------------------------+---------------+---------------+----------------+-----------------------------+

Weight qconfig:
+---------------+-------------------------------------------------------+----------------+-----------+---------------------------------------+
| Module Name   | Module Type                                           | weight dtype   |   ch_axis | observer                              |
|---------------+-------------------------------------------------------+----------------+-----------+---------------------------------------|
| conv          | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'> | qint8          |         0 | MovingAveragePerChannelMinMaxObserver |
+---------------+-------------------------------------------------------+----------------+-----------+---------------------------------------+

Note

The prepareinterface has integrated this check. Please pay attention to the inspection results output by this interface and make targeted adjustments to the model based on the inspection results.

QuantAnalysis Class

QuantAnalysis class can automatically find the bad case with the largest output of two models, and use it as input to compare the output of two models layer by layer. In addition, QuantAnalysis class also provides the function of calculating the sensitivity, you can try to set the node with the topk sensitivity ranking with high accuracy, such as int16 quantization, to improve the accuracy of quanitized model.

class QuantAnalysis(object):
    def __init__(
        self,
        baseline_model: Union[torch.nn.Module, HbirModule],
        analysis_model: Union[torch.nn.Module, HbirModule],
        analysis_model_type: str,
        device_ids: Union[List[int], int] = None,
        post_process: Optional[Callable] = None,
        out_dir: Optional[str] = None,
    )

Parameters

baseline_model: baseline model (high accuracy).
analysis_model: model to be analyzed ( accuracy dropping points).
analysis_model_type: model to be analyzed. Support input:
- fake_quant: the model to be analyzed can be a calibration model with dropped accuracy, in which case the baseline model can be either the original floating-point model or a accuracy-compliant calibration model in a mixed int8/int16 configuration.
- pre_export: the model to be analyzed is a pseudo-quantized model with lookup table (LUT) for fixed-point conversion, and the benchmark model is the corresponding original pseudo-quantized model.
- export: the model to be analyzed is a pseudo-quantized HBIR model, and the benchmark model is the corresponding pseudo-quantized model with LUT for fixed-point conversion.
- convert: the model to be analyzed is a fixed-point HBIR model, and the benchmark model is the corresponding pseudo-quantized model with LUT for fixed-point conversion.
device_ids: GPU device ids to run analysis. Default None.
post_process: post process function which performs on model output.
out_dir: specify the output directory for the comparison results.

Attention

Because QAT training changes the model weight distribution, we generally do not recommend comparing floating point or calibration models with qat models.

The methods in this class are as follows.

auto_find_bad_case

def auto_find_bad_case(
    self,
    data_generator: Iterable,
    num_steps: int = sys.maxsize,
    metric: Union[str, _Metric] = Metric.ATOL,
    device: Optional[Union[torch.device, str, int, List[int]]] = None,
    custom_metric_func: Optional[Callable] = None,
    custom_metric_order_seq: Optional[str] = None,
    cached_attrs: Optional[Tuple[str, ...]] = None,
    dump_in_run: bool = False,
):

Automatically find the badcase that causes the worst output for the two models.

Parameters

data_generator: dataloader or a custom iterator that produces one piece of data per iteration.
num_steps: number of iteration steps.
metric: specify which metric to use as the metric for the badcase. default is to use the worst result of ATOL. Support COSINE/L1/ATOL.
device: deprecated. Specify the GPU through device_ids during initialization of QuantAnalysis.
custom_metric_func: deprecated. Custom metrics are no longer supported.
custom_metric_order_seq: deprecated. Custom metrics are no longer supported.
cached_attrs: cached attrs to use as input. Usually used in sequence model. For instance, some results of the first frame must be treated as input when running the second frame, Default None.
dump_in_run: whether dump badcase during the running process.

Note

Function auto_find_bad_casegoes through data_generator, runs baseline model and analysis model, computes each output results on COSINE/L1/ATOL metrics and finds the baddest input case on each metric.

Output

badcase.txt: It has three parts.
- The baddest input index in data_generator of each output under different metrics.
- The baddest metric value computed by the baddest input of each output.
- The baddest input index of each metric.
  
  The bad case input index of each output: Name/Index COSINE L1 ATOL ------------ -------- ---- ------ box0 16 0 0 box1 77 32 32 box2 66 46 56 scores 61 96 100 centerness 0 0 0 yawness 17 76 18 The metric results of each badcase: Name/Index COSINE L1 ATOL ------------ --------- --------- --------- box0 0.906329 84.7319 2.97311 box1 0.968623 38.602 2.98777 box2 0.799405 237.053 4.37895 scores 0.388762 0.0080612 0.0395061 centerness 0.904206 0.0456813 0.178009 yawness -0.325684 4.87353 1.27329 The bad case input index of the worst output: metric dataloader index -------- ------------------ COSINE 17 L1 46 ATOL 56
- badcase.pt: The baddest input data under the metric pass by parameter metric. It is used as default input of function run.

set_bad_case

def set_bad_case(
    self,
    data: Any,
    baseline_model_cached_attr: Optional[Dict] = None,
    analysis_model_cached_attr: Optional[Dict] = None,
):

Set the badcase manually.

Attention

Usually, we suggest that you find badcase by function auto_find_bad_case. If the manual set badcase is not the actual badcase, it is difficult for quant analysis tool to find quantization sensitive layers.

Parameters

data: badcase input.
baseline_model_cached_attr: baseline model cached attr.
analysis_model_cached_attr: analysis model cached attr.

load_bad_case

def load_bad_case(self, filename: Optional[str] = None)

Load badcase from the specified file.

Parameters

filename: specified file path. Defaultly, it loads all badcase related files saved by function auto_find_bad_case from directory specified by out_dir.

save_bad_case

def save_bad_case(self)

Save badcase to the {self.out_dir}/badcase.pt file.

Attention

It is used with set_bad_case. Usually, you do not need to invoke this function.

set_model_profiler_dir

def set_model_profiler_dir(
    self,
    baseline_model_profiler_path: str,
    analysis_model_profiler_path: str,
):

Specify the path to save the output of model_profiler manually.

In some cases, the ModelProfiler is defined and run before QuantAnalysis is initialized, in which case you can directly specify the path to the existing ModelProfiler, skipping the run step of QuantAnalysis and comparing the output of the two models directly.

Parameters

baseline_model_profiler_path: profiler path for the baseline model.
analysis_model_profiler_path: profiler path for the model to be analyzed.

run

def run(
    self,
    device: Optional[Union[torch.device, str, int]] = None,
    index: Optional[int] = None,
)

Run the two models and save the results for each layer in the model separately.

Parameters

device: deprecated. Specify the GPU through device_ids during initialization of QuantAnalysis.
index: use which index input as example input.

Attention

Only index found by auto_find_bad_case and shown in badcase.txtis allowed to be parameter.

compare_per_layer

def compare_per_layer(
    self,
    prefixes: Tuple[str, ...] = None,
    types: Tuple[Type, ...] = None,
):

Compare the results of each layer in the two models.

Parameters

prefixes: the prefix of ops.
types: the types of ops.

Note

Usually you do not need to specify the prefixes and typesparameters. If you want to skip the comparison of certain ops with less quantitative impact based on some prior experience, or want to save time, you can use these two parameters to specify the comparison of certain ops or a certain type of ops.

Output

abnormal_layer_advisor.txt: all anomaly layers, including cases of data range too large/outputs with inf or NaN/outputs without high accuracy.
compare_per_layer_out.txt: show the specific information of each layer in the model in the form of a table, including various metrics, data ranges, quantized dtype, etc.. Each column from left to right represents:
- Index: op index.
- mod_name: name of the op, if the op is of module type, the prefix name of the module in the model will be shown, if it is of function type, it will not be shown.
- base_op_type: type of the op in the base model, may be module type or function name.
- analy_op_type: type of the op in the model to be analyzed, could be module type or function name.
- Shape: shape of the op.
- quant_dtype: quantized type output of the op.
- Qscale: quantized scale output of the op.
- Cosine: cosine similarity of the op output in the two models.
- L1: L1 distance of the op output in the two models.
- Atol: absolute error of the op output in the two models.
- max_qscale_diff: the max N scale diff of the op output in the two models.
- base_model_min: minimum value of the op output in the baseline model.
- analy_model_min: minimum value of the op output in the model to be analyzed.
- base_model_max: maximum value of the op output in the baseline model.
- analy_model_max: maximum value of the op output in the model to be analyzed.
- base_model_mean: average of the op output in the baseline model.
- analy_model_mean: average of the op output in the model to be analyzed.
  
  +----+------------+--------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+---------------+-----------+-----------+-----------+-----------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+--------------------+ | | mod_name | base_op_type | analy_op_type | shape | quant_dtype | qscale | Cosine | L1 | Atol | max_qscale_diff | base_model_min | analy_model_min | base_model_max | analy_model_max | base_model_mean | analy_model_mean | |----+------------+--------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+---------------+-----------+-----------+-----------+-----------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+--------------------| | 0 | quant | torch.ao.quantization.stubs.QuantStub | horizon_plugin_pytorch.nn.qat.stubs.QuantStub | torch.Size([1, 3, 32, 32]) | qint8 | 0.0078404 | 0.9999922 | 0.0019772 | 0.0039202 | 0.5000016 | 0.0002798 | 0.0000000 | 0.9996471 | 0.9957269 | 0.4986397 | 0.4986700 | | 1 | conv | torch.nn.modules.conv.Conv2d | horizon_plugin_pytorch.nn.qat.conv2d.Conv2d | torch.Size([1, 3, 32, 32]) | qint8 | 0.0056401 | 0.9999791 | 0.0020876 | 0.0092935 | 1.6477517 | -0.7193903 | -0.7162931 | 0.5436335 | 0.5414499 | -0.0423445 | -0.0413149 | | 2 | relu | torch.nn.modules.activation.ReLU | torch.nn.modules.activation.ReLU | torch.Size([1, 3, 32, 32]) | qint8 | 0.0056401 | 0.9999741 | 0.0009586 | 0.0088447 | 1.5681799 | 0.0000000 | 0.0000000 | 0.5436335 | 0.5414499 | 0.1555644 | 0.1557564 | | 3 | | horizon_plugin_pytorch.nn.interpolate.autocasted_interpolate_outer | horizon_plugin_pytorch.nn.interpolate.autocasted_interpolate_outer | torch.Size([1, 3, 41, 41]) | qint8 | 0.0056401 | 0.9924216 | 0.0160291 | 0.2094204 | 37.1305954 | 0.0000000 | 0.0000000 | 0.5149657 | 0.5301697 | 0.1550578 | 0.1559310 | | 4 | dequant | torch.ao.quantization.stubs.DeQuantStub | horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub | torch.Size([1, 3, 41, 41]) | torch.float32 | | 0.9924216 | 0.0160291 | 0.2094204 | | 0.0000000 | 0.0000000 | 0.5149657 | 0.5301697 | 0.1550578 | 0.1559310 | +----+------------+--------------------------------------------------------------------+--------------------------------------------------------------------+----------------------------+---------------+-----------+-----------+-----------+-----------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+--------------------+
compare_per_layer_out.csv: show the specific information of each layer in csv format. The content is exactly the same as compare_per_layer_out.txt, and the csv file format is convenient for you to open and analyze by excel and other software.

sensitivity

def sensitivity(
    self,
    device: Optional[torch.device] = None,
    metric: Union[str, _Metric] = Metric.ATOL,
    reserve: bool = False
):

Sensitivity ordering of individual nodes in the model. Applies to the float conversion to calibration accuracy dropout problem.

Parameters

device: deprecated. Specify the GPU through device_ids during initialization of QuantAnalysis.
metric: metric for similarity ordering, default is ATOL, support COSINE/L1/ATOL.
reserve: whether to print sensitivity nodes in reverse order to support returning some int16 operators to int8 to improve on-board performance.

Output

sensitive_ops.txt. The file is organized in order of quantization sensitivity from highest to lowest op. Each column from left to right represents:
- op_name: op name.
- sensitive_type: Type of calculating quantization sensitivities, including:
  - activation: quantization sensitivity to quantize only the output of this op.
  - weight: quantization sensitivity of the weight of the op only.
  - input-{n}：quantization sensitivity of the nst input of the op only。
- op_type: op type.
- metric: the metric for calculating sensitivity. Sort the metrics in descending order of sensitivity. Support following metrics. L1 is used by default.
  - L1: value range [0, $+\infty$ ], the higher the value, the more sensitive the op is to quantization (in descending order).
  - Cosine: value range [0,1], the closer to 0, the more sensitive the op is to quantization (in descending order).
  - ATOL: value range [0, $+\infty$ ], the higher the value, the more sensitive the op is to quantization (in descending order).
- quant_dtype: the quant dtype of this op output. Usually qint8/qint16.
- flops: the flops of this op and its proportion in the whole model.
sensitive_ops.pt: a sensitivity-ordered list saved using torch.save for your subsequent loading use. The format of the list is described in the Return Values section.

Return Value

Sensitivity List, each element in the List is a sub-list recording an op's sensitivity information. each item in the sub-list from left to right is [op_name, sensitive_type, op_type, metric, quant_dtype, flops] .

An example of the whole List is as follows.

[
    [op1, "activation", op1_type, L1, qint8, flops1],
    [op2, "activation", op2_type, L1, qint8, flops2],
    [op3, "activation", op3_type, L1, qint8, flops3],
    [op1, "weight", op1_type, L1, qint8, flops4],
    ...
]

You can configure the ops with the top n quantization sensitivities with high accuracy (e.g., int16) to try to improve the quantized model accuracy.

op_name    sensitive_type    op_type                                                           L1  quant_dtype    flops
---------  ----------------  -------------------------------------------------------  -----------  -------------  -------------
conv       weight            <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>    0.000553844  qint8          9216(100.00%)
conv       activation        <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>    0.000472854  qint8          9216(100.00%)
quant      activation        <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>  0.000249175  qint8          0(0%)

clean

def clean(self)

Clears intermediate results. Only files such as comparison results are retained.

ModelProfiler Class

Statistic about the inputs and outputs of each layer of operators in the forward process of the model.

# from horizon_plugin_profiler import ModelProfiler

class ModelProfiler(object):
    def __init__(
        self,
        model: torch.nn.Module,
        out_dir: str,
    )

Parameters

model: model that requires statistics.
out_dir: path where the associated file is saved.

Note

This class only supports use by means of a with statement.

with ModelProfiler(net, "./profiler_dir") as p:
    net(data)

p.get_info_manager().table()
p.get_info_manager().tensorboard()

The methods in this class are as follows.

get_info_manager

def get_info_manager(self)

Get the structure that manages the information for each op.

Return Value

The structure OpRunningInfoManager manages the information stored for each op. Two of the important interfaces are as follows.

table

class OpRunningInfoManager:
    def table(
        self,
        out_dir: str = None,
        prefixes: Tuple[str, ...] = None,
        types: Tuple[Type, ...] = None,
        with_stack: bool = False,
    )

Show individual model statistics in a table. Store to the statistic.txt file.

Parameters

out_dir: storage path of statistic.txt file, default None, store to self.out_dir.
prefixes: prefixes of ops in the model to be counted. Default is all ops.
types: types of the ops in the model to be counted, defaults to all ops.
with_stack: if or not show the position of each op in the code.

Output

statistic.txt file, each column from left to right reads.

Index: op index.
Op Name: op type, module class name or function name.
Mod Name: if it is module class, the prefix name of the module in the model; if it is function type, the prefix name of the module where the function is located.
Attr: input/output/weight/bias.
Dtype: data type of the tensor.
Scale: scale of the tensor.
Min: minimum value of the current tensor.
Max: maximum value of the current tensor.
Mean: average value of the current tensor.
Var: variance of the values in the current tensor.
Shape: tensor shape.

+---------+--------------------------------------------------------------------+------------+--------+---------------+-----------+------------+-----------+------------+-----------+----------------------------+
| Index   | Op Name                                                            | Mod Name   | Attr   | Dtype         | Scale     | Min        | Max       | Mean       | Var       | Shape                      |
|---------+--------------------------------------------------------------------+------------+--------+---------------+-----------+------------+-----------+------------+-----------+----------------------------|
| 0       | horizon_plugin_pytorch.nn.qat.stubs.QuantStub                      | quant      | input  | torch.float32 |           | 0.0003164  | 0.9990171 | 0.5015678  | 0.0846284 | torch.Size([1, 3, 32, 32]) |
| 0       | horizon_plugin_pytorch.nn.qat.stubs.QuantStub                      | quant      | output | qint8         | 0.0078354 | 0.0000000  | 0.9950994 | 0.5014852  | 0.0846521 | torch.Size([1, 3, 32, 32]) |
| 1       | horizon_plugin_pytorch.nn.qat.conv2d.Conv2d                        | conv       | input  | qint8         | 0.0078354 | 0.0000000  | 0.9950994 | 0.5014852  | 0.0846521 | torch.Size([1, 3, 32, 32]) |
| 1       | horizon_plugin_pytorch.nn.qat.conv2d.Conv2d                        | conv       | weight | torch.float32 |           | -0.5315086 | 0.5750652 | 0.0269936  | 0.1615299 | torch.Size([3, 3, 1, 1])   |
| 1       | horizon_plugin_pytorch.nn.qat.conv2d.Conv2d                        | conv       | bias   | torch.float32 |           | -0.4963555 | 0.4448483 | -0.0851902 | 0.2320642 | torch.Size([3])            |
| 1       | horizon_plugin_pytorch.nn.qat.conv2d.Conv2d                        | conv       | output | qint8         | 0.0060428 | -0.7674332 | 0.4652941 | -0.0412943 | 0.0422743 | torch.Size([1, 3, 32, 32]) |
| 2       | horizon_plugin_pytorch.nn.qat.relu.ReLU                            | relu       | input  | qint8         | 0.0060428 | -0.7674332 | 0.4652941 | -0.0412943 | 0.0422743 | torch.Size([1, 3, 32, 32]) |
| 2       | horizon_plugin_pytorch.nn.qat.relu.ReLU                            | relu       | output | qint8         | 0.0060428 | 0.0000000  | 0.4652941 | 0.0639115  | 0.0089839 | torch.Size([1, 3, 32, 32]) |
| 3       | horizon_plugin_pytorch.nn.interpolate.autocasted_interpolate_outer |            | input  | qint8         | 0.0060428 | 0.0000000  | 0.4652941 | 0.0639115  | 0.0089839 | torch.Size([1, 3, 32, 32]) |
| 3       | horizon_plugin_pytorch.nn.interpolate.autocasted_interpolate_outer |            | output | qint8         | 0.0060428 | 0.0000000  | 0.3504813 | 0.0639483  | 0.0043366 | torch.Size([1, 3, 41, 41]) |
| 4       | horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub                    | dequant    | input  | qint8         | 0.0060428 | 0.0000000  | 0.3504813 | 0.0639483  | 0.0043366 | torch.Size([1, 3, 41, 41]) |
| 4       | horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub                    | dequant    | output | torch.float32 |           | 0.0000000  | 0.3504813 | 0.0639483  | 0.0043366 | torch.Size([1, 3, 41, 41]) |
+---------+--------------------------------------------------------------------+------------+--------+---------------+-----------+------------+-----------+------------+-----------+----------------------------+

tensorboard

class OpRunningInfoManager:
    def tensorboard(
        self,
        out_dir: str = None,
        prefixes: Tuple[str, ...] = None,
        types: Tuple[Type, ...] = None,
        force_per_channel: bool = False,
    ):

Show the input and output histograms for each layer in the tensorboard.

Parameters

out_dir: directory where tensorboard related files are kept. Default is self.out_dir/tensorboard.
prefixes: prefixes of the ops in the model to be counted, default is all.
types: types of the ops in the model to be counted, default is all.
force_per_channel: if or not to display the histogram in per_channel quantization.

Output

The tensorboard file, opened with the following screenshot.

tensorboard

HbirModelProfiler Class

The functionality and usage of this class is identical to the ModelProfiler class. Please refer to ModelProfiler Class for usage.

Attention

Due to the special format of the hbir model, the qat hbir model needs to add index 0 at forward process.

with HbirModelProfiler(qat_hbir, "./hbir_dir") as p:
    qat_hbir[0](data)

p.get_info_manager().table()
p.get_info_manager().tensorboard()

On This Page

The hb_compile Tool

QAT

qconfig_setter

Model Export

Horizon Operator

Model Inference API Introduction

Data Type and Data Structure

Function Interface

Model Inference Tool Introduction

The hrt_model_exec Tool Introduction

The hbm_infer Tool Introduction

Data Structure

Functional Interface

Operator Support List

Operator BPU Constraint List

#Accuracy Tuning Tool Guide

#Quickstart

#Analyzing Statistical Information

#Multi-machine Multi-GPU

#API Reference

#Model Structure Checking

#QuantAnalysis Class

#auto_find_bad_case

#set_bad_case

#load_bad_case

#save_bad_case

#set_model_profiler_dir

#run

#compare_per_layer

#sensitivity

#clean

#ModelProfiler Class

#get_info_manager

#table

#tensorboard

#HbirModelProfiler Class

Accuracy Tuning Tool Guide

Quickstart

Analyzing Statistical Information

Multi-machine Multi-GPU

API Reference

Model Structure Checking

QuantAnalysis Class

auto_find_bad_case

set_bad_case

load_bad_case

save_bad_case

set_model_profiler_dir

run

compare_per_layer

sensitivity

clean

ModelProfiler Class

get_info_manager

table

tensorboard

HbirModelProfiler Class