Deployment Practice Guide for a Multi-Batch ResNet18 Model with Pyramid Input

The overall PTQ workflow of the Horizon OpenExplorer toolchain consists of several stages: model optimization, model calibration, conversion to a fixed-point model, model compilation, and on-board deployment. Using a multi-batch classification model with Pyramid input based on the public ResNet18 (targeting the S100 computing platform) as an example, this section demonstrates the deployment practice step by step for your reference.

Preparing the Floating-Point Model

First, prepare the ResNet18 floating-point model. Here we use torchvision to export the required floating-point model.

prepare_model.py
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
output_path = "resnet18.onnx"
torch.onnx.export(model,
                  input_data,
                  output_path,
                  input_names=["input"],
                  output_names=["output"],
                  opset_version=10)
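Optionally, before moving on, you can sanity-check the exported ONNX model with onnxruntime. This is a minimal sketch, assuming onnxruntime is installed; it is not part of the toolchain flow itself:

import numpy as np
import onnxruntime as ort

# Run the exported model once on random data and check the output shape
session = ort.InferenceSession("resnet18.onnx",
                               providers=["CPUExecutionProvider"])
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": dummy})
print(outputs[0].shape)  # Expected: (1, 1000)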

Preparing the Calibration Dataset

For details about the public ResNet18 model, refer to the description of ResNet18 in the PyTorch documentation. As documented there, the data preprocessing pipeline of the ResNet18 model is:

  1. Resize the short side of the image to 256.
  2. Center-crop the image to 224x224.
  3. Normalize the data, with mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225].

A data preprocessing code example is shown below:

data_preprocess.py
import os

import numpy as np
import PIL
from PIL import Image

ori_dataset_dir = "./calibration_data/imagenet"
calibration_dir = "./calibration_data_rgb"


def resize_transformer(image_data: np.array, short_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    # PIL size is (width, height)
    w, h = image.size
    if (w <= h and w == short_size) or (h <= w and h == short_size):
        return np.array(image)
    if w < h:
        # I.e., the width of the image is the short side
        resize_size = (short_size, int(short_size * h / w))
    else:
        # I.e., the height of the image is the short side
        resize_size = (int(short_size * w / h), short_size)
    # Resize the image
    data = np.array(image.resize(resize_size, Image.BILINEAR))
    return data


def center_crop_transformer(image_data: np.array, crop_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    image_width, image_height = image.size
    crop_height, crop_width = (crop_size, crop_size)
    crop_top = int(round((image_height - crop_height) / 2.))
    crop_left = int(round((image_width - crop_width) / 2.))
    image_data = image.crop(
        (crop_left, crop_top, crop_left + crop_width, crop_top + crop_height))
    return np.array(image_data).astype(np.float32)


os.makedirs(calibration_dir, exist_ok=True)
for image_name in os.listdir(ori_dataset_dir):
    image_path = os.path.join(ori_dataset_dir, image_name)
    # Load the image with PIL
    pil_image_data = PIL.Image.open(image_path).convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    # Resize the image
    image_data = resize_transformer(image_data, 256)
    # Crop the image
    image_data = center_crop_transformer(image_data, 224)
    # Adjust the data range from [0, 255] to [0, 1]
    image_data = image_data * (1 / 255)
    # Normalization: (data - mean) / std
    mean = [0.485, 0.456, 0.406]
    image_data = image_data - mean
    std = [0.229, 0.224, 0.225]
    image_data = image_data / std
    # Convert format from HWC to CHW
    image_data = np.transpose(image_data, (2, 0, 1)).astype(np.float32)
    # Convert format from CHW to NCHW
    image_data = image_data[np.newaxis, :]
    # Save the npy file (strip the ".JPEG" suffix)
    cali_file_path = os.path.join(calibration_dir, image_name[:-5] + ".npy")
    np.save(cali_file_path, image_data)

To support PTQ model calibration, we need a small dataset taken from the ImageNet dataset; here we use the first 100 images as an example:

./imagenet
├── ILSVRC2012_val_00000001.JPEG
├── ILSVRC2012_val_00000002.JPEG
├── ILSVRC2012_val_00000003.JPEG
├── ......
├── ILSVRC2012_val_00000099.JPEG
└── ILSVRC2012_val_00000100.JPEG

The calibration dataset directory generated by the preprocessing code above is structured as follows:

./calibration_data_rgb
├── ILSVRC2012_val_00000001.npy
├── ILSVRC2012_val_00000002.npy
├── ILSVRC2012_val_00000003.npy
├── ......
├── ILSVRC2012_val_00000099.npy
└── ILSVRC2012_val_00000100.npy
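To confirm the generated files are valid calibration inputs, you can run a quick sanity check. This is a minimal sketch assuming the directory layout above:

import numpy as np

data = np.load("./calibration_data_rgb/ILSVRC2012_val_00000001.npy")
# One image per file, NCHW layout, normalized float32 data
assert data.shape == (1, 3, 224, 224)
assert data.dtype == np.float32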

Generating the On-Board Model

The PTQ conversion pipeline supports two ways to quantize and compile a model into an on-board model: the command-line tools and the PTQ APIs. Both are described below.

Command-Line Tool Approach

To use the command-line tools, you only need to install horizon_tc_ui (preinstalled in the Docker environment) and create a yaml file matching your model configuration. Here we show and explain the yaml file (config.yaml) for the multi-batch ResNet18 model with Pyramid input.

config.yaml
model_parameters:
  onnx_model: 'resnet18.onnx'
  march: "nash-e"
  working_dir: 'model_output'
  output_model_file_prefix: 'resnet18_224x224_nv12'
input_parameters:
  input_name: ''
  input_shape: ''
  input_type_rt: 'nv12'
  input_type_train: 'rgb'
  input_layout_train: 'NCHW'
  # Formula with [0.485 * 255, 0.456 * 255, 0.406 * 255]
  mean_value: "123.675 116.28 103.53"
  # Formula with [1 / (0.229*255), 1 / (0.224*255), 1 / (0.225*255)]
  scale_value: "0.01712475 0.017507 0.01742919"
  input_batch: 8
  separate_batch: True
calibration_parameters:
  cal_data_dir: './calibration_data_rgb'
compiler_parameters:
  optimize_level: 'O2'
Note

Here input_name and input_shape are left empty because, for a single-input model without dynamic shapes, the tool fills in both parameters automatically (i.e., it parses the ONNX model internally and retrieves the input name and shape).
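For reference, the mean_value and scale_value entries fold the [0, 1]-domain normalization into the [0, 255] data domain used by the board-side NV12 input. A minimal sketch of the arithmetic, matching the comments in config.yaml:

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Move the mean into the [0, 255] domain, and fold the 1/255 scaling into std
mean_value = [m * 255 for m in mean]        # [123.675, 116.28, 103.53]
scale_value = [1 / (s * 255) for s in std]  # [0.01712475, 0.017507, 0.01742919]
print(mean_value, scale_value)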

Once the yaml file is ready, simply run the hb_compile tool. The command and its key log output are shown below:

[horizon@xxx xxx]$ hb_compile -c config.yaml
INFO Start hb_compile...
INFO Start verifying yaml
INFO End verifying yaml
INFO Start to Horizon NN Model Convert.
INFO Start to prepare the onnx model.
INFO End to prepare the onnx model.
INFO Start to optimize the onnx model.
INFO End to optimize the onnx model.
INFO Start to calibrate the model.
INFO End to calibrate the model.
INFO Start to precompile the model.
INFO End to precompile the model.
INFO End to Horizon NN Model Convert.
INFO Successful covert model: /xxx/resnet18_224x224_nv12_quantized_model.bc
[==================================================]100%
INFO ############# Model input/output info #############
INFO NAME      TYPE    SHAPE             DATA_TYPE
INFO --------  ------  ----------------  ---------
INFO input_y   input   [1, 224, 224, 1]  UINT8
INFO input_uv  input   [1, 112, 112, 2]  UINT8
INFO output    output  [1, 1000]         FLOAT32
INFO The hb_compile completes running

After the command finishes, the directory configured by the yaml working_dir parameter (model_output) contains the intermediate models of each stage, the final on-board model, and the model information files, as shown below. resnet18_224x224_nv12.hbm is the model file that can run inference on the board:

./model_output
├── ...
├── resnet18_224x224_nv12_calibrated_model.onnx
├── resnet18_224x224_nv12.hbm
├── resnet18_224x224_nv12_optimized_float_model.onnx
├── resnet18_224x224_nv12_original_float_model.onnx
├── resnet18_224x224_nv12_ptq_model.onnx
└── resnet18_224x224_nv12_quantized_model.bc

PTQ API Approach

The command-line tools trade some flexibility for ease of use. When you need more flexibility, you can quantize and compile the model through the PTQ APIs instead. The following walks through generating an on-board model with the APIs.

Attention

Note that because some interfaces have many parameters, the examples below only configure the parameters necessary for this end-to-end practice. For the full parameter list of each interface, refer to the HMCT API Reference and the HBDK Tool API Reference.

Model Optimization and Calibration

First, apply graph optimization and calibration quantization to the floating-point model. This step uses the HMCT API; a concrete example:

calibration.py
import os
import logging

import numpy as np
from hmct.api import build_model

logging.basicConfig(level=logging.INFO)

march = "nash"
onnx_path = "./resnet18.onnx"
cali_data_dir = "./calibration_data_rgb"
model_name = "resnet18_224x224_nv12"
working_dir = "./model_output/"

cali_data = []
for cali_data_name in os.listdir(cali_data_dir):
    data_path = os.path.join(cali_data_dir, cali_data_name)
    cali_data.append(np.load(data_path))

ptq_params = {
    'cali_dict': {
        'calibration_data': {
            'input': cali_data
        }
    },
    'input_dict': {
        'input': {
            'input_batch': 8
        }
    },
    'debug_methods': [],
    'output_nodes': []
}

if not os.path.exists(working_dir):
    os.mkdir(working_dir)
build_model(onnx_file=onnx_path,
            march=march,
            name_prefix=working_dir + model_name,
            **ptq_params)

After build_model runs successfully, the working_dir directory contains the ONNX models of each stage, structured as follows:

./model_output
├── resnet18_224x224_nv12_calibrated_model.onnx
├── resnet18_224x224_nv12_optimized_float_model.onnx
├── resnet18_224x224_nv12_original_float_model.onnx
├── resnet18_224x224_nv12_ptq_model.onnx
└── resnet18_224x224_nv12_quant_info.json

Here, the *ptq_model.onnx file is the ONNX model that has been through graph optimization and calibration. For details about the intermediate ONNX models, refer to the chapter Post-Training Quantization (PTQ) - PTQ Conversion Steps - Model Quantization and Compilation - Interpreting Conversion Artifacts.
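If you want to quickly inspect the inputs of the generated *_ptq_model.onnx, a minimal sketch using the standard onnx package follows. Note the model may contain toolchain-specific operators, so it is not meant to run in a generic ONNX runtime:

import onnx

model = onnx.load("./model_output/resnet18_224x224_nv12_ptq_model.onnx")
# Print each graph input's name and static shape
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)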

Fixed-Point Conversion and Compilation

Next, convert the PTQ model to a fixed-point model and compile it. This step goes through the compiler's APIs; example:

compile.py
import os

import onnx
from hbdk4.compiler.onnx import export
from hbdk4.compiler import convert, compile

input_batch = 8
march = "nash-e"
working_dir = "./model_output/"
model_name = "resnet18_224x224_nv12"
ptq_onnx_path = "./model_output/resnet18_224x224_nv12_ptq_model.onnx"

if not os.path.exists(working_dir):
    os.mkdir(working_dir)

# Load the onnx model
ptq_onnx = onnx.load(ptq_onnx_path)
# Convert the onnx model to an hbir model
ptq_model = export(proto=ptq_onnx, name=model_name)

func = ptq_model.functions[0]
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Split the batch-8 input into separate per-batch inputs
func.inputs[0].insert_split(dim=0)
for i in range(input_batch - 1, -1, -1):
    # Convert format from NCHW to NHWC
    func.inputs[i].insert_transpose([0, 3, 1, 2])
    # Insert a node for color conversion and normalization
    func.inputs[i].insert_image_preprocess(mode="yuvbt601full2rgb",
                                           divisor=255,
                                           mean=mean,
                                           std=std,
                                           is_signed=True)
    # Insert a node for conversion from nv12 to yuv444
    func.inputs[i].insert_image_convert(mode="nv12")

# Convert the model type from float to int
quantized_model = convert(m=ptq_model, march=march)
compile(m=quantized_model,
        path=working_dir + model_name + ".hbm",
        march=march,
        opt=0,
        progress_bar=True)

After compilation, the working_dir directory holds the intermediate files and the final model that can run on the board, structured as follows:

./model_output
├── resnet18_224x224_nv12_calibrated_model.onnx
├── resnet18_224x224_nv12.hbm
├── resnet18_224x224_nv12_optimized_float_model.onnx
├── resnet18_224x224_nv12_original_float_model.onnx
├── resnet18_224x224_nv12_ptq_model.onnx
└── resnet18_224x224_nv12_quant_info.json

Visualization

After generating the hbm model, you can inspect it with the hb_model_info and hrt_model_exec tools. Reference commands:

  • hb_model_info tool
hb_model_info -v resnet18_224x224_nv12.hbm
  • hrt_model_exec tool
hrt_model_exec model_info --model_file resnet18_224x224_nv12.hbm

Building the On-Board Example

  1. Prepare the dependency libraries required by the on-board example

To build the on-board example as quickly as possible, we recommend directly using the contents of the samples/ucp_tutorial/deps_aarch64 directory in the OE package as the dependency libraries. The key header files and shared libraries the on-board example depends on are located as follows:

./deps_aarch64
├── ......
└── ucp
    ├── include
    │   └── hobot
    │       ├── dnn
    │       │   ├── hb_dnn.h
    │       │   ├── hb_dnn_status.h
    │       │   └── hb_dnn_v1.h
    │       ├── ......
    │       ├── hb_sys.h
    │       ├── hb_ucp.h
    │       ├── hb_ucp_status.h
    │       └── hb_ucp_sys.h
    └── lib
        ├── ......
        ├── libdnn.so
        └── libhbucp.so
  2. Develop the on-board example

The example below uses binary input files and the on-board model to run one on-board inference and retrieve the TOP1 classification result.

main.cc
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

#include "hobot/dnn/hb_dnn.h"
#include "hobot/hb_ucp.h"
#include "hobot/hb_ucp_sys.h"

#define ALIGN(value, alignment) (((value) + ((alignment)-1)) & ~((alignment)-1))
#define ALIGN_32(value) ALIGN(value, 32)

const char *hbm_path = "resnet18_224x224_nv12.hbm";
std::string data_y_path = "ILSVRC2012_val_00000001_y.bin";
std::string data_uv_path = "ILSVRC2012_val_00000001_uv.bin";
int input_batch = 8;

// Read a binary input file
int read_binary_file(std::string file_path, char **bin, int *length) {
  std::ifstream ifs(file_path, std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  *length = ifs.tellg();
  ifs.seekg(0, std::ios::beg);
  *bin = new char[sizeof(char) * (*length)];
  ifs.read(*bin, *length);
  ifs.close();
  return 0;
}

// Prepare input and output tensors
int prepare_tensor(hbDNNTensor *input_tensor, hbDNNTensor *output_tensor,
                   hbDNNHandle_t dnn_handle);

int main() {
  // Get the model handle
  hbDNNPackedHandle_t packed_dnn_handle;
  hbDNNHandle_t dnn_handle;
  hbDNNInitializeFromFiles(&packed_dnn_handle, &hbm_path, 1);
  const char **model_name_list;
  int model_count = 0;
  hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
  hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);

  // Prepare input and output tensors
  std::vector<hbDNNTensor> input_tensors;
  std::vector<hbDNNTensor> output_tensors;
  int input_count = 0;
  int output_count = 0;
  hbDNNGetInputCount(&input_count, dnn_handle);
  hbDNNGetOutputCount(&output_count, dnn_handle);
  input_tensors.resize(input_count);
  output_tensors.resize(output_count);
  // Initialize the tensors and allocate their memory
  prepare_tensor(input_tensors.data(), output_tensors.data(), dnn_handle);

  // Copy the binary input data to the input tensors
  int32_t data_length = 0;
  char *y_data = nullptr;
  read_binary_file(data_y_path, &y_data, &data_length);
  char *uv_data = nullptr;
  read_binary_file(data_uv_path, &uv_data, &data_length);
  for (auto i = 0; i < input_batch; i++) {
    memcpy(reinterpret_cast<char *>(input_tensors[i * 2].sysMem.virAddr),
           y_data, input_tensors[i * 2].sysMem.memSize);
    hbUCPMemFlush(&(input_tensors[i * 2].sysMem), HB_SYS_MEM_CACHE_CLEAN);
    memcpy(reinterpret_cast<char *>(input_tensors[i * 2 + 1].sysMem.virAddr),
           uv_data, input_tensors[i * 2 + 1].sysMem.memSize);
    hbUCPMemFlush(&(input_tensors[i * 2 + 1].sysMem), HB_SYS_MEM_CACHE_CLEAN);
  }
  delete[] y_data;
  delete[] uv_data;

  // Submit the task and wait until it completes
  hbUCPTaskHandle_t task_handle{nullptr};
  hbDNNTensor *output = output_tensors.data();
  // Generate the task handle
  hbDNNInferV2(&task_handle, output, input_tensors.data(), dnn_handle);
  // Submit the task
  hbUCPSchedParam ctrl_param;
  HB_UCP_INITIALIZE_SCHED_PARAM(&ctrl_param);
  ctrl_param.backend = HB_UCP_BPU_CORE_ANY;
  hbUCPSubmitTask(task_handle, &ctrl_param);
  // Wait until the task completes
  hbUCPWaitTaskDone(task_handle, 0);

  // Parse the inference result and compute TOP1
  hbUCPMemFlush(&output_tensors[0].sysMem, HB_SYS_MEM_CACHE_INVALIDATE);
  auto result = reinterpret_cast<float *>(output_tensors[0].sysMem.virAddr);
  for (auto batch = 0; batch < input_batch; batch++) {
    float max_score = 0.0;
    int label = -1;
    // Find the max score and the corresponding label
    for (auto i = 0; i < 1000; i++) {
      float score = result[batch * 1000 + i];
      if (score > max_score) {
        label = i;
        max_score = score;
      }
    }
    // Output the result
    std::cout << "batch[" << batch << "] " << "label: " << label << std::endl;
  }
  hbUCPReleaseTask(task_handle);

  // Free the input memory
  for (int i = 0; i < input_count; i++) {
    hbUCPFree(&(input_tensors[i].sysMem));
  }
  // Free the output memory
  for (int i = 0; i < output_count; i++) {
    hbUCPFree(&(output_tensors[i].sysMem));
  }
  // Release the model
  hbDNNRelease(packed_dnn_handle);
}

// Prepare input and output tensors
int prepare_tensor(hbDNNTensor *input_tensor, hbDNNTensor *output_tensor,
                   hbDNNHandle_t dnn_handle) {
  // Get the input and output tensor counts
  int input_count = 0;
  int output_count = 0;
  hbDNNGetInputCount(&input_count, dnn_handle);
  hbDNNGetOutputCount(&output_count, dnn_handle);

  hbDNNTensor *input = input_tensor;
  // Get the properties of each input tensor
  for (int i = 0; i < input_count; i++) {
    hbDNNGetInputTensorProperties(&input[i].properties, dnn_handle, i);
    // Calculate the stride of the input tensor
    auto dim_len = input[i].properties.validShape.numDimensions;
    for (int32_t dim_i = dim_len - 1; dim_i >= 0; --dim_i) {
      if (input[i].properties.stride[dim_i] == -1) {
        auto cur_stride =
            input[i].properties.stride[dim_i + 1] *
            input[i].properties.validShape.dimensionSize[dim_i + 1];
        input[i].properties.stride[dim_i] = ALIGN_32(cur_stride);
      }
    }
    // Calculate the memory size of the input tensor and allocate cached memory
    int input_memSize = input[i].properties.stride[0] *
                        input[i].properties.validShape.dimensionSize[0];
    hbUCPMallocCached(&input[i].sysMem, input_memSize, 0);
  }

  hbDNNTensor *output = output_tensor;
  // Get the properties of each output tensor
  for (int i = 0; i < output_count; i++) {
    hbDNNGetOutputTensorProperties(&output[i].properties, dnn_handle, i);
    // Calculate the memory size of the output tensor and allocate cached memory
    int output_memSize = output[i].properties.alignedByteSize;
    hbUCPMallocCached(&output[i].sysMem, output_memSize, 0);
    // Show how to get the output name
    const char *output_name;
    hbDNNGetOutputName(&output_name, dnn_handle, i);
  }
  return 0;
}
  3. Cross-compile to generate the on-board executable

Before cross-compiling, prepare CMakeLists.txt and the example source file. The content of CMakeLists.txt is shown below. Since the example contains no data preprocessing, it has few dependencies; the file mainly configures the GCC compilation flags and the required headers and shared libraries. Here dnn is the on-board inference library, and hbucp is used for tensor operations.

CMakeLists.txt
# CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(sample)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wl,-unresolved-symbols=ignore-in-shared-libs")

message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
set(CMAKE_C_FLAGS_DEBUG "-g -O0")
set(CMAKE_CXX_FLAGS_RELEASE " -O3 ")
set(CMAKE_C_FLAGS_RELEASE " -O3 ")
set(CMAKE_BUILD_TYPE ${build_type})

set(DEPS_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/deps_aarch64)
include_directories(${DEPS_ROOT}/ucp/include)
link_directories(${DEPS_ROOT}/ucp/lib)

add_executable(run_sample src/main.cc)
target_link_libraries(run_sample dnn hbucp)

The directory structure of the build environment is as follows:

.
├── CMakeLists.txt
├── deps_aarch64
│   └── ucp
│       ├── include
│       └── lib
└── src
    └── main.cc

Once the example file and CMakeLists.txt are ready, you can run the build. An example build command is shown below:

Attention

Note that CC and CXX in the build script must be set to the actual paths of your cross-compilation GCC and G++.

#!/usr/bin/env bash
# Note: please configure according to the actual path
export CC=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
export CXX=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
rm -rf arm_build; mkdir arm_build; cd arm_build
cmake ..; make -j8
cd ..

After the build completes, the run_sample binary that can run on the board is generated. This concludes the on-board example build.

Preparing for On-Board Execution

With the executable built, the model input must be prepared. To keep the operational and dependency-setup cost of this practice low, we process the data here with Python; you may also implement the same logic in C++ inside the on-board example (make sure the data processing logic is identical). Example:

input_data.py
import numpy as np
import PIL
from PIL import Image

image_path = "./ILSVRC2012_val_00000001.JPEG"


def resize_transformer(image_data: np.array, short_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    # PIL size is (width, height)
    w, h = image.size
    if (w <= h and w == short_size) or (h <= w and h == short_size):
        return np.array(image)
    if w < h:
        # I.e., the width of the image is the short side
        resize_size = (short_size, int(short_size * h / w))
    else:
        # I.e., the height of the image is the short side
        resize_size = (int(short_size * w / h), short_size)
    # Resize the image
    data = np.array(image.resize(resize_size, Image.BILINEAR))
    return data


def center_crop_transformer(image_data: np.array, crop_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    image_width, image_height = image.size
    crop_height, crop_width = (crop_size, crop_size)
    crop_top = int(round((image_height - crop_height) / 2.))
    crop_left = int(round((image_width - crop_width) / 2.))
    image_data = image.crop(
        (crop_left, crop_top, crop_left + crop_width, crop_top + crop_height))
    return np.array(image_data).astype(np.float32)


def rgb_to_nv12(image_data: np.array):
    r = image_data[:, :, 0]
    g = image_data[:, :, 1]
    b = image_data[:, :, 2]
    # BT.601 full-range RGB -> YUV, with U and V subsampled 2x2
    y = (0.299 * r + 0.587 * g + 0.114 * b)
    u = (-0.169 * r - 0.331 * g + 0.5 * b + 128)[::2, ::2]
    v = (0.5 * r - 0.419 * g - 0.081 * b + 128)[::2, ::2]
    # Interleave U and V into the semi-planar UV plane
    uv = np.zeros(shape=(u.shape[0], u.shape[1] * 2))
    for i in range(0, u.shape[0]):
        for j in range(0, u.shape[1]):
            uv[i, 2 * j] = u[i, j]
            uv[i, 2 * j + 1] = v[i, j]
    y = y.astype(np.uint8)
    uv = uv.astype(np.uint8)
    return y, uv


if __name__ == '__main__':
    # Load the image with PIL
    pil_image_data = PIL.Image.open(image_path).convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    # Resize the image
    image_data = resize_transformer(image_data, 256)
    # Crop the image
    image_data = center_crop_transformer(image_data, 224)
    # Convert format from RGB to nv12
    y, uv = rgb_to_nv12(image_data)
    y.tofile("ILSVRC2012_val_00000001_y.bin")
    uv.tofile("ILSVRC2012_val_00000001_uv.bin")
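The element-wise loop in rgb_to_nv12 is easy to read but slow for larger images. As a design note, the UV interleaving can be vectorized with NumPy; a minimal equivalent sketch:

import numpy as np

def interleave_uv(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Stack U and V on a new last axis, then flatten each row so that
    # [u0, v0, u1, v1, ...] matches the NV12 semi-planar UV layout
    return np.stack([u, v], axis=-1).reshape(u.shape[0], u.shape[1] * 2)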

After the model input data is prepared and the binary input files for on-board inference are correctly generated, make sure you now have the following ready:

  • An S100 development board, for actually running the on-board program.

  • A model that can run inference on the board (*.hbm), i.e., the artifact of Generating the On-Board Model.

  • The on-board program (the main.cc file and the cross-compiled on-board executable), i.e., the artifact of Building the On-Board Example.

  • The dependency libraries of the on-board program. To reduce deployment cost, you can directly use the contents of the samples/ucp_tutorial/deps_aarch64/ucp/lib folder of the OE package.

Once everything is ready, gather the model file (*.hbm), the input data (*.bin files), the on-board program, and the dependency libraries into one folder. A reference directory structure:

horizon
├── ILSVRC2012_val_00000001_uv.bin
├── ILSVRC2012_val_00000001_y.bin
├── lib
├── resnet18_224x224_nv12.hbm
└── run_sample

Copy this folder to the board environment. Reference command:

scp -r horizon/ root@{board_ip}:/map/

Running on the Board

Finally, configure LD_LIBRARY_PATH and run the program, as shown below:

horizon@hobot:/map/horizon# export LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH
horizon@hobot:/map/horizon# ./run_sample
......
label: 65

As shown, the printed label: 65 matches the label of the ILSVRC2012_val_00000001 image in the ImageNet dataset, i.e., the classification result is correct.
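If you want to map the class index to a human-readable name, a minimal sketch follows, assuming you have downloaded a standard 1000-class label file such as imagenet_classes.txt from the pytorch/hub repository:

# Load the 1000 ImageNet class names, one per line, and look up index 65
with open("imagenet_classes.txt") as f:
    classes = [line.strip() for line in f]
print(classes[65])  # The class name corresponding to label 65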

This concludes the end-to-end deployment practice for the multi-batch ResNet18 model with Pyramid input.