The overall PTQ workflow of the Horizon OpenExplorer toolchain consists of several stages: model optimization, model calibration, conversion to a fixed-point (quantized) model, model compilation, and on-board deployment. This chapter uses a Resizer-input classification model based on the public ResNet18 (targeting the S100 computing platform) as an example and walks through the deployment practice step by step for your reference.
First, prepare the ResNet18 floating-point model. Here we use torchvision to export the required floating-point ONNX model.
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
output_path = "resnet18.onnx"
torch.onnx.export(model, input_data, output_path,
input_names=["input"], output_names=["output"],
opset_version=10)
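Before moving on, it is a good idea to confirm that the exported ONNX file is well-formed. A minimal sketch using the public onnx package:
import onnx

onnx_model = onnx.load("resnet18.onnx")
# Raises an exception if the model structure is invalid
onnx.checker.check_model(onnx_model)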
For details on the public ResNet18 model, refer to the ResNet18 description in the PyTorch documentation. Its data preprocessing pipeline is: resize the short side of the image to 256 (bilinear), center-crop to 224x224, scale the pixel values to [0, 1], and normalize with mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225].
A data preprocessing code example is shown below:
import os
import cv2
import PIL
import numpy as np
from PIL import Image
ori_dataset_dir = "./calibration_data/imagenet"
calibration_dir = "./calibration_data_rgb"
def resize_transformer(image_data: np.array, short_size: int):
image = Image.fromarray(image_data.astype('uint8'), 'RGB')
# Specify width, height
w, h = image.size
if (w <= h and w == short_size) or (h <= w and h == short_size):
return np.array(image)
# I.e., the width of the image is the short side
if w < h:
resize_size = (short_size, int(short_size * h / w))
# I.e., the height of the image is the short side
else:
resize_size = (int(short_size * w / h), short_size)
# Resize the image
data = np.array(image.resize(resize_size, Image.BILINEAR))
return data
def center_crop_transformer(image_data: np.array, crop_size: int):
image = Image.fromarray(image_data.astype('uint8'), 'RGB')
image_width, image_height = image.size
crop_height, crop_width = (crop_size, crop_size)
crop_top = int(round((image_height - crop_height) / 2.))
crop_left = int(round((image_width - crop_width) / 2.))
image_data = image.crop((crop_left,
crop_top,
crop_left + crop_width,
crop_top + crop_height))
return np.array(image_data).astype(np.float32)
os.mkdir(calibration_dir)
for image_name in os.listdir(ori_dataset_dir):
image_path = os.path.join(ori_dataset_dir, image_name)
# load the image with PIL method
pil_image_data = PIL.Image.open(image_path).convert('RGB')
image_data = np.array(pil_image_data).astype(np.uint8)
# Resize the image
image_data = resize_transformer(image_data, 256)
# Crop the image
image_data = center_crop_transformer(image_data, 224)
# Adjust the data range from [0, 255] to [0, 1]
image_data = image_data * (1 / 255)
# Normalization, (data - mean) / std
mean = [0.485, 0.456, 0.406]
image_data = image_data - mean
std = [0.229, 0.224, 0.225]
image_data = image_data / std
# Convert format from HWC to CHW
image_data = np.transpose(image_data, (2, 0, 1)).astype(np.float32)
# Convert format from CHW to NCHW
image_data = image_data[np.newaxis, :]
# Save the npy file
cali_file_path = os.path.join(calibration_dir, image_name[:-5] + ".npy")
np.save(cali_file_path, image_data)
To support PTQ model calibration, we need to take a small subset from the ImageNet dataset; here we use the first 100 images as an example (a small helper script for collecting them is shown after the listing below):
./imagenet
├── ILSVRC2012_val_00000001.JPEG
├── ILSVRC2012_val_00000002.JPEG
├── ILSVRC2012_val_00000003.JPEG
├── ......
├── ILSVRC2012_val_00000099.JPEG
└── ILSVRC2012_val_00000100.JPEG
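A helper script like the following can be used to collect this subset (a minimal sketch; the source path ./ILSVRC2012_img_val is a hypothetical location of the full ImageNet validation set, and the destination matches the ori_dataset_dir used by the preprocessing code above):
import os
import shutil

# Hypothetical path to the full ImageNet validation set
src_dir = "./ILSVRC2012_img_val"
# Destination directory, matching ori_dataset_dir in the preprocessing code
dst_dir = "./calibration_data/imagenet"

os.makedirs(dst_dir, exist_ok=True)
# Copy the first 100 images (sorted by file name) as the calibration subset
for image_name in sorted(os.listdir(src_dir))[:100]:
    shutil.copy(os.path.join(src_dir, image_name), os.path.join(dst_dir, image_name))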
The calibration dataset generated by the preprocessing code above then has the following directory structure:
./calibration_data_rgb
├── ILSVRC2012_val_00000001.npy
├── ILSVRC2012_val_00000002.npy
├── ILSVRC2012_val_00000003.npy
├── ......
├── ILSVRC2012_val_00000099.npy
└── ILSVRC2012_val_00000100.npy
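To sanity-check the generated calibration data, you can verify that every .npy file has the expected shape and dtype. A minimal sketch, assuming the layout above:
import os
import numpy as np

calibration_dir = "./calibration_data_rgb"
for name in os.listdir(calibration_dir):
    data = np.load(os.path.join(calibration_dir, name))
    # Each sample should be a float32 NCHW tensor of shape (1, 3, 224, 224)
    assert data.shape == (1, 3, 224, 224) and data.dtype == np.float32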
The PTQ conversion pipeline supports two ways to quantize and compile a model into an on-board model: the command-line tool and the PTQ APIs. Both approaches are described below.
With the command-line tool, you only need to install horizon_tc_ui (pre-installed in the Docker environment) and create a YAML file based on the model information. Here we show and explain the YAML file (config.yaml) for the Resizer-input ResNet18 model.
model_parameters:
onnx_model: 'resnet18.onnx'
march: "nash-e"
working_dir: 'model_output'
output_model_file_prefix: 'resnet18_224x224_nv12_resizer'
input_parameters:
input_name: ''
input_shape: ''
input_type_rt: 'nv12'
input_type_train: 'rgb'
input_layout_train: 'NCHW'
# Formula with [0.485 * 255, 0.456 * 255, 0.406 * 255]
mean_value: "123.675 116.28 103.53"
# Formula with [1 / (0.229*255), 1 / (0.224*255), 1 / (0.225*255)]
scale_value: "0.01712475 0.017507 0.01742919"
calibration_parameters:
cal_data_dir: './calibration_data_rgb'
compiler_parameters:
optimize_level: 'O2'
input_source:
input: resizer
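The mean_value and scale_value entries above are derived from the floating-point normalization parameters (mean/std defined for inputs in the [0, 1] range, rescaled to the [0, 255] range of the on-board input). A quick check of the arithmetic:
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# mean_value = mean * 255 -> 123.675 116.28 103.53
print([m * 255 for m in mean])
# scale_value = 1 / (std * 255) -> approximately 0.01712475 0.017507 0.01742919
print([1 / (s * 255) for s in std])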
input_name and input_shape are left empty here because, for single-input models without dynamic input shapes, the tool fills in these two parameters automatically (i.e., it parses the ONNX model internally and retrieves the input name and shape).
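For reference, the same name and shape information can be read from the ONNX model with the public onnx package (a minimal sketch, not the tool's actual implementation):
import onnx

onnx_model = onnx.load("resnet18.onnx")
graph_input = onnx_model.graph.input[0]
input_name = graph_input.name
input_shape = [d.dim_value for d in graph_input.type.tensor_type.shape.dim]
# For the model exported above this prints: input [1, 3, 224, 224]
print(input_name, input_shape)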
Once the YAML file is configured, simply run the hb_compile tool. The command and key log output are shown below:
[horizon@xxx xxx]$ hb_compile -c config.yaml
INFO Start hb_compile...
INFO Start verifying yaml
INFO End verifying yaml
INFO Start to Horizon NN Model Convert.
INFO Start to prepare the onnx model.
INFO End to prepare the onnx model.
INFO Start to optimize the onnx model.
INFO End to optimize the onnx model.
INFO Start to calibrate the model.
INFO End to calibrate the model.
INFO Start to precompile the model.
INFO End to precompile the model.
INFO End to Horizon NN Model Convert.
INFO Successful covert model: /xxx/resnet18_224x224_nv12_resizer_quantized_model.bc
[==================================================]100%
INFO ############# Model input/output info #############
INFO NAME TYPE SHAPE DATA_TYPE
INFO --------- ------ ------------------ ---------
INFO input_y input [1, None, None, 1] UINT8
INFO input_uv input [1, None, None, 2] UINT8
INFO input_roi input [1, 4] INT32
INFO output output [1, 1000] FLOAT32
INFO The hb_compile completes running
After the command completes, the directory configured by the working_dir parameter in the YAML file (model_output) contains the intermediate models of each stage, the final on-board model, and the model information files listed below. resnet18_224x224_nv12_resizer.hbm is the model file that can be run on the board:
./model_output
├── resnet18_224x224_nv12_resizer_advice.json
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer.hbm
├── resnet18_224x224_nv12_resizer.html
├── resnet18_224x224_nv12_resizer.json
├── resnet18_224x224_nv12_resizer_node_info.csv
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
├── resnet18_224x224_nv12_resizer_quant_info.json
└── resnet18_224x224_nv12_resizer_quantized_model.bc
While the command-line tool is easy to use, it offers less flexibility. When you need more flexibility, you can use the PTQ APIs to quantize and compile the model. The following describes how to generate the on-board model with the APIs.
Note that some interfaces have many parameters; in the examples below we only configure the required ones so that you can verify the end-to-end flow. For the full parameter list of each interface, refer to the HMCT API Reference and the HBDK Tool API Reference.
First, perform graph optimization and calibration/quantization on the floating-point model using the HMCT APIs. A concrete example follows:
import os
import logging
import numpy as np
from hmct.api import build_model
logging.basicConfig(level=logging.INFO)
march = "nash"
onnx_path = "./resnet18.onnx"
cali_data_dir = "./calibration_data_rgb"
model_name = "resnet18_224x224_nv12_resizer"
working_dir = "./model_output/"
cali_data = []
for cali_data_name in os.listdir(cali_data_dir):
data_path = os.path.join(cali_data_dir, cali_data_name)
cali_data.append(np.load(data_path))
ptq_params = {
'cali_dict': {
'calibration_data': {
'input': cali_data
}
},
'debug_methods': [],
'output_nodes': []
}
if not os.path.exists(working_dir):
os.mkdir(working_dir)
build_model(onnx_file=onnx_path,
march=march,
name_prefix=working_dir + model_name,
**ptq_params)
After build_model finishes successfully, the ONNX models of each stage are generated in the working_dir directory, with the following structure:
./model_output
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
└── resnet18_224x224_nv12_resizer_quant_info.json
The *_ptq_model.onnx file is the ONNX model after graph optimization and calibration. For a detailed description of the intermediate ONNX models, refer to the section Post-Training Quantization (PTQ) - PTQ Conversion Steps - Model Quantization and Compilation - Interpreting the Conversion Outputs.
Next, convert the PTQ model to a fixed-point model and compile it. This step is done through the compiler APIs, as shown below:
import os
import onnx
from hbdk4.compiler.onnx import export
from hbdk4.compiler import convert, compile
march = "nash-e"
working_dir = "./model_output/"
model_name = "resnet18_224x224_nv12_resizer"
ptq_onnx_path = "./model_output/resnet18_224x224_nv12_resizer_ptq_model.onnx"
if not os.path.exists(working_dir):
os.mkdir(working_dir)
# load onnx model
ptq_onnx = onnx.load(ptq_onnx_path)
# Convert onnx model to hbir model
ptq_model = export(proto=ptq_onnx, name=model_name)
func = ptq_model.functions[0]
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Convert format from NCHW to NHWC
func.inputs[0].insert_transpose([0, 3, 1, 2])
# Insert node for color conversion and normalization
func.inputs[0].insert_image_preprocess(mode="yuvbt601full2rgb",
divisor=255,
mean=mean,
std=std,
is_signed=True)
# Insert node for conversion from nv12 to yuv444
func.inputs[0].insert_roi_resize(mode="nv12")
# Convert type from float to int
quantized_model = convert(m=ptq_model, march=march)
compile(m=quantized_model,
path=working_dir + model_name + ".hbm",
march=march,
progress_bar=True)
After compilation, the working_dir directory holds the intermediate models and the final model that can be deployed on the board, with the following structure:
./model_output
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer.hbm
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
└── resnet18_224x224_nv12_resizer_quant_info.json
After the required .hbm model is generated, you can view its information with the hb_model_info and hrt_model_exec tools. Reference commands:
hb_model_info -v resnet18_224x224_nv12_resizer.hbm
hrt_model_exec model_info --model_file resnet18_224x224_nv12_resizer.hbm
To build the on-board sample as quickly as possible, we recommend directly using the contents of the samples/ucp_tutorial/deps_aarch64 directory in the OE package as the dependency libraries. The key headers and shared libraries required by the on-board sample are located as follows:
./deps_aarch64
├── ......
└── ucp
├── include
│ └── hobot
│ ├── dnn
│ │ ├── hb_dnn.h
│ │ ├── hb_dnn_status.h
│ │ └── hb_dnn_v1.h
│ ├── ......
│ ├── hb_sys.h
│ ├── hb_ucp.h
│ ├── hb_ucp_status.h
│ └── hb_ucp_sys.h
└── lib
├── ......
├── libdnn.so
└── libhbucp.so
The following example shows how to run one on-board inference with binary-file inputs and the on-board model, and how to obtain the TOP-1 classification result.
For how to prepare the y and uv inputs of the resizer model, refer to the read_image_2_nv12 function in the sample code below.
#include <fstream>
#include <cstring>
#include <iostream>
#include <map>
#include <vector>
#include "hobot/dnn/hb_dnn.h"
#include "hobot/hb_ucp.h"
#include "hobot/hb_ucp_sys.h"
const char *model_file = "resnet18_224x224_nv12_resizer.hbm";
std::string data_y_path = "ILSVRC2012_val_00000001_y.bin";
std::string data_uv_path = "ILSVRC2012_val_00000001_uv.bin";
typedef struct Roi {
int32_t left;
int32_t top;
int32_t right;
int32_t bottom;
} Roi;
int read_image_2_nv12(std::string &y_path, std::string &uv_path,
std::vector<hbUCPSysMem> &image_mem, int &input_h,
int &input_w);
int prepare_roi_mem(const std::vector<Roi> &rois,
std::vector<hbUCPSysMem> &roi_mem);
int prepare_image_tensor(const std::vector<hbUCPSysMem> &image_mem, int input_h,
int input_w, hbDNNHandle_t dnn_handle,
std::vector<hbDNNTensor> &input_tensor);
int read_binary_file(std::string file_path, char **bin, int *length);
/**
* prepare roi tensor
* @param[in] roi_mem: roi mem info
* @param[in] dnn_handle: dnn handle
* @param[in] roi_tensor_id: tensor id of roi input in model
* @param[out] roi_tensor: roi tensor
*/
int prepare_roi_tensor(const hbUCPSysMem *roi_mem, hbDNNHandle_t dnn_handle,
int32_t roi_tensor_id, hbDNNTensor *roi_tensor);
/**
* prepare out tensor
* @param[in] dnn_handle: dnn handle
* @param[out] output: output tensor
*/
int prepare_output_tensor(hbDNNHandle_t dnn_handle,
std::vector<hbDNNTensor> &output);
int main(int argc, char **argv) {
// load model
hbDNNPackedHandle_t packed_dnn_handle;
hbDNNHandle_t dnn_handle;
const char **model_name_list;
int model_count = 0;
// Step1: get model handle
hbDNNInitializeFromFiles(&packed_dnn_handle, &model_file, 1);
hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);
// Step2: set input data to nv12
  // In this sample every task uses the same input image, so one memory
  // allocation can be reused. image_mems holds the y and uv image data.
std::vector<hbUCPSysMem> image_mems(2);
// image input size
int input_h = 224;
int input_w = 224;
read_image_2_nv12(data_y_path, data_uv_path, image_mems, input_h, input_w);
// Step3: prepare roi mem
  /**
   * Suppose we want to run inference on 2 ROI tasks, so 2 ROIs need to be
   * prepared as well.
   */
  // left = 0, top = 0, right = 223, bottom = 223
Roi roi_1 = {0, 0, 223, 223};
// left = 1, top = 1, right = 223, bottom = 223
Roi roi_2 = {1, 1, 223, 223};
std::vector<Roi> rois;
rois.push_back(roi_1);
rois.push_back(roi_2);
int roi_num = 2;
std::vector<hbUCPSysMem> roi_mems(2);
prepare_roi_mem(rois, roi_mems);
// Step4: prepare input and output tensor
std::vector<std::vector<hbDNNTensor>> input_tensors(roi_num);
std::vector<std::vector<hbDNNTensor>> output_tensors(roi_num);
for (int i = 0; i < roi_num; ++i) {
// prepare input tensor
int input_count = 0;
hbDNNGetInputCount(&input_count, dnn_handle);
input_tensors[i].resize(input_count);
// prepare image tensor
    /** Tips:
     * In this sample all tasks use the same image, so one memory block is
     * allocated and every input tensor reuses it. If your model takes
     * different input images, allocate separate memory for each input.
     */
prepare_image_tensor(image_mems, input_h, input_w,
dnn_handle, input_tensors[i]);
auto roi_tensor_id = 2;
prepare_roi_tensor(&roi_mems[i], dnn_handle, roi_tensor_id,
&input_tensors[i][roi_tensor_id]);
// prepare output tensor
int output_count = 0;
hbDNNGetOutputCount(&output_count, dnn_handle);
output_tensors[i].resize(output_count);
prepare_output_tensor(dnn_handle, output_tensors[i]);
}
// Step5: run inference
hbUCPTaskHandle_t task_handle{nullptr};
  /** Tips:
   * In this sample multiple tasks are submitted together:
   * when task_handle is nullptr, a new task is created;
   * when task_handle already exists but has not been submitted yet, the new
   * task is attached to it, forming a multi-model (multi-task) handle.
   */
for (int i = 0; i < roi_num; ++i) {
hbDNNInferV2(&task_handle, output_tensors[i].data(),
input_tensors[i].data(), dnn_handle);
}
// submit multi tasks
hbUCPSchedParam infer_ctrl_param;
HB_UCP_INITIALIZE_SCHED_PARAM(&infer_ctrl_param);
hbUCPSubmitTask(task_handle, &infer_ctrl_param);
// wait task done
hbUCPWaitTaskDone(task_handle, 0);
// Step6: do postprocess with output data for every task
// Find the max score and corresponding label
for (auto roi_idx = 0; roi_idx < roi_num; roi_idx++) {
auto result = reinterpret_cast<float *>(output_tensors[roi_idx][0].sysMem.virAddr);
float max_score = 0.0;
int label = -1;
for (auto i = 0; i < 1000; i++) {
float score = result[i];
if (score > max_score) {
label = i;
max_score = score;
}
}
std::cout << "label: " << label << std::endl;
}
// Step7: release resources
// release task handle
hbUCPReleaseTask(task_handle);
// free input mem
for (auto &mem : image_mems) {
hbUCPFree(&mem);
}
for (auto &mem : roi_mems) {
hbUCPFree(&mem);
}
// free output mem
for (auto &tensors : output_tensors) {
for (auto &tensor : tensors) {
hbUCPFree(&(tensor.sysMem));
}
}
// release model
hbDNNRelease(packed_dnn_handle);
return 0;
}
#define ALIGN(value, alignment) (((value) + ((alignment)-1)) & ~((alignment)-1))
#define ALIGN_32(value) ALIGN(value, 32)
int prepare_image_tensor(const std::vector<hbUCPSysMem> &image_mem, int input_h,
int input_w, hbDNNHandle_t dnn_handle,
std::vector<hbDNNTensor> &input_tensor) {
// y and uv tensor
for (int i = 0; i < 2; i++) {
hbDNNGetInputTensorProperties(&input_tensor[i].properties, dnn_handle, i);
input_tensor[i].sysMem = image_mem[i];
    /** Tips:
     * For a resizer (ROI) model, the valid shape of the input tensor must be
     * set to the actual input image shape. The y/uv shapes here are NHWC.
     */
input_tensor[i].properties.validShape.dimensionSize[1] = input_h;
input_tensor[i].properties.validShape.dimensionSize[2] = input_w;
if (i == 1) {
// uv input
input_tensor[i].properties.validShape.dimensionSize[1] /= 2;
input_tensor[i].properties.validShape.dimensionSize[2] /= 2;
}
    /** Tips:
     * For input tensors, the stride should be set according to the real
     * padding of your data; the y/uv planes require 32-byte alignment.
     */
input_tensor[i].properties.stride[1] =
ALIGN_32(input_tensor[i].properties.stride[2] *
input_tensor[i].properties.validShape.dimensionSize[2]);
input_tensor[i].properties.stride[0] =
input_tensor[i].properties.stride[1] *
input_tensor[i].properties.validShape.dimensionSize[1];
}
return 0;
}
int prepare_roi_tensor(const hbUCPSysMem *roi_mem, hbDNNHandle_t dnn_handle,
int32_t roi_tensor_id, hbDNNTensor *roi_tensor) {
hbDNNGetInputTensorProperties(&roi_tensor->properties, dnn_handle, roi_tensor_id);
roi_tensor->sysMem = *roi_mem;
return 0;
}
int prepare_output_tensor(hbDNNHandle_t dnn_handle,
std::vector<hbDNNTensor> &output) {
for (size_t i = 0; i < output.size(); i++) {
hbDNNGetOutputTensorProperties(&output[i].properties, dnn_handle, i);
hbUCPMallocCached(&output[i].sysMem, output[i].properties.alignedByteSize, 0);
}
return 0;
}
int read_binary_file(std::string file_path, char **bin, int *length) {
std::ifstream ifs(file_path, std::ios::in | std::ios::binary);
ifs.seekg(0, std::ios::end);
*length = ifs.tellg();
ifs.seekg(0, std::ios::beg);
*bin = new char[sizeof(char) * (*length)];
ifs.read(*bin, *length);
ifs.close();
return 0;
}
/** You can define read_image_2_other_type to prepare your data **/
int read_image_2_nv12(std::string &y_path, std::string &uv_path,
std::vector<hbUCPSysMem> &image_mem, int &input_h,
int &input_w) {
// copy y data
auto w_stride = ALIGN_32(input_w);
int32_t y_mem_size = input_h * w_stride;
hbUCPMallocCached(&image_mem[0], y_mem_size, 0);
uint8_t *y_data_dst = reinterpret_cast<uint8_t *>(image_mem[0].virAddr);
int32_t y_data_length = 0;
char *y_data = nullptr;
read_binary_file(y_path, &y_data, &y_data_length);
memcpy(reinterpret_cast<char *>(image_mem[0].virAddr), y_data, y_mem_size);
// copy uv data
int32_t uv_height = input_h / 2;
int32_t uv_width = input_w / 2;
int32_t uv_mem_size = uv_height * w_stride;
hbUCPMallocCached(&image_mem[1], uv_mem_size, 0);
int32_t uv_data_length = 0;
char *uv_data = nullptr;
read_binary_file(uv_path, &uv_data, &uv_data_length);
memcpy(reinterpret_cast<char *>(image_mem[1].virAddr), uv_data, uv_mem_size);
  // make sure cached mem data is flushed to DDR before inference
hbUCPMemFlush(&image_mem[0], HB_SYS_MEM_CACHE_CLEAN);
hbUCPMemFlush(&image_mem[1], HB_SYS_MEM_CACHE_CLEAN);
  delete[] y_data;
  delete[] uv_data;
return 0;
}
int prepare_roi_mem(const std::vector<Roi> &rois,
std::vector<hbUCPSysMem> &roi_mem) {
auto roi_size = rois.size();
roi_mem.resize(roi_size);
for (auto i = 0; i < roi_size; ++i) {
int32_t mem_size = 4 * sizeof(int32_t);
hbUCPMallocCached(&roi_mem[i], mem_size, 0);
int32_t *roi_data = reinterpret_cast<int32_t *>(roi_mem[i].virAddr);
// The order of filling in the corner points of roi tensor is left, top, right, bottom
roi_data[0] = rois[i].left;
roi_data[1] = rois[i].top;
roi_data[2] = rois[i].right;
roi_data[3] = rois[i].bottom;
    // make sure cached mem data is flushed to DDR before inference
hbUCPMemFlush(&roi_mem[i], HB_SYS_MEM_CACHE_CLEAN);
}
return 0;
}
Before cross-compiling, prepare the CMakeLists.txt and the sample source file. The CMakeLists.txt content is shown below. Since the sample does not include data preprocessing, the dependencies are minimal; the file mainly configures the GCC compile flags, the required headers, and the shared libraries.
Here, dnn is the on-board inference library, and hbucp is used for tensor operations.
# CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(sample)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wl,-unresolved-symbols=ignore-in-shared-libs")
message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
set(CMAKE_C_FLAGS_DEBUG "-g -O0")
set(CMAKE_CXX_FLAGS_RELEASE " -O3 ")
set(CMAKE_C_FLAGS_RELEASE " -O3 ")
set(CMAKE_BUILD_TYPE ${build_type})
set(DEPS_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/deps_aarch64)
include_directories(${DEPS_ROOT}/ucp/include)
link_directories(${DEPS_ROOT}/ucp/lib)
add_executable(run_sample src/main.cc)
target_link_libraries(run_sample dnn hbucp)
The build environment has the following directory structure:
.
├── CMakeLists.txt
├── deps_aarch64
│ └── ucp
│ ├── include
│ └── lib
└── src
└── main.cc
Once the sample file and CMakeLists.txt are ready, you can run the build. Note that CC and CXX in the build script must be set to the actual paths of the cross-compilation GCC and G++. An example build script is shown below:
#!/usr/bin/env bash
# Note: please configure CC and CXX according to the actual toolchain paths
export CC=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
export CXX=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
rm -rf arm_build; mkdir arm_build; cd arm_build
cmake ..; make -j8
cd ..
After the build finishes, the run_sample binary that can run on the board is generated. This completes the on-board sample build process.
With the executable built, the next step is to prepare the model inputs. To keep the practice simple and minimize dependency setup, the data is processed here with Python; you can also implement the same processing logic in C++ inside the on-board sample (make sure the logic is identical). Example:
import os
import cv2
import PIL
import numpy as np
from PIL import Image
image_path = "./ILSVRC2012_val_00000001.JPEG"
def resize_transformer(image_data: np.array, short_size: int):
image = Image.fromarray(image_data.astype('uint8'), 'RGB')
# Specify width, height
w, h = image.size
if (w <= h and w == short_size) or (h <= w and h == short_size):
return np.array(image)
# I.e., the width of the image is the short side
if w < h:
resize_size = (short_size, int(short_size * h / w))
# I.e., the height of the image is the short side
else:
resize_size = (int(short_size * w / h), short_size)
# Resize the image
data = np.array(image.resize(resize_size, Image.BILINEAR))
return data
def center_crop_transformer(image_data: np.array, crop_size: int):
image = Image.fromarray(image_data.astype('uint8'), 'RGB')
image_width, image_height = image.size
crop_height, crop_width = (crop_size, crop_size)
crop_top = int(round((image_height - crop_height) / 2.))
crop_left = int(round((image_width - crop_width) / 2.))
image_data = image.crop((crop_left,
crop_top,
crop_left + crop_width,
crop_top + crop_height))
return np.array(image_data).astype(np.float32)
def rgb_to_nv12(image_data: np.array):
r = image_data[:, :, 0]
g = image_data[:, :, 1]
b = image_data[:, :, 2]
y = (0.299 * r + 0.587 * g + 0.114 * b)
u = (-0.169 * r - 0.331 * g + 0.5 * b + 128)[::2, ::2]
v = (0.5 * r - 0.419 * g - 0.081 * b + 128)[::2, ::2]
uv = np.zeros(shape=(u.shape[0], u.shape[1] * 2))
for i in range(0, u.shape[0]):
for j in range(0, u.shape[1]):
uv[i, 2 * j] = u[i, j]
uv[i, 2 * j + 1] = v[i, j]
y = y.astype(np.uint8)
uv = uv.astype(np.uint8)
return y, uv
if __name__ == '__main__':
# load the image with PIL method
pil_image_data = PIL.Image.open(image_path).convert('RGB')
image_data = np.array(pil_image_data).astype(np.uint8)
# Resize the image
image_data = resize_transformer(image_data, 256)
# Crop the image
image_data = center_crop_transformer(image_data, 224)
    # Convert format from RGB to nv12
y, uv = rgb_to_nv12(image_data)
y.tofile("ILSVRC2012_val_00000001_y.bin")
uv.tofile("ILSVRC2012_val_00000001_uv.bin")
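Before copying the files to the board, you can quickly check that the two binary files match the nv12 layout expected by the sample (a small sketch, assuming the 224x224 input above):
import os

# y plane: 224 x 224 bytes; uv plane: (224 / 2) x 224 bytes of interleaved u/v
assert os.path.getsize("ILSVRC2012_val_00000001_y.bin") == 224 * 224
assert os.path.getsize("ILSVRC2012_val_00000001_uv.bin") == 112 * 224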
After the model input data is prepared and the binary input files used by the on-board sample have been generated correctly, make sure you also have the following ready:
An S100 development board, on which the on-board program will actually run.
A model that can be used for on-board inference (*.hbm), i.e., the output of the Generate the On-board Model step.
The on-board program (the main.cc file and the board-side executable produced by cross-compilation), i.e., the output of the Build the On-board Sample step.
The dependency libraries of the on-board program. To reduce deployment effort, you can directly use the contents of the samples/ucp_tutorial/deps_aarch64/ucp/lib folder in the OE package.
Once everything is ready, put the model file (*.hbm), the input data (*.bin files), the on-board program, and the dependency libraries together. A reference directory structure is:
horizon
├── ILSVRC2012_val_00000001_y.bin
├── ILSVRC2012_val_00000001_uv.bin
├── lib
├── resnet18_224x224_nv12_resizer.hbm
└── run_sample
Copy this folder as a whole to the board environment, for example:
scp -r horizon/ root@{board_ip}:/map/
Finally, configure LD_LIBRARY_PATH and run the program, as shown below:
horizon@hobot:/map/horizon# export LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH
horizon@hobot:/map/horizon# ./run_sample
......
label: 65
As you can see, the label: 65 printed in the log matches the ground-truth label of image ILSVRC2012_val_00000001 in the ImageNet dataset, i.e., the classification result is correct.
This concludes the end-to-end deployment practice for the Resizer-input ResNet18 model.