Model Performance Analysis

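The perf subcommand measures model performance. In this mode you do not need to supply input data: the program constructs the input tensors automatically from the model information and fills them with random values.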

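By default the program runs 200 frames on a single thread. When the perf_time parameter is specified, the frame_count parameter is ignored and the program exits after running for the specified time.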
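The rule above can be sketched as a small decision function. This is only an illustration of the documented behavior, not the tool's implementation; the function name `stop_condition` is hypothetical, and the minute unit and the default of 0 for perf_time are taken from the tool's help text.

```python
def stop_condition(frame_count: int = 200, perf_time: int = 0) -> str:
    """Describe when a perf run ends under the documented rule.

    perf_time is in minutes; 0 means "not set", in which case
    frame_count decides how long the run lasts.
    """
    if perf_time > 0:
        # A time limit takes precedence and frame_count is ignored.
        return f"time: {perf_time} min"
    return f"frames: {frame_count}"


print(stop_condition())                              # default perf run
print(stop_condition(frame_count=50, perf_time=2))   # frame_count is ignored
```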

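The program reports the model's latency and frame rate. It prints performance information (maximum, minimum, and average latency) once every 200 frames; if the run ends before reaching 200 frames, it prints once when the run finishes.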
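The reporting cadence described above can be illustrated with a short sketch that groups per-frame latencies into windows of 200 and summarizes each one. It assumes that a final, partial window is also reported once, which is one reading of the text; the function name and the tuple layout are hypothetical and do not reflect the tool's internals.

```python
from typing import Iterable, Iterator, List, Tuple


def windowed_latency_stats(
    latencies_ms: Iterable[float], window: int = 200
) -> Iterator[Tuple[int, float, float, float]]:
    """Yield (frames, max, min, average) for each window of latencies."""
    batch: List[float] = []
    for value in latencies_ms:
        batch.append(value)
        if len(batch) == window:
            yield len(batch), max(batch), min(batch), sum(batch) / len(batch)
            batch = []
    if batch:
        # Fewer than `window` frames remain: report them once at the end.
        yield len(batch), max(batch), min(batch), sum(batch) / len(batch)


# Small window so the behavior is easy to see.
for frames, worst, best, mean in windowed_latency_stats([1.0, 3.0, 2.0], window=2):
    print(f"frames={frames} max={worst} min={best} avg={mean}")
```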

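At the end of the run, the program prints a summary that includes the number of threads, the frame count, the total model inference time, the average inference latency, and the frame rate.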

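Supported Scope

hbm models are supported.

Note

If no input is specified for perf, the tool constructs random input internally. However, if the model depends strongly on its input, the randomly constructed input may cause the program to core dump.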

Usage

Usage:
hrt_model_exec [Option...] [Parameter]

[Option]                      [instruction]
---------------------------------------------------------------------------------------------------------------
-h --help                     Display this information
-v --version                  Display this version

[Option]                      [Parameter]
---------------------------------------------------------------------------------------------------------------
--model_file                  [string]: Model file paths, separate by comma, each represents one model file path.
--model_name                  [string]: Model name.
                                        When model_file has one more model and Subcommand is infer or perf,
                                        "model_name" must be specified!
--core_id                     [string]: core id, 0 for any core, 1 for core 0, 2 for core 1 and etc, default is 0.
                                        Please confirm the number of bpu cores on the board before setting up.
                                        When you need to specify multiple cores, separate them with commas, such as "1,2".
--input_file                  [string]: Input file paths, separate by comma, each represents one input.
                                        The extension of files should be one of [jpg, JPG, jpeg, JPEG, png, PNG, bin, txt]
                                        bin for binary such as image data, nv12 or yuv444 etc.
                                        txt for plain data such as image info.
--frame_count                 [int]   : frame count for run loop, default 200, valid when perf_time is 0 in perf mode; default 1 for infer mode.
--dump_intermediate           [string]: dump intermediate layer input and output. The default is 0. Subcommand must be infer.
--perf_time                   [int]   : minute, perf time for run loop, default 0.
                                        Subcommand must be perf.
--thread_num                  [int]   : thread num for run loop, thread_num range:[1,64], Subcommand must be perf.
--profile_path                [string]: profile log and csv files path, set to get detail information
                                        of model execution.
--input_img_properties        [string]: Specify the color space of the image type input. Each image needs to specify the color space, separated by commas.
                                        The supported color spaces are [Y, UV].
--input_valid_shape           [string]: Complete the validshape of the model input, allowing only the dynamic part to change. Provide two ways to set:
                                        1. This only needs to be set when the validShape of the model input is dynamic. 2. Set for all inputs.
                                        Different inputs are separated by semicolons, and different dimensions are separated by commas.
                                        For example: --input_valid_shape="1,376,376,1;1,188,188,2".
--input_stride                [string]: Complete the stride of the model input, allowing only the dynamic part to change. Provide two ways to set:
                                        1. This only needs to be set when the stride of model input is dynamic. 2. Set for all inputs.
                                        Different inputs are separated by semicolons, and different dimensions are separated by commas.
                                        For example: --input_stride="144384,384,1,1;72192,384,2,1".

[Examples]
---------------------------------------------------------------------------------------------------------------
hrt_model_exec perf
   --model_file
   --model_name
   --core_id
   --input_file
   --input_img_properties
   --input_valid_shape
   --input_stride
   --frame_count
   --profile_path
   --perf_time
   --thread_num

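Parameter Description

Parameter	Type	Description	Related parameters
`-h, --help`	None.	Displays help information.	None.
`-v, --version`	None.	Displays the version of the tool's dnn prediction library.	None.
`perf`	None.	Runs model performance analysis and reports the results.	Must be used together with `model_file`, which provides the model's detailed information.
`model_file`	string	Model file path. Separate multiple paths with commas.	None.
`model_name`	string	Name of the model to run. Must be specified when `model_file` contains more than one model.	None.
`core_id`	string	Core to run on. 0: any core, 1: core0, 2: core1, and so on. Default is 0. To run on multiple cores, separate the values with commas, for example `"1,2"`.	None.
`input_file`	string	Model input files. The file extension must be one of `PNG`/`JPG`/`JPEG`/`png`/`jpg`/`jpeg`/`bin`/`txt`. Separate inputs with commas, for example `xxx.jpg,input.txt`.	None.
`input_img_properties`	string	Color space of the model's image inputs. Allowed values: [`Y`, `UV`].	Must be used together with `input_file`. Every image-type input in `input_file` needs a `Y` or `UV` value; separate the values with commas, for example `Y,UV`.
`input_valid_shape`	string	Dynamic `validShape` information for the model inputs. If the `validShape` attribute of a model input contains `-1`, the `-1` dimensions must be filled in. Separate the `validShape` of different inputs with semicolons, for example `--input_valid_shape="1,376,376,1;1,188,188,2"`.	None.
`input_stride`	string	Dynamic `stride` information for the model inputs. If the `stride` attribute of a model input contains `-1`, the `-1` dimensions must be filled in. Separate the `stride` of different inputs with semicolons, for example `--input_stride="50176,224,1,1;25088,224,2,1"`.	None.
`frame_count`	int	Number of frames to run. Default is 1 for the `infer` subcommand and 200 for the `perf` subcommand.	For the `perf` subcommand, takes effect only when `perf_time` is not set.
`dump_intermediate`	string	Dumps the input and output of every layer of the model. Range [0, 3], default 0. Only valid for the `infer` subcommand. `dump_intermediate=0`: the dump function is disabled. `dump_intermediate=1`: the input and output data of every node are saved as `bin`, and the node input and output are `stride` data. `dump_intermediate=2`: the input and output data of every node are saved as both `bin` and `txt`, and the node input and output are `stride` data. `dump_intermediate=3`: the input and output data of every node are saved as both `bin` and `txt`, and the node input and output are `valid` data.	None.
`perf_time`	int	Run time for `perf`, in minutes. Default is 0.	None.
`thread_num`	int	Number of threads (degree of parallelism), that is, the maximum number of tasks processed in parallel. Range [1, 64], default 1. To measure latency, set the value to 1: with no resource contention, the latency measurement is more accurate. To measure throughput, a value greater than 3 * N (N is the number of BPU cores) is recommended; adjust the thread count so that BPU utilization is as high as possible, which makes the throughput measurement more accurate.	None.
`profile_path`	string	Directory in which the profiling logs are generated. A run produces profiler.log and profiler.csv, which can be used to analyze op time and scheduling time. For details on these two files, see the profile_path Description section. Setting `--profile_path="."` is usually sufficient; it writes the log files to the current directory.	None.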
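The `input_valid_shape` and `input_stride` values share one syntax: semicolons between inputs and commas between dimensions. The sketch below parses such a string and checks that a user-supplied value only fills in the `-1` dimensions. The helper names are hypothetical and the model-side stride `[-1, -1, 1, 1]` is an invented example; the tool's own validation may differ.

```python
from typing import List


def parse_dims(value: str) -> List[List[int]]:
    """Split 'a,b;c,d' into [[a, b], [c, d]] (one inner list per input)."""
    return [[int(d) for d in one_input.split(",")] for one_input in value.split(";")]


def completes_dynamic_dims(model_dims: List[int], user_dims: List[int]) -> bool:
    """True if user_dims fills every -1 and leaves static dimensions unchanged."""
    if len(model_dims) != len(user_dims):
        return False
    return all(u > 0 and (m == -1 or m == u) for m, u in zip(model_dims, user_dims))


shapes = parse_dims("1,376,376,1;1,188,188,2")
print(shapes)

# Hypothetical model attribute with two dynamic stride dimensions.
print(completes_dynamic_dims([-1, -1, 1, 1], [144384, 384, 1, 1]))
```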

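profile_path Description

When the profile_path parameter is set and the tool runs normally, it generates the files profiler.log and profiler.csv, which contain the following parameters:

ucp_version: UCP and HBRT version numbers.
perf_result: perf results.

Parameter	Description
`FPS`	Number of frames processed per second.
`average_latency`	Average time the specified model takes to run one frame.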

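running_condition: runtime environment information.

Parameter	Description
`core_id`	BPU core set for the run.
`frame_count`	Total number of frames run.
`model_name`	Name of the model under evaluation.
`run_time`	Program run time.
`thread_num`	Number of threads used for the run.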

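model_latency: per-node time statistics for the model.

Parameter	Description
`Node-pad`	Time spent padding the model input.
`Node-NodeIdx-NodeType-NodeName`	Time spent in a model node. Note: NodeIdx is the node's index in the topological ordering of the model, NodeType is the specific node type (for example Dequantize), and NodeName is the specific node name.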

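processor_latency: per-processor time statistics for the model.

Parameter	Description
`BPU_inference_time_cost`	BPU processing time per inference frame.
`CPU_inference_time_cost`	CPU processing time per inference frame.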

task_latency: task time statistics for the model.

Parameter	Description
`TaskRunningTime`	Actual running time of the task, including the time spent in the UCP framework.

Usage Example

hrt_model_exec perf --model_file=xxx.hbm

../aarch64/bin/hrt_model_exec perf --model_file=resnet50_224x224_nv12.hbm --input_stride="57344,256,1,1;28672,256,2,1" --frame_count=200 --thread_num=16
Load model to DDR cost 1390.88ms.
Frame count: 200,  Thread Average: 3.423510 ms,  thread max latency: 84.261002 ms,  thread min latency: 1.024000 ms,  FPS: 2345.875977

Running condition:
  Thread number is: 16
  Frame count   is: 200
  Program run time: 85.469 ms
Perf result:
  Frame totally latency is: 684.702 ms
  Average    latency    is: 3.424 ms
  Frame      rate       is: 2345.876 FPS
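As a quick consistency check on the sample output above, the reported average latency matches the total frame latency divided by the frame count. That the tool computes the value this way is an inference from these numbers, not something the documentation states.

```python
# Values copied from the sample output above.
frame_count = 200
total_latency_ms = 684.702

# Assumed relationship: average latency = total latency / frame count.
average_latency_ms = total_latency_ms / frame_count
print(f"{average_latency_ms:.3f} ms")  # prints "3.424 ms"
```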

Note

A performance test can run only one model at a time. When model_file contains multiple models, set the model_name parameter to select the one to run.

hrt_model_exec perf --model_file=xxx.hbm,xxx.hbm --model_name=xxx
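When perf runs are scripted, the rule above can be enforced before the tool is launched. The following is a minimal sketch, assuming the command is assembled in Python; `build_perf_command` is a hypothetical helper, and only options documented on this page are passed through. `shlex.join` quotes any option containing a semicolon, so a value such as an `input_stride` string is not split by the shell.

```python
import shlex
from typing import List, Optional


def build_perf_command(
    model_files: List[str],
    model_name: Optional[str] = None,
    extra_options: Optional[List[str]] = None,
    tool: str = "hrt_model_exec",
) -> str:
    """Return a shell-safe perf command line for the given model files."""
    if not model_files:
        raise ValueError("at least one model file is required")
    if len(model_files) > 1 and not model_name:
        # perf runs one model at a time, so the model must be named.
        raise ValueError("model_name must be set when model_file lists several models")
    args = [tool, "perf", "--model_file=" + ",".join(model_files)]
    if model_name:
        args.append(f"--model_name={model_name}")
    args.extend(extra_options or [])
    return shlex.join(args)


print(build_perf_command(["a.hbm", "b.hbm"], model_name="resnet"))
print(build_perf_command(["a.hbm"], extra_options=["--input_stride=57344,256,1,1;28672,256,2,1"]))
```

The file names `a.hbm` and `b.hbm` and the model name `resnet` are placeholders.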
