Luckfox Pico Ultra W: error when running inference on an RKNN model
Posted: 2025-01-30 16:38
Board model: Luckfox Pico Ultra W
Flashed system: buildroot
Symptom: when running model inference on the board through the C API, an error is thrown at an intermediate layer:
E RKNN: failed to submit!, op id: 16, op name: Conv:/uv_denoise/output_layer/Conv, task start: 0, task number: 247, run task counter: 56, int status: 0, please try updating to the latest version of the toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn)
At the same time, dmesg | grep -i rknpu shows the following errors:
[39992.804035] RKNPU: failed to wait job, task counter: 56, flags: 0x1, ret = 0, elapsed time: 6495436us
[39992.924001] RKNPU: job timeout, flags: 0x0:
[39992.924026] RKNPU: core 0 irq status: 0x0, raw status: 0x0, require mask: 0xc00, task counter: 0x38, elapsed time: 6615425us
[39993.043983] RKNPU: soft reset
(The query command above is taken from the RKNN SDK manual. I would also like to know how to check the board's RKNPU driver from the luckfox-pico command line and how to deploy rknn-server.)
Description of how the problem occurred
Following the rknn-toolkit2 documentation and the Luckfox wiki, I converted an image-enhancement model with rknn-toolkit2 V1.6.0, along the path PyTorch 2.4.1 → ONNX 1.17.0 → rknn-toolkit2 V1.6.0. The ONNX-to-RKNN step follows the Luckfox examples and succeeded; after rknn.build in the conversion script, I also confirmed via rknn.init_runtime and rknn.inference that the model produces correct output.
I then built simple C++ code (attached below), based on the Luckfox examples, that runs the model on the board through the C API; it mainly covers model initialization, input tensor initialization, and model release. In testing, the model initializes and releases correctly, but inference fails with the error shown above.
Additional information
1. From the C API I learned that the board runs driver version 0.9.2 with the rknnmrt 1.6.0 runtime, so during debugging I installed rknn-toolkit2 V1.6.0 in the Luckfox docker for the model conversion, to rule out problems caused by version incompatibility (although the example model converted with 2.3.0 runs on the board without any error). A minimal sketch of querying both versions through the C API is attached after this list.
2. I also tried converting the model with the latest rknn-toolkit2 V2.3.0; the resulting model produces the same error. In addition, V2.3.0 provides a test-code generation tool, and running the generated test code produces the same error as well.
3. During execution, initialization prints the following:
input tensors:
index=0, name=input, n_dims=4, dims=[1, 400, 600, 3], n_elems=720000, size=720000, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003906
output tensors:
index=0, name=output, n_dims=4, dims=[1, 400, 600, 3], n_elems=720000, size=720000, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=1.000000
internl_mem_size: 17280000
sdk api version: 1.6.0 (9a7b5d24c@2023-12-13T17:33:10) , driver version: 0.9.2
input_attrs[0].size_with_stride=729600
output mem [0] = 3840000
model is NHWC input fmt
model input height=400, width=600, channel=3
The internal mem and sdk version lines were added on top of the example code.
4. Quantization used the defaults, the normal algorithm with w8a8; I can provide the ONNX model and the converted RKNN model if needed.
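For reference, the sdk api version line in item 3 comes from querying the C API. A minimal sketch of that query, assuming a valid rknn_context obtained from rknn_init (rknn_sdk_version and RKNN_QUERY_SDK_VERSION are declared in rknn_api.h):

#include <cstdio>
#include "rknn_api.h"

// Print the runtime (API) version and the RKNPU kernel driver version.
static void print_sdk_version(rknn_context ctx)
{
    rknn_sdk_version sdk_ver;
    int ret = rknn_query(ctx, RKNN_QUERY_SDK_VERSION, &sdk_ver, sizeof(sdk_ver));
    if (ret != RKNN_SUCC) {
        printf("rknn_query(SDK_VERSION) fail! ret=%d\n", ret);
        return;
    }
    printf("sdk api version: %s, driver version: %s\n",
           sdk_ver.api_version, sdk_ver.drv_version);
}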
Attached code
1. convert.py:
import sys
import time
import cv2
import numpy as np
from rknn.api import RKNN

def parse_arg():
    if len(sys.argv) < 5:
        print("Usage: python3 {} [onnx_model_path] [dataset_path] [output_rknn_path] [model_type]".format(sys.argv[0]))
        exit(1)
    model_path = sys.argv[1]
    dataset_path = sys.argv[2]
    output_path = sys.argv[3]
    model_type = sys.argv[4]
    return model_path, dataset_path, output_path, model_type

if __name__ == '__main__':
    model_path, dataset_path, output_path, model_type = parse_arg()

    # Create RKNN object
    rknn = RKNN(verbose=True)

    # Pre-process config
    print('--> Config model')
    # rknn.config(target_platform='rv1106', quant_img_RGB2BGR=True, optimization_level=0)
    rknn.config(mean_values=[0, 0, 0], std_values=[256, 256, 256], target_platform='RV1106',
                quant_img_RGB2BGR=True, quantized_algorithm='normal', optimization_level=0)
    print('done')

    # Load model
    print('--> Loading model')
    ret = rknn.load_onnx(model=model_path, inputs=['input'], input_size_list=[[1, 3, 400, 600]], outputs=['output'])
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset=dataset_path)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    print('--> Init runtime environment')
    # ret = rknn.init_runtime(target='RV1106')
    ret = rknn.init_runtime(target=None)
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    src_img = cv2.imread('low00741.png')
    color_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)  # HWC, RGB
    color_img = color_img[np.newaxis, :]  # add batch axis
    color_img = color_img.astype(np.float32)
    color_img = color_img.transpose((0, 3, 1, 2))  # NHWC -> NCHW
    mean = np.mean(color_img, axis=(0, 2, 3))
    std = np.std(color_img, axis=(0, 2, 3))
    print(f"mean: {mean}")
    print(f"std: {std}")

    # 2. Inference
    print('--> Running model')
    start = time.perf_counter()
    pred = rknn.inference(inputs=[color_img], data_format='nchw')  # input array is NCHW
    end = time.perf_counter()
    runTime_ms = (end - start) * 1000
    print('inference time:', runTime_ms, 'ms')

    # Squeeze the batch dim and convert the type
    np.save('pred_raw.npy', pred)
    pred = np.squeeze(pred).astype('uint8')
    print(pred.shape)
    pred = pred.transpose((1, 2, 0))  # CHW -> HWC
    print(pred.shape)
    pred = cv2.cvtColor(pred, cv2.COLOR_RGB2BGR)
    cv2.imwrite('low00741enhance.png', pred)

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn(output_path)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    ret = rknn.codegen(output_path='./rknn_app_demo', inputs=['./low00741.png'], overwrite=True)

    # Release
    rknn.release()
2. Deployment C++ code (partial)
In the C++ code, the model initialization and release functions are identical to the example, so they are not attached here.
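For context, a minimal sketch of the zero-copy init flow those functions follow (mirroring the Luckfox example; load_model_file and the exact context fields are assumptions here, not the real code):

// Sketch only: init the context and bind zero-copy input memory.
static int init_sketch(const char* model_path, rknn_app_context_t* app_ctx)
{
    int model_len = 0;
    void* model = load_model_file(model_path, &model_len); // hypothetical helper
    int ret = rknn_init(&app_ctx->rknn_ctx, model, model_len, 0, NULL);
    if (ret < 0) return ret;

    // Query the input attributes, then allocate and bind zero-copy memory.
    rknn_tensor_attr input_attr;
    memset(&input_attr, 0, sizeof(input_attr));
    input_attr.index = 0;
    rknn_query(app_ctx->rknn_ctx, RKNN_QUERY_INPUT_ATTR, &input_attr, sizeof(input_attr));
    app_ctx->input_mems[0] = rknn_create_mem(app_ctx->rknn_ctx, input_attr.size_with_stride);
    input_attr.type = RKNN_TENSOR_UINT8; // feed raw 8-bit pixels
    input_attr.fmt = RKNN_TENSOR_NHWC;
    rknn_set_io_mem(app_ctx->rknn_ctx, app_ctx->input_mems[0], &input_attr);
    // ... the same pattern applies to output_mems[0] with RKNN_QUERY_OUTPUT_ATTR ...
    return 0;
}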
main code (command-line argument parsing omitted):
clock_t start_time;
clock_t end_time;

// Model input
int yuv_width = 600;
int yuv_height = 400;
int ret;
rknn_app_context_t app_yuvmodel_ctx;
memset(&app_yuvmodel_ctx, 0, sizeof(rknn_app_context_t));

// Init model
init_yuvmodel(model_path, &app_yuvmodel_ctx);

// Init opencv-mobile; yuv_input wraps the model's zero-copy input buffer
cv::VideoCapture cap;
cv::Mat bgr(yuv_height, yuv_width, CV_8UC3);
cv::Mat yuv_input(yuv_height, yuv_width, CV_8UC3, app_yuvmodel_ctx.input_mems[0]->virt_addr);
cap.set(cv::CAP_PROP_FRAME_WIDTH, yuv_width);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, yuv_height);
cap.open(0);

// Start inference
int test_times = 100;
printf("yuvmodel start inferencing 100 times...\n");
start_time = clock();
while (test_times--)
{
    printf("yuvmodel inferencing %d times\n", 100 - test_times);
    // Grab a frame from the camera
    cap >> bgr;
    cv::resize(bgr, yuv_input, cv::Size(yuv_width, yuv_height), 0, 0, cv::INTER_LINEAR);
    ret = rknn_run(app_yuvmodel_ctx.rknn_ctx, nullptr);
    if (ret < 0) {
        printf("rknn_run fail! ret=%d\n", ret);
        return -1;
    }
    if (test_times % 10 == 0)
    {
        end_time = clock();
        printf("yuvmodel inferencing %d times, time=%f\n", 100 - test_times, (double)(end_time - start_time) / CLOCKS_PER_SEC);
        cv::Mat output_image(400, 600, CV_8UC3, app_yuvmodel_ctx.output_mems[0]->virt_addr);
        // Save the output image
        std::string filename = "output_" + std::to_string(100 - test_times) + ".png";
        cv::imwrite(filename, output_image);
    }
}
release_yuvmodel(&app_yuvmodel_ctx);
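One more detail I noticed while writing this up: the init print shows input_attrs[0].size_with_stride=729600, which is larger than size=720000 (729600 / 400 / 3 = 608 pixels per row, versus a width of 600), so the rows of the zero-copy input buffer appear to be padded. The cv::Mat above wraps virt_addr with the default 600*3 row step, which would shift pixels across rows. I am not sure whether this is related to the failure, but here is a sketch of wrapping the buffer with an explicit step, deriving the stride from size_with_stride (input_attr being the rknn_tensor_attr queried during init):

// Derive the padded row stride in bytes from the queried input attribute.
size_t row_bytes = input_attr.size_with_stride / yuv_height; // 729600 / 400 = 1824 bytes = 608 px * 3
cv::Mat yuv_input(yuv_height, yuv_width, CV_8UC3,
                  app_yuvmodel_ctx.input_mems[0]->virt_addr, row_bytes);
cv::resize(bgr, yuv_input, cv::Size(yuv_width, yuv_height), 0, 0, cv::INTER_LINEAR);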