I'm seeking help with an issue I'm encountering when deploying a modified ArcFace model on an RV1106 device (Luckfox PICO MAX) using the RKNN toolkit. Despite seemingly successful conversion steps, the final RKNN model consistently outputs a buffer filled with zeros during inference.
Here's a summary of the process and what I've tried:
1. Model Preparation (PyTorch):
I started with an ArcFace implementation using a MobileFaceNet backbone.
Modification: I removed the final L2 normalization layer from the PyTorch model structure before exporting; the goal is to perform the normalization manually after inference. Concretely, I commented out line 85 of arcface.py in https://github.com/bubbliiiing/arcface- ... arcface.py.
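For illustration only (the real change is just commenting out that one line), the idea is equivalent to wrapping the backbone so the raw embedding is returned without F.normalize; names below are assumptions:
Code: Select all
# Illustrative sketch only; class/attribute names are assumptions. The actual
# change is commenting out the F.normalize(...) call in arcface.py.
import torch
import torch.nn as nn

class ArcFaceNoNorm(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone      # MobileFaceNet feature extractor

    def forward(self, x):
        feat = self.backbone(x)       # raw 128-d embedding
        return feat                   # no F.normalize(feat) here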
2. ONNX Export & Testing:
The modified PyTorch model was exported to ONNX format. I initially used opset_version=11 and later tried opset_version=19 (as recommended by the RKNN toolkit warning).
ONNX Test (Successful): I tested the resulting ONNX model (arcface_no_norm.onnx / arcface_no_norm_opset19.onnx) with onnxruntime in Python on Google Colab; a sketch of the export and test code follows the log below.
Preprocessing included resizing (112x112), converting to RGB, normalizing pixels to [-1, 1] (equivalent to (pixel - 127.5) / 127.5), and transposing to NCHW format.
The output embedding was explicitly L2 normalized after inference.
Result: The ONNX model worked correctly, producing distinct, non-zero embeddings for different faces and achieving reasonable accuracy (around 86.6% in my simulated test). This confirms the base ONNX model is functional.
Test log:
Code: Select all
Loading model...
Scanning folders...
Creating initial slots (with duplication check)...
Simulating smart locker system...
100%|██████████| 2000/2000 [01:43<00:00, 19.40it/s]
RESULT:
True Positives : 1732
False Negatives : 268
Accuracy: 86.60%
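For reference, the export and onnxruntime check looked roughly like this (a simplified sketch; file paths are placeholders and `model` is the modified PyTorch model from step 1):
Code: Select all
# Sketch of the export + onnxruntime sanity check (paths are placeholders,
# `model` is the modified PyTorch model without the final L2 normalization).
import numpy as np
import torch
import onnxruntime as ort
import cv2

dummy = torch.randn(1, 3, 112, 112)
torch.onnx.export(model, dummy, "arcface_no_norm.onnx",
                  input_names=["input"], output_names=["output_no_norm"],
                  opset_version=11)          # later re-exported with opset 19

def preprocess(path):
    img = cv2.imread(path)
    img = cv2.resize(img, (112, 112))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
    img = (img - 127.5) / 127.5              # normalize to [-1, 1]
    return np.transpose(img, (2, 0, 1))[None]  # NCHW with batch dim

sess = ort.InferenceSession("arcface_no_norm.onnx")
emb = sess.run(None, {"input": preprocess("face.jpg")})[0][0]
emb = emb / np.linalg.norm(emb)              # explicit L2 normalization afterwards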
3. RKNN Conversion:
I used rknn-toolkit2 (version 1.6.0+81f21f4d) to convert the working ONNX model to RKNN format.
Conversion Parameters:
target_platform='rv1106'
mean_values=[[127.5, 127.5, 127.5]], std_values=[[127.5, 127.5, 127.5]] (Matching the [-1, 1] normalization)
Used the --dataset flag, providing a .txt file with paths to ~100 representative face images for quantization calibration.
Specified NCHW input during load_onnx (input_size_list=[[1,3,112,112]]).
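In script form, the conversion was essentially the following (a simplified sketch of my onnx2rknn.py; paths are placeholders):
Code: Select all
# Simplified sketch of onnx2rknn.py (paths are placeholders).
from rknn.api import RKNN

rknn = RKNN(verbose=True)
rknn.config(mean_values=[[127.5, 127.5, 127.5]],
            std_values=[[127.5, 127.5, 127.5]],
            target_platform='rv1106')
rknn.load_onnx(model='arcface_no_norm.onnx',
               inputs=['input'],
               input_size_list=[[1, 3, 112, 112]])
rknn.build(do_quantization=True, dataset='dataset.txt')
rknn.export_rknn('arcface.rknn')
rknn.release()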
Conversion Result: The conversion process completed without errors, although warnings about the ONNX opset version (when using opset 11) were noted. The toolkit correctly identified that input/output dtypes were changed from float32 to int8.
Code: Select all
(RKNN-Toolkit2) root@b6cd80ed720e:/mnt/arcface# python onnx2rknn.py
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: If you don't need to crop the model, don't set 'inputs'/'input_size_list'/'outputs'!
Loading : 100%|████████████████████████████████████████████████| 136/136 [00:00<00:00, 25003.30it/s]
done
--> Building model
GraphPreparing : 100%|████████████████████████████████████████████| 96/96 [00:00<00:00, 2553.64it/s]
Quantizating : 100%|████████████████████████████████████████████████| 96/96 [00:01<00:00, 59.21it/s]
W build: The default input dtype of 'input' is changed from 'float32' to 'int8' in rknn model for performance!
Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of 'output_no_norm' is changed from 'float32' to 'int8' in rknn model for performance!
Please take care of this change when deploy rknn model with Runtime API!
done
--> Export RKNN model
done
(RKNN-Toolkit2) root@b6cd80ed720e:/mnt/arcface#
4. Deployment on the RV1106 (C++):
Model Query: Using rknn_query on the generated .rknn model confirmed:
Input: INT8, NHWC format, zp=0, scale=0.00784314 (approx 1/127.5)
Output: INT8, zp and scale vary slightly depending on conversion run (e.g., zp=-8/scale=0.0143 or zp=-12/scale=0.0114)
Note: The toolkit correctly transformed the layout from NCHW (ONNX) to NHWC (RKNN).
C++ Preprocessing: My C++ code prepares the input buffer according to the queried specs:
Reads image using OpenCV (imread).
Resizes to 112x112.
Converts BGR to RGB using cv::cvtColor.
Applies INT8 quantization based on the queried input zp=0 and scale=1/127.5, i.e. round(pixel - 127.5) clamped to [-128, 127] (see the sketch after this list).
Arranges data in NHWC format in the input buffer.
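In numpy terms, the arithmetic the C++ preprocessing performs is the following (a sketch of the math only, not the deployment code itself):
Code: Select all
# Sketch (numpy) of the input quantization done in C++. With mean/std = 127.5
# folded into the model and the queried zp = 0, scale = 1/127.5, the quantized
# value reduces to round(pixel - 127.5).
import numpy as np

def quantize_input(rgb_u8):                      # 112x112x3 uint8, already RGB
    scale, zp = 0.00784314, 0                    # from rknn_query
    x = rgb_u8.astype(np.float32)
    normalized = (x - 127.5) / 127.5             # the [-1, 1] range used at calibration
    q = np.round(normalized / scale) + zp        # reduces to round(x - 127.5)
    return np.clip(q, -128, 127).astype(np.int8) # NHWC int8 buffer fed to the NPU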
Inference Execution: Called rknn_run(ctx, nullptr).
Problem: rknn_run completes without returning an error code (< 0), BUT the contents of the output buffer (obtained via rknn_output_get) are always zeros, regardless of the input image. The raw output sum is consistently 0.
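For completeness, dequantization is just float = scale * (q - zp), which is why an all-zero INT8 buffer turns into a constant vector and the two images end up with a cosine similarity of exactly 1.0, as seen in the device log below:
Code: Select all
# Why an all-zero raw output yields the numbers in the log below:
# scale * (0 - zp) = 0.011420 * 12 ≈ 0.1370 for every element, so the vector
# norm is 0.1370 * sqrt(128) ≈ 1.55 and both normalized embeddings are identical.
import numpy as np

scale, zp = 0.011420, -12
raw = np.zeros(128, dtype=np.int8)            # contents of the output buffer
emb = scale * (raw.astype(np.float32) - zp)   # constant ~0.1370 vector
emb /= np.linalg.norm(emb)                    # cosine similarity with itself: 1.0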
Troubleshooting Performed:
- Confirmed the ONNX model works independently.
- Confirmed the C++ code applies the correct input quantization (matching rknn_query).
- Confirmed the C++ code converts input to RGB.
- Confirmed the C++ code provides data in the NHWC layout expected by the RKNN model.
- Confirmed a dataset was used during RKNN quantization.
- Confirmed the target_platform setting.
- Tried re-exporting ONNX with a higher opset (e.g., 19) and reconverting. The zero-output issue persists.
- Verified the dataset.txt file paths seem correct and images are valid (though I plan to test with a minimal dataset next).
Code: Select all
[root@luckfox root]# ./opencv-mobile-test arcface.rknn 1.jpg 2.jpg
opencv-mobile HW JPG encoder with rk mpp
Model path: arcface19.rknn
Image 1 path: 1.jpg
Image 2 path: 2.jpg
Initializing ArcFace model: arcface19.rknn
Model Input Spec: [1, 112, 112, 3], Type=2, Format=1 (NHWC expected)
Model Output Spec: n_dims=2, dims=[1, 128], Type=2, n_elems=128, ZP=-12, Scale=0.011420
Creating input/output buffers...
Model initialized successfully.
Feature vector size (output elements): 128
Required input buffer size: 37632 bytes
Allocated memory for float vectors.
--- Processing image 1: 1.jpg ---
DEBUG: preprocess (Image 1): Input buffer start (first 10 bytes): -9 -25 -22 -5 -21 -18 -7 -23 -22 -10
DEBUG: preprocess (Image 1): Input buffer int8 SUM = -2094904
Image 1 preprocessed.
Running inference for image 1...
Inference 1 finished.
DEBUG: dequantize (Image 1): Raw output buffer start (first 10 bytes): 0 0 0 0 0 0 0 0 0 0
DEBUG: dequantize (Image 1): Using Scale=0.01141967, ZP=-12
DEBUG: dequantize (Image 1): Raw output buffer int8 SUM = 0
DEBUG: dequantize (Image 1): Dequantized float output start (first 10): 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361
Output 1 dequantized.
DEBUG: Main (Image 1): Vector norm BEFORE normalize: 1.550386
Output 1 normalized.
DEBUG: Main (Image 1): Vector norm AFTER normalize: 1.000000
DEBUG: Main (Image 1): Final vector start (first 10): 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883
--- Processing image 2: 2.jpg ---
DEBUG: preprocess (Image 2): Input buffer start (first 10 bytes): -119 -119 -119 -119 -119 -119 -119 -119 -119 -119
DEBUG: preprocess (Image 2): Input buffer int8 SUM = -2889469
Image 2 preprocessed.
Running inference for image 2...
Inference 2 finished.
DEBUG: dequantize (Image 2): Raw output buffer start (first 10 bytes): 0 0 0 0 0 0 0 0 0 0
DEBUG: dequantize (Image 2): Using Scale=0.01141967, ZP=-12
DEBUG: dequantize (Image 2): Raw output buffer int8 SUM = 0
DEBUG: dequantize (Image 2): Dequantized float output start (first 10): 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361 0.1370361
Output 2 dequantized.
DEBUG: Main (Image 2): Vector norm BEFORE normalize: 1.550386
Output 2 normalized.
DEBUG: Main (Image 2): Vector norm AFTER normalize: 1.000000
DEBUG: Main (Image 2): Final vector start (first 10): 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883 0.0883883
Calculating similarity...
----------------------------------------
Cosine Similarity: 1.000000
----------------------------------------
Executing cleanup...
Freed vector1_float
Freed vector2_float
Releasing model resources...
Input mem buffer released.
Output mem buffer released.
Input attrs released.
Output attrs released.
RKNN context released.
Model resources released.
Cleanup finished. Returning 0
[root@luckfox root]#
Environment:
- Ubuntu 22.04 (Docker)
- rknn-toolkit2 1.6.0
- Google Colab (used to remove the ReduceL2 op and export the ONNX model)
Has anyone encountered a similar issue where rknn_run produces zero output despite seemingly correct conversion and input preparation? Are there known issues with rknn-toolkit2 v1.6.0 on the Luckfox PICO MAX, particularly with MobileFaceNet/ArcFace architectures or specific ONNX operators from certain opsets?
Any suggestions for further debugging steps or potential causes would be greatly appreciated. Could there be an issue with how the toolkit handled the quantization internally despite using a dataset?
Thanks in advance!