
ExecuTorch runner for the QNN backend can't run the .pte model when following the tutorial #8762

Open
TheBetterSolution opened this issue Feb 27, 2025 · 13 comments
Comments

@TheBetterSolution commented Feb 27, 2025

🐛 Describe the bug

I followed the tutorial to build the runner for the QNN backend and ran it:
https://pytorch.org/executorch/main/build-run-qualcomm-ai-engine-direct-backend.html

But at the run-model step:

adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH=${DEVICE_DIR} \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"

I got the following error and warning messages:

I 00:00:00.000852 executorch:qnn_executor_runner.cpp:160] Model file ./dl3_qnn_q8.pte is loaded.
I 00:00:00.000883 executorch:qnn_executor_runner.cpp:170] Using method forward
I 00:00:00.000896 executorch:qnn_executor_runner.cpp:217] Setting up planned buffer 0, size 9031680.
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]:  <W> Initializing HtpProvider

[ERROR] [Qnn ExecuTorch]:  <E> Stub lib id mismatch: expected (v2.28.2.241116104011_103376), detected (v2.25.17.241017130936_18858)

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000

I made sure I only installed the (v2.28.2.241116104011_103376) HTP SDK; do you know why the detected stub is v2.25.17.241017130936_18858?
[ERROR] [Qnn ExecuTorch]: <E> Stub lib id mismatch: expected (v2.28.2.241116104011_103376), detected (v2.25.17.241017130936_18858)

Please also note this message:
QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000

Thanks.

Versions

Collecting environment information...
PyTorch version: 2.6.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
Nvidia driver version: 560.94
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+791472d
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pytorch-triton==3.2.0+gitb2684bf3
[pip3] torch==2.6.0+cpu
[pip3] torchao==0.8.0+git11333ba2
[pip3] torchaudio==2.6.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.0.dev20250131+cu124
[pip3] torchvision==0.21.0+cpu
[conda] Could not collect

cc @cccclai @winskuo-quic @shewu-quic @cbilgin @mergennachin @byjlw

@cccclai (Contributor) commented Feb 27, 2025

> I made sure I only installed the (v2.28.2.241116104011_103376) HTP SDK; do you know why the detected stub is v2.25.17.241017130936_18858?

How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

@TheBetterSolution (Author) commented Feb 27, 2025

> How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

I will check it again, thanks.

@codereba commented Mar 6, 2025

> How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

I downloaded the QNN SDK (v2.28.0.241029232508_102474), changed the SDK directory, and copied its Android libs accordingly, but this error still happens:

[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]:  <W> Initializing HtpProvider

[ERROR] [Qnn ExecuTorch]:  <E> Stub lib id mismatch: expected (v2.28.0.241029232508_102474), detected (v2.25.17.241017130936_18858)

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!

I also checked the model .pte file and the libQnn*.so files; their versions are v2.28.0.241029..., which is correct.

I think the reason may be that libQnn*.so libraries are already installed in the OS of the Android device: when I run qnn_executor_runner, it loads the OS copies of libQnn*.so instead of the ones in the runner's parent directory.

I found the OS-native libQnn*.so under /odm/lib64; I think qnn_executor_runner loaded that one, but its version is v2.25.

I haven't found a way to ignore it yet.
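
For anyone checking the same thing, here is a rough way to compare the OS copies against the pushed ones (a sketch; the exact library names and OS paths are assumptions and vary per device):

# List the QNN libraries the OS ships vs. the ones pushed to DEVICE_DIR
adb shell ls -l /odm/lib64/libQnn*.so /vendor/lib64/libQnn*.so
adb shell ls -l ${DEVICE_DIR}/libQnn*.so
# Pull both copies of one library and compare embedded version strings on the host
# (the version string format may differ; adjust the grep pattern as needed)
adb pull /odm/lib64/libQnnHtp.so ./libQnnHtp_os.so
adb pull ${DEVICE_DIR}/libQnnHtp.so ./libQnnHtp_pushed.so
strings ./libQnnHtp_os.so | grep -m1 'v2\.'
strings ./libQnnHtp_pushed.so | grep -m1 'v2\.'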

@cccclai (Contributor) commented Mar 6, 2025

> I found the OS-native libQnn*.so under /odm/lib64

Interesting, what device are you using? @shewu-quic @chunit-quic @haowhsu-quic @winskuo-quic @DannyYuyang-quic do you know if the QNN library will be part of the OS?

@codereba commented Mar 6, 2025 via email

@haowhsu-quic (Collaborator) commented:

Maybe the ODM is using QNN for some feature development. I wonder if LD_LIBRARY_PATH=. ./qnn_executor_runner ... will help?

@shewu-quic (Collaborator) commented:

> Interesting, what device are you using? Do you know if the QNN library will be part of the OS?

Yes, maybe. But I think you could set LD_LIBRARY_PATH and ADSP_LIBRARY_PATH to change which libraries are loaded.
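
For instance, something like the following should make the pushed copies take precedence (a sketch; the extra ADSP_LIBRARY_PATH entries are common Hexagon DSP search paths and may differ on your device — note that ADSP_LIBRARY_PATH is semicolon-separated):

adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH='${DEVICE_DIR};/vendor/dsp/cdsp;/system/lib/rfsa/adsp' \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"

LD_LIBRARY_PATH controls which libQnn*.so the CPU-side loader picks, while ADSP_LIBRARY_PATH controls where the DSP-side loader looks for the libQnnHtpV*Skel.so.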

@codereba commented Mar 7, 2025

> Yes, maybe. But I think you could set LD_LIBRARY_PATH and ADSP_LIBRARY_PATH to change which libraries are loaded.

I tried it, but it still doesn't work.
I found that an APK can exclude .so files manually, but I haven't found a way to do that for a plain ELF executable.
Thanks all @shewu-quic @haowhsu-quic

I think there is at least one workaround: load the .so dynamically (e.g., dlopen with an absolute path, which bypasses the normal library search order).

@cccclai (Contributor) commented Mar 7, 2025

> I tried it, but it still doesn't work. I found that an APK can exclude .so files manually, but I haven't found a way to do that for a plain ELF executable.

Definitely not ideal but glad you can get unblocked.

@codereba commented Mar 7, 2025

> Definitely not ideal but glad you can get unblocked.

Right, that's one idea; I haven't implemented it yet, and I will keep looking for a good solution.
Thanks.

@codereba commented Mar 9, 2025

Hi everyone, excuse me: the SoC of my phone is a Qualcomm Snapdragon 8 Elite, but I used SM8650 as the parameter when running the model on device for testing (https://pytorch.org/executorch/main/backends-qualcomm.html#deploying-and-running-on-device).
The process differs for different SoCs; e.g., the .so files to upload to the phone differ per SoC, see:
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/utils.py#L125

So the reason is that the .so files were mismatched with the SoC: the executable wants to load the .so file matching the SoC, can't find it in the app directory, and then loads the .so file from the OS.

After I changed the SoC parameter to SM8750 (SM8750 isn't listed in the tutorial, but ExecuTorch currently supports it), it works correctly.
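
Concretely, the HTP stub/skel pair pushed to the device must match the SoC's Hexagon architecture. A rough sketch of the push step for SM8750 (the arch mapping and SDK paths here are my assumptions; verify them against your QNN SDK layout):

# Snapdragon 8 Elite (SM8750) uses HTP arch V79; SM8650 is V75, SM8550 is V73
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV79Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so ${DEVICE_DIR}

With the wrong SoC parameter, the V75 pair gets pushed instead; the runner can't find a matching library in its own directory, so the loader falls back to whatever v2.25 copy the OS ships.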

Thanks everyone.

@cccclai (Contributor) commented Mar 10, 2025

We'd need to enhance the error message. On the other hand:

> SM8750 isn't listed in the tutorial, but ExecuTorch currently supports it

Which tutorial do you refer to?

@codereba commented Mar 10, 2025

> Which tutorial do you refer to?

I referred to this tutorial:
https://pytorch.org/executorch/main/backends-qualcomm.html#deploying-and-running-on-device

The code has been updated to support more SoCs; see:
https://github.com/pytorch/executorch/blob/main/backends/qualcomm/utils/utils.py#L1284
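
If it helps anyone else, the SoC is selected via the example scripts' model flag; a sketch based on the tutorial's deeplab_v3 example (the exact flags here are assumptions, so check the script's --help):

python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8750 --download -s <device_serial>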
