
ExecuTorch runner for the QNN backend can't run the .pte model when following the tutorial #8762

Open
TheBetterSolution opened this issue Feb 27, 2025 · 13 comments
Comments

@TheBetterSolution commented Feb 27, 2025

🐛 Describe the bug

I followed the tutorial to build the runner for the QNN backend and ran it:
https://pytorch.org/executorch/main/build-run-qualcomm-ai-engine-direct-backend.html

But at the run-model step:

adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH=${DEVICE_DIR} \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"

I got the following error and warning messages:

I 00:00:00.000852 executorch:qnn_executor_runner.cpp:160] Model file ./dl3_qnn_q8.pte is loaded.
I 00:00:00.000883 executorch:qnn_executor_runner.cpp:170] Using method forward
I 00:00:00.000896 executorch:qnn_executor_runner.cpp:217] Setting up planned buffer 0, size 9031680.
[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]:  <W> Initializing HtpProvider

[ERROR] [Qnn ExecuTorch]:  <E> Stub lib id mismatch: expected (v2.28.2.241116104011_103376), detected (v2.25.17.241017130936_18858)

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[INFO] [Qnn ExecuTorch]: QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000

I made sure I only installed the (v2.28.2.241116104011_103376) HTP SDK; do you know why the detected stub is v2.25.17.241017130936_18858?
[ERROR] [Qnn ExecuTorch]: <E> Stub lib id mismatch: expected (v2.28.2.241116104011_103376), detected (v2.25.17.241017130936_18858)

Please also note this message:
QnnContextCustomProtocol expected magic number: 0x5678abcd but get: 0x2000000

Thanks.

Versions

Collecting environment information...
PyTorch version: 2.6.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
Nvidia driver version: 560.94
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+791472d
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pytorch-triton==3.2.0+gitb2684bf3
[pip3] torch==2.6.0+cpu
[pip3] torchao==0.8.0+git11333ba2
[pip3] torchaudio==2.6.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.0.dev20250131+cu124
[pip3] torchvision==0.21.0+cpu
[conda] Could not collect

cc @cccclai @winskuo-quic @shewu-quic @cbilgin @mergennachin @byjlw

@cccclai (Contributor) commented Feb 27, 2025

> I made sure I only installed the (v2.28.2.241116104011_103376) HTP SDK; do you know why the detected stub is v2.25.17.241017130936_18858?

How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

@TheBetterSolution (Author) commented Feb 27, 2025

> How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

I will check it again, thanks.

@codereba commented Mar 6, 2025

> How did you generate the .pte file? It's likely because the .pte file was exported with the v2.25 version.

I downloaded the QNN SDK (v2.28.0.241029232508_102474), changed the SDK directory, and copied its Android libs accordingly, but this error still happens:

[INFO] [Qnn ExecuTorch]: Deserializing processed data using QnnContextCustomProtocol
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]:  <W> Initializing HtpProvider

[ERROR] [Qnn ExecuTorch]:  <E> Stub lib id mismatch: expected (v2.28.0.241029232508_102474), detected (v2.25.17.241017130936_18858)

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[ERROR] [Qnn ExecuTorch]:  <E> Unable to load Remote symbols 1008

[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!

I also checked the model .pte file and the libQnn*.so files; their versions are v2.28.0.241029..., which is correct.

I think the reason may be that libQnn*.so libraries are already installed in the OS of the Android device: when I run qnn_executor_runner, it loads the OS copies of libQnn*.so instead of the ones in the runner's parent directory.

I found the OS-native libQnn*.so under /odm/lib64; I think qnn_executor_runner loaded that one, but its version is v2.25.

I haven't found a way to ignore it yet.
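
For anyone checking the same thing, here is a rough way to compare the OS copies against the pushed ones (a sketch; the exact library names and OS paths are assumptions and vary per device):

# List the QNN libraries the OS ships vs. the ones pushed to DEVICE_DIR
adb shell ls -l /odm/lib64/libQnn*.so /vendor/lib64/libQnn*.so
adb shell ls -l ${DEVICE_DIR}/libQnn*.so
# Pull both copies of one library and compare embedded version strings on the host
# (the version string format may differ; adjust the grep pattern as needed)
adb pull /odm/lib64/libQnnHtp.so ./libQnnHtp_os.so
adb pull ${DEVICE_DIR}/libQnnHtp.so ./libQnnHtp_pushed.so
strings ./libQnnHtp_os.so | grep -m1 'v2\.'
strings ./libQnnHtp_pushed.so | grep -m1 'v2\.'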

@cccclai (Contributor) commented Mar 6, 2025

> I found the OS-native libQnn*.so under /odm/lib64

Interesting, what device are you using? @shewu-quic @chunit-quic @haowhsu-quic @winskuo-quic @DannyYuyang-quic do you know if the QNN library will be part of the OS?

@codereba commented Mar 6, 2025 via email

@haowhsu-quic (Collaborator) commented:

Maybe the ODM is using QNN for some feature development. I wonder if LD_LIBRARY_PATH=. ./qnn_executor_runner ... will help?

@shewu-quic (Collaborator) commented:

> Interesting, what device are you using? Do you know if the QNN library will be part of the OS?

Yes, maybe. But I think you could set LD_LIBRARY_PATH and ADSP_LIBRARY_PATH to change which libraries are loaded.
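
For instance, something like the following should make the pushed copies take precedence (a sketch; the extra ADSP_LIBRARY_PATH entries are common Hexagon DSP search paths and may differ on your device — note that ADSP_LIBRARY_PATH is semicolon-separated):

adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH='${DEVICE_DIR};/vendor/dsp/cdsp;/system/lib/rfsa/adsp' \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"

LD_LIBRARY_PATH controls which libQnn*.so the CPU-side loader picks, while ADSP_LIBRARY_PATH controls where the DSP-side loader looks for the libQnnHtpV*Skel.so.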

@codereba commented Mar 7, 2025

> Yes, maybe. But I think you could set LD_LIBRARY_PATH and ADSP_LIBRARY_PATH to change which libraries are loaded.

I tried it, but it still doesn't work.
I found that an APK can exclude .so files manually, but I haven't found a way to do that for a plain ELF executable.
Thanks all @shewu-quic @haowhsu-quic

I think there is at least one workaround: load the .so dynamically (e.g., dlopen with an absolute path, which bypasses the normal library search order).

@cccclai (Contributor) commented Mar 7, 2025

> I tried it, but it still doesn't work. I found that an APK can exclude .so files manually, but I haven't found a way to do that for a plain ELF executable.

Definitely not ideal but glad you can get unblocked.

@codereba commented Mar 7, 2025

> Definitely not ideal but glad you can get unblocked.

Right, that's one idea; I haven't implemented it yet, and I will keep looking for a good solution.
Thanks.

@codereba commented Mar 9, 2025

Hi everyone, excuse me: the SoC of my phone is a Qualcomm Snapdragon 8 Elite, but I used SM8650 as the parameter when running the model on device for testing (https://pytorch.org/executorch/main/backends-qualcomm.html#deploying-and-running-on-device).
The process differs for different SoCs; e.g., the .so files to upload to the phone differ per SoC, see:
https://github.com/pytorch/executorch/blob/main/examples/qualcomm/utils.py#L125

So the reason is that the .so files were mismatched with the SoC: the executable wants to load the .so file matching the SoC, can't find it in the app directory, and then loads the .so file from the OS.

After I changed the SoC parameter to SM8750 (SM8750 isn't listed in the tutorial, but ExecuTorch currently supports it), it works correctly.
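
Concretely, the HTP stub/skel pair pushed to the device must match the SoC's Hexagon architecture. A rough sketch of the push step for SM8750 (the arch mapping and SDK paths here are my assumptions; verify them against your QNN SDK layout):

# Snapdragon 8 Elite (SM8750) uses HTP arch V79; SM8650 is V75, SM8550 is V73
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV79Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so ${DEVICE_DIR}

With the wrong SoC parameter, the V75 pair gets pushed instead; the runner can't find a matching library in its own directory, so the loader falls back to whatever v2.25 copy the OS ships.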

Thanks everyone.

@cccclai (Contributor) commented Mar 10, 2025

We'd need to enhance the error message. On the other hand:

> SM8750 isn't listed in the tutorial, but ExecuTorch currently supports it

Which tutorial do you refer to?

@codereba commented Mar 10, 2025

> Which tutorial do you refer to?

I referred to this tutorial:
https://pytorch.org/executorch/main/backends-qualcomm.html#deploying-and-running-on-device

The code has been updated to support more SoCs; see:
https://github.com/pytorch/executorch/blob/main/backends/qualcomm/utils/utils.py#L1284
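
If it helps anyone else, the SoC is selected via the example scripts' model flag; a sketch based on the tutorial's deeplab_v3 example (the exact flags here are assumptions, so check the script's --help):

python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8750 --download -s <device_serial>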
