
[webgpu] support Pad operator #23141

Open
wants to merge 11 commits into main
Conversation

Contributor
@xhcao xhcao commented Dec 18, 2024

Description

Motivation and Context

@xhcao
Contributor Author

xhcao commented Dec 18, 2024

@jchen10 @hujiajie PTAL, thanks

@xhcao xhcao marked this pull request as ready for review December 18, 2024 11:29
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Dec 19, 2024
@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@xhcao
Contributor Author

xhcao commented Dec 20, 2024

@fs-eire @guschmue Please help trigger the bots again. The last version failed to compile on macOS but compiled correctly on Windows. I have changed the code, but have not yet verified that it works on macOS. The compile error is shown below.
/Users/runner/work/1/s/onnxruntime/core/providers/webgpu/tensor/pad.h:16:54: error: member initializer 'Program' does not name a non-static data member or base class PadProgram(const Mode mode, bool dim_value_zero) : Program{"Pad"}, mode_{mode}, dim_value_zero_{dim_value_zero} {}
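The diagnostic above is about the mem-initializer list: Apple Clang rejects `Program{"Pad"}` when that name does not resolve to the base class in the derived class's scope. One robust fix is to spell out the full base-class template-id in the initializer. Below is a minimal self-contained sketch of that pattern; the `Program`, `Mode`, and member names are simplified stand-ins, not the real onnxruntime classes.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for a CRTP program base class (hypothetical shape,
// not the actual onnxruntime WebGPU Program class).
template <typename Derived>
class Program {
 public:
  explicit Program(std::string name) : name_(std::move(name)) {}
  const std::string& Name() const { return name_; }

 private:
  std::string name_;
};

enum class Mode { Constant, Reflect, Edge, Wrap };

class PadProgram final : public Program<PadProgram> {
 public:
  // Spelling out the full template-id Program<PadProgram> in the
  // mem-initializer avoids the "member initializer 'Program' does not
  // name a non-static data member or base class" diagnostic.
  PadProgram(Mode mode, bool dim_value_zero)
      : Program<PadProgram>{"Pad"},
        mode_{mode},
        dim_value_zero_{dim_value_zero} {}

  Mode mode() const { return mode_; }
  bool dim_value_zero() const { return dim_value_zero_; }

 private:
  Mode mode_;
  bool dim_value_zero_;
};
```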

@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@guschmue
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 1 pipeline(s).

guschmue previously approved these changes Jan 14, 2025
@fs-eire
Contributor

fs-eire commented Feb 11, 2025

The build break in the Web CI pipeline is caused by this change:

2025-02-08T07:34:42.7945646Z /mnt/vss/_work/1/s/onnxruntime/core/providers/webgpu/tensor/pad.cc:116:116: error: implicit conversion loses integer precision: 'int64_t' (aka 'long long') to 'size_type' (aka 'unsigned long') [-Werror,-Wshorten-64-to-32]
2025-02-08T07:34:42.7948642Z   116 |     int64_t upper_pad = (*p_pads)[static_cast<int64_t>(i) + dimension_count] + (*p_slices)[static_cast<int64_t>(i) + dimension_count];
2025-02-08T07:34:42.7950719Z       |                                                                                ~           ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
2025-02-08T07:34:42.7951830Z /mnt/vss/_work/1/s/onnxruntime/core/providers/webgpu/tensor/pad.cc:116:59: error: implicit conversion loses integer precision: 'int64_t' (aka 'long long') to 'size_type' (aka 'unsigned long') [-Werror,-Wshorten-64-to-32]
2025-02-08T07:34:42.7989831Z   116 |     int64_t upper_pad = (*p_pads)[static_cast<int64_t>(i) + dimension_count] + (*p_slices)[static_cast<int64_t>(i) + dimension_count];
2025-02-08T07:34:42.7990546Z       |                         ~         ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~

The pipeline is using parallel building so you may need to scroll up a little bit more to find the error message.

In the WebAssembly build, size_t is 4 bytes instead of 8; this is why the warning occurred.
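With `-Werror,-Wshorten-64-to-32`, subscripting a `std::vector` with an `int64_t` expression narrows implicitly on 32-bit targets like WebAssembly, where `size_type` is 32-bit. A hypothetical sketch of the indexing fix (the function and parameter names here are illustrative, not the actual code in this PR) is to keep the subscript computation in `size_t` so no 64-to-32 narrowing happens at the index:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compute the upper pad for dimension i. Keeping the subscript in size_t
// avoids the implicit int64_t -> size_type narrowing that triggers
// -Wshorten-64-to-32 on 32-bit (wasm) builds.
int64_t UpperPad(const std::vector<int64_t>& pads,
                 const std::vector<int64_t>& slices,
                 size_t i, size_t dimension_count) {
  size_t idx = i + dimension_count;  // stays size_t end to end
  return pads[idx] + slices[idx];
}
```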

@xhcao
Contributor Author

xhcao commented Feb 12, 2025

@fs-eire take a look again, thanks.

@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

guschmue previously approved these changes Feb 12, 2025
@guschmue
Contributor

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue
Contributor

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

@fs-eire
Contributor

fs-eire commented Feb 14, 2025

It seems that onnx_backend_test_series fails with SIGSEGV on macOS, and the error reproduces consistently on retry.

It may be worth verifying whether this is reproducible locally on a macOS device.

@guschmue
Contributor

guschmue commented Feb 14, 2025

The macOS pipeline is failing with:
'onnx_backend_test_series.py']' died with <Signals.SIGSEGV: 11>.

It seems to be OK on main as far as I can tell, so something in this PR may be triggering it.

@xhcao
Contributor Author

xhcao commented Feb 25, 2025

@fs-eire @guschmue The bots' error logs have been removed; could you help trigger the bots again? I want to reproduce the issue on a local Mac, but I need to know the options used when running 'onnx_backend_test_series.py'. Thanks.

Labels
ep:WebGPU ort-web webgpu provider
4 participants