Zero copy may lead to wrong text generation results #145
I tested `clone_tensor` set to False extensively, for obvious reasons, and had no issue at the time. Did you test with ORT 1.11?
Thanks for the reply! I was testing with ORT 1.12. It might be due to the model implementation (since I was using Pegasus instead of T5 and had to modify the code a bit to make it work). It's quite some code, so I need some time to put it together and showcase the problem. I will update this thread.
@pommedeterresautee Here are the notebooks to replicate the error. To make things easier I use a T5 model for illustration. The first notebook is for exporting the ONNX model (I am using fp32 instead of trying fp16). Can you please help take a look?
@brevity2021 It may not be relevant anymore since your question was a while ago, but I noticed the same thing when adapting the T5 approach to another model. Downgrading onnxruntime-gpu to 1.11 fixed the issue for me.
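For what it's worth, a quick sanity check that the downgraded build is really the one being picked up at runtime (a minimal sketch, not tied to the notebooks in this thread):

```python
# Verify which ONNX Runtime build and version are active before re-benchmarking.
import onnxruntime as ort

print(ort.__version__)   # should report 1.11.x after the downgrade
print(ort.get_device())  # "GPU" confirms the onnxruntime-gpu package is loaded
```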
Strangely though, even though the outputs with the cache are correct after downgrading ORT, that approach is almost 4x slower than using only a decoder without cache support and ~2x slower than the vanilla PyTorch implementation. Do you have any idea why that might be, @pommedeterresautee? I'm using a similar seq2seq model, so not a lot has changed from the T5 build.
@c-schumacher This [notebook] mentions: "Version 1.11.1 of ONNX Runtime and older have a bug which makes them much slower when most inputs are used by subgraphs of an If node."
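As a side note, a quick way to check whether an exported decoder actually routes the cache / no-cache branches through an `If` node (the file name below is just a placeholder):

```python
# List the If nodes in an exported decoder graph; their presence means the
# cache / no-cache branches live in subgraphs, which is what the ORT bug above hits.
import onnx

model = onnx.load("t5-decoder-merged.onnx")
if_nodes = [node.name for node in model.graph.node if node.op_type == "If"]
print(if_nodes)
```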
Hi,
I was trying the "zero copy" method from the T5 notebook on a seq2seq transformer model. When I set `clone_tensor` to True, everything looks fine, just not as much speedup as I expected. When I set `clone_tensor` to False, the text generation gives wrong results (some are repetitive ids). I debugged a bit and found that although the binding inputs are the same as when `clone_tensor` is True, the results after `run_with_io_binding` are different. It seems it can somehow be fixed by not reusing the IO binding and instead creating a new one at some generation steps, but I really have no clue why.
I'm still playing with it and can post some code snippets later, but I wonder if you have encountered something like this (different results when switching `clone_tensor`) and whether you have any suggestions. Thanks!
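For context, here is a minimal sketch of what the `clone_tensor` switch boils down to, as I understand it, when an IO binding and its output buffer are reused across generation steps. This is not the notebook's actual code; the model path, tensor names, and shapes are placeholders:

```python
# Minimal sketch of the clone-vs-zero-copy trade-off with ONNX Runtime IO binding.
# "decoder.onnx", the tensor names, and the shapes are placeholders, not the
# notebook's real values.
import numpy as np
import torch
import onnxruntime as ort

sess = ort.InferenceSession("decoder.onnx", providers=["CUDAExecutionProvider"])
binding = sess.io_binding()

input_ids = torch.ones((1, 8), dtype=torch.int64, device="cuda")
logits_buf = torch.empty((1, 8, 32128), dtype=torch.float32, device="cuda")

binding.bind_input(
    name="input_ids", device_type="cuda", device_id=0,
    element_type=np.int64, shape=tuple(input_ids.shape),
    buffer_ptr=input_ids.data_ptr(),
)
binding.bind_output(
    name="logits", device_type="cuda", device_id=0,
    element_type=np.float32, shape=tuple(logits_buf.shape),
    buffer_ptr=logits_buf.data_ptr(),
)

sess.run_with_iobinding(binding)

# clone_tensor=True behaviour: pay for a copy, keep a result that stays valid.
safe_logits = logits_buf.clone()

# clone_tensor=False behaviour: zero copy, but this is the very buffer the next
# run_with_iobinding call writes into, so the "previous" logits can silently
# change if the buffer is reused before the caller has consumed them.
unsafe_logits = logits_buf
```

With the copy, each step pays for an extra memcpy but the returned logits stay valid; without it, correctness depends on the runtime never touching that buffer before the caller is done with it, which seems to be the assumption that breaks between ORT 1.11 and 1.12.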