
Zero copy may lead to wrong text generation results #145

Open

brevity2021 opened this issue Sep 30, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@brevity2021 commented Sep 30, 2022

Hi,

I was trying the "zero copy" method from the T5 notebook on a seq2seq transformer model. When I set clone_tensor to True, everything looks fine, just not as much speedup as I expected.

When I set clone_tensor to False, text generation gives wrong results (some outputs are repetitive ids). I debugged a bit and found that although the bound inputs are the same as when clone_tensor is True, the results after run_with_io_binding are different. It seems it can somehow be fixed by not reusing the IO binding but creating a new one at some steps, but I really have no clue why.

I'm still playing with it and can post some code snippets later, but I wonder if you have encountered something like this (different results when switching clone_tensor between True and False), and if you have any suggestions. Thanks!
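For context, here is a minimal sketch of the hazard zero copy can introduce with ONNX Runtime IO binding, assuming output buffers are reused across runs. The model path, tensor names, and shapes are made up for illustration; this is my reading of what the flag changes, not a confirmed description of the library's internals:

```python
# Minimal sketch of the zero-copy hazard with ONNX Runtime IO binding.
# "decoder.onnx", the tensor names, and the shapes are illustrative only.
import numpy as np
import onnxruntime as ort
import torch

session = ort.InferenceSession("decoder.onnx", providers=["CUDAExecutionProvider"])
binding = session.io_binding()

input_ids = torch.ones((1, 8), dtype=torch.int64, device="cuda")
logits = torch.empty((1, 8, 32128), dtype=torch.float32, device="cuda")

binding.bind_input(
    name="input_ids", device_type="cuda", device_id=0,
    element_type=np.int64, shape=tuple(input_ids.shape),
    buffer_ptr=input_ids.data_ptr(),
)
binding.bind_output(
    name="logits", device_type="cuda", device_id=0,
    element_type=np.float32, shape=tuple(logits.shape),
    buffer_ptr=logits.data_ptr(),
)

session.run_with_iobinding(binding)
step_1 = logits          # zero copy (clone_tensor=False): a view of the bound buffer
safe_1 = logits.clone()  # clone_tensor=True: a private copy, immune to later runs

session.run_with_iobinding(binding)  # reuses the same bound output buffer
# step_1 now silently holds the second run's data while safe_1 is unchanged --
# one plausible way stale or overwritten buffers could corrupt generation.
```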

@pommedeterresautee (Member)

I tested clone_tensor set to False a lot, for obvious reasons, and had no issue at the time. Did you test with ORT 1.11?

pommedeterresautee added the "bug" label on Sep 30, 2022
@brevity2021 (Author) commented Oct 3, 2022

Thanks for the reply! I was testing with ORT 1.12. It might be due to the model implementation (I was using Pegasus instead of T5 and had to modify the code a bit to make it work). It's quite a lot of code, so I need some time to put it together and showcase the problem. I will update this thread.

@brevity2021 (Author) commented Oct 4, 2022

@pommedeterresautee Here are the notebooks to replicate the error. To make things easier, I use a T5 model for illustration.

I first run `make docker_build` from a clean transformer-deploy directory, then start the Docker container and notebook server with:

```bash
docker run -p 8686:8686 -v $PWD/demo/generative-model:/docker_folder \
  ghcr.io/els-rd/transformer-deploy:latest \
  bash -c "cd /docker_folder && jupyter notebook --ip 0.0.0.0 --port 8686 --no-browser --allow-root"
```

The first notebook exports the ONNX model; I use fp32 instead of trying fp16.
The second notebook runs inference: with clone_tensor set to False, the result contains a lot of 0s; with clone_tensor set to True, the inference result matches the PyTorch result.
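For concreteness, the "lot of 0s" pattern can be flagged with a small helper like this (illustrative only, not from the repo; it assumes T5's pad token id of 0):

```python
# Flag degenerate generations: with clone_tensor=False most of the output
# ids come back as 0, which happens to be T5's pad token id.
import torch

def looks_degenerate(ids: torch.Tensor, pad_id: int = 0, threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of the tokens equal `pad_id`."""
    return (ids == pad_id).float().mean().item() > threshold

print(looks_degenerate(torch.tensor([[21, 8, 3, 0, 0, 0, 0, 0]])))  # True (5/8 are 0)
```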

I am using a g5.2xlarge AWS instance.

Can you please help take a look?

brevity2021 changed the title from "[Question] Zero copy may lead to wrong text generation results" to "Zero copy may lead to wrong text generation results" on Oct 4, 2022
@c-schumacher
@brevity2021 it may not be relevant anymore since your question was a while ago, but I noticed the same thing when modifying the approach to T5 for another model. Downgrading onnxruntime-gpu to 1.11 fixed the issue for me.
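For reference, a quick way to confirm which runtime build is actually active after the downgrade (version string is an example):

```python
# Sanity-check the installed ONNX Runtime build and the CUDA provider.
import onnxruntime as ort

print(ort.__version__)                # e.g. "1.11.1" after pinning onnxruntime-gpu
print(ort.get_available_providers())  # CUDAExecutionProvider should be listed
```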

@c-schumacher
Strangely though, even though outputs with the cache are correct after downgrading ORT, that approach is almost 4x slower than using only a decoder without cache support, and ~2x slower than the vanilla PyTorch implementation. Do you have any idea why that might be, @pommedeterresautee? I'm using a similar seq2seq model, so not a lot has changed from the T5 build.
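For anyone reproducing the comparison, a crude wall-clock harness along these lines keeps it fair (illustrative helper, not from the repo; it assumes a CUDA device):

```python
# Crude benchmark for comparing the with-cache and without-cache decoders.
# Synchronizes CUDA around the timed region so kernel time is counted.
import time
import torch

def bench(generate_fn, n_warmup: int = 3, n_iters: int = 10) -> float:
    for _ in range(n_warmup):  # warm up kernels and the allocator first
        generate_fn()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        generate_fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters  # seconds per call

# usage sketch: bench(lambda: model.generate(input_ids, max_length=64))
```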

@brevity2021 (Author)

@c-schumacher This [notebook] mentions: "Version 1.11.1 of ONNX Runtime and older have a bug which makes them much slower when most inputs are used by subgraphs of an If node. Use a version >= 1.12.0 instead." This might be the reason for your slow speed. Although downgrading to 1.11 may let the zero-copy path work, it introduces other problems.
In my case, setting clone_tensor to False does not work with ORT >= 1.12.
