Zero copy may lead to wrong text generation results #145
I tested `clone_tensor` set to False extensively, for obvious reasons, and had no issue at the time. Did you test with ORT 1.11?
Thanks for the reply! I was testing with ORT 1.12. It might be due to the model implementation (since I was using Pegasus instead of T5 and had to modify the code a bit to make it work). It's quite some code, so I need some time to put it together and showcase the problem. I will update this thread.
@pommedeterresautee Here are the notebooks to replicate the error. To make things easier I use a T5 model for illustration. The first notebook is for exporting the ONNX model (I am using fp32 instead of trying fp16). Can you please help take a look?
@brevity2021 It may not be relevant anymore since your question was a while ago, but I noticed the same thing when adapting the T5 approach to another model. Downgrading onnxruntime-gpu to 1.11 fixed the issue for me.
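For what it's worth, a quick sanity check that the downgraded build is really the one being picked up at runtime (a minimal sketch, not tied to the notebooks in this thread):

```python
# Verify which ONNX Runtime build and version are active before re-benchmarking.
import onnxruntime as ort

print(ort.__version__)   # should report 1.11.x after the downgrade
print(ort.get_device())  # "GPU" confirms the onnxruntime-gpu package is loaded
```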
Strangely though, even though the outputs with the cache are correct after downgrading ORT, that approach is almost 4x slower than using only a decoder without cache support and ~2x slower than the vanilla PyTorch implementation. Do you have any idea why that might be, @pommedeterresautee? I'm using a similar seq2seq model, so not a lot has changed from the T5 build.
@c-schumacher This [notebook] mentions: "Version 1.11.1 of ONNX Runtime and older have a bug which makes them much slower when most inputs are used by subgraphs of an If node."
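As a side note, a quick way to check whether an exported decoder actually routes the cache / no-cache branches through an `If` node (the file name below is just a placeholder):

```python
# List the If nodes in an exported decoder graph; their presence means the
# cache / no-cache branches live in subgraphs, which is what the ORT bug above hits.
import onnx

model = onnx.load("t5-decoder-merged.onnx")
if_nodes = [node.name for node in model.graph.node if node.op_type == "If"]
print(if_nodes)
```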
Hi,
I was trying the "zero copy" method from the T5 notebook on a seq2seq transformer model. When I set `clone_tensor` to True, everything looks fine, just not as much speedup as I expected. When I set `clone_tensor` to False, the text generation gives wrong results (some are repetitive ids). I debugged a bit and found that although the binding inputs are the same as when `clone_tensor` is True, the results after `run_with_io_binding` are different. It seems it can somehow be fixed by not reusing the IO binding and instead creating a new one at some generation steps, but I really have no clue why.
I'm still playing with it and can post some code snippets later, but I wonder if you have encountered something like this (different results when switching `clone_tensor`) and whether you have any suggestions. Thanks!
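For context, here is a minimal sketch of what the `clone_tensor` switch boils down to, as I understand it, when an IO binding and its output buffer are reused across generation steps. This is not the notebook's actual code; the model path, tensor names, and shapes are placeholders:

```python
# Minimal sketch of the clone-vs-zero-copy trade-off with ONNX Runtime IO binding.
# "decoder.onnx", the tensor names, and the shapes are placeholders, not the
# notebook's real values.
import numpy as np
import torch
import onnxruntime as ort

sess = ort.InferenceSession("decoder.onnx", providers=["CUDAExecutionProvider"])
binding = sess.io_binding()

input_ids = torch.ones((1, 8), dtype=torch.int64, device="cuda")
logits_buf = torch.empty((1, 8, 32128), dtype=torch.float32, device="cuda")

binding.bind_input(
    name="input_ids", device_type="cuda", device_id=0,
    element_type=np.int64, shape=tuple(input_ids.shape),
    buffer_ptr=input_ids.data_ptr(),
)
binding.bind_output(
    name="logits", device_type="cuda", device_id=0,
    element_type=np.float32, shape=tuple(logits_buf.shape),
    buffer_ptr=logits_buf.data_ptr(),
)

sess.run_with_iobinding(binding)

# clone_tensor=True behaviour: pay for a copy, keep a result that stays valid.
safe_logits = logits_buf.clone()

# clone_tensor=False behaviour: zero copy, but this is the very buffer the next
# run_with_iobinding call writes into, so the "previous" logits can silently
# change if the buffer is reused before the caller has consumed them.
unsafe_logits = logits_buf
```

With the copy, each step pays for an extra memcpy but the returned logits stay valid; without it, correctness depends on the runtime never touching that buffer before the caller is done with it, which seems to be the assumption that breaks between ORT 1.11 and 1.12.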