-
Notifications
You must be signed in to change notification settings - Fork 964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix invalid mask implementation #359
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I know we currently don't have a test for this encoder, but would be good to at least test this aspect of it.
Thank you for the feedback. I have tested cases to cover the encoder functionality with modified codes, specifically focusing on the aspect you mentioned. Before talking about the test results, let me tell you about a fix that were added: Test Env
ResultHere are the results of my tests (each result only extracts the first 100 elements):
The results above show that the results of the modified version match the results of the baseline. ReasonAs I mentioned, the T5 baseline model uses results from the tokenizer as inputs. So, I think this issue is caused by the difference in the way AttentionMask is implemented between Tokenizer and the current version. SummaryThe tests run successfully and validate the expected behavior. However, I will leave the GitHub address where all the code I used for testing is located, as it is possible that my mistakes in converting and testing the model may cause errors in the results. So, please let me know if there are any additional improvements. |
@KyuHwan00 A good way to test this would to provide T5 embeddings into the model using this logic vs the old logic and comparing the resulting images - I would expect them to be significantly different. Could you provide these two images to represent an "end to end" test, as well as steps to reproduce it, perhaps via the CLI? Here is a good example of a PR with this kind of end to end test: #357 Also worth mentioning that the current implementation is pinned to a specific version of T5 that was exported to optimize its computation for Apple Silicon, located here: https://huggingface.co/argmaxinc/coreml-stable-diffusion-3-medium/tree/main/TextEncoderT5.mlmodelc. These optimizations could be a source of the differences you are noticing. You may want to compare the outputs from this model as well to the asset you've exported here https://github.com/KyuHwan00/TestT5Encoder/blob/main/t5export.py to the baseline, all three should match for backwards compatibility. |
@ZachNagengast Thanks for the your feedback. I'll get back with the appropriate test cases ASAP. |
@ZachNagengast @alejandro-isaza Thanks for your patience first, I report on the results for the appropriate test cases.
As you can see from the generated images shown above, you can see that the modified version does not produce an appropriate output for the prompt. Perhaps this difference is made because the code is a specific version of T5 exported to optimize the calculation of Apple Silicon, as @ZachNagengast mentioned. Sorry for missing out on this. Finally, thanks for your feedback. |
Thanks for running these, since the two models are different, perhaps you can adjust your converter code to conform to the pre-exported model. I think the logic makes sense as-is but lmk if you think otherwise:
|
Fix masking implementation at TextEncoderT5.
Issue
Solution
token == padToken ? maskToken : padToken
=>token == padToken ? padToken : maskToken