Faster packing #1172

Open
wants to merge 10 commits into
base: dev


bwintermann

When building ResNet50, I found that the HLS codegen step, which internally uses the array2hexstring function, took a very long time to execute because of the very large weight tensors. For tensors of roughly 2 million entries it took around 30s. Across many layers, this step alone took ~15min per build, which made development of later steps difficult because of the slow iteration speed.

To speed this up, I first focused on the BINARY datatype case and rewrote the function in C, integrating it via Python's ctypes. I also added tests that compare the results for randomized input tensors against the original Python implementation.
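The randomized comparison described above can be sketched roughly as follows. This is not the code from this PR and not FINN's `array2hexstring`: `pack_bits_reference`, `pack_bits_candidate`, and `check_equivalence` are hypothetical names, the candidate is plain Python standing in for the C function, and the packing convention (most significant bit first, left-padded to whole hex digits) is an assumption made only for illustration.

```python
import random


def pack_bits_reference(bits, prefix="0x"):
    """Obvious version: build a bit string, then convert 4 bits at a time."""
    bitstring = "".join("1" if b else "0" for b in bits)
    # Left-pad to a multiple of 4 bits so that every hex digit is complete.
    bitstring = "0" * ((-len(bitstring)) % 4) + bitstring
    digits = ""
    for i in range(0, len(bitstring), 4):
        digits += "0123456789abcdef"[int(bitstring[i:i + 4], 2)]
    return prefix + digits


def pack_bits_candidate(bits, prefix="0x"):
    """Alternative version (stand-in for the C function): fold the bits
    into one integer and format it once."""
    if not bits:
        return prefix
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b else 0)
    width = (len(bits) + 3) // 4  # number of hex digits needed
    return prefix + format(value, "0{}x".format(width))


def check_equivalence(trials=200, max_len=64, seed=0):
    """Compare both implementations on randomized binary inputs."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(rng.randint(1, max_len))]
        assert pack_bits_reference(bits) == pack_bits_candidate(bits), bits
    return True
```

Fixing the seed means a failing input can be reproduced, and passing the offending `bits` as the assertion message makes it visible in the test output.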

I tested two input tensor shapes, one with 64 and one with 2048 as the innermost dimension, both with roughly 2 million elements overall. For each shape I executed the function 5 times.
With 64 as the innermost dimension, the Python implementation took 237.41s in total (47.482s per run) and the C function 2.856s in total (0.571s per run), an estimated speedup of ~83x. With 2048 as the innermost dimension, the Python implementation took 232.201s in total (46.44s per run) and the C function 0.115s in total (0.023s per run), an estimated speedup of ~2019x, presumably due to lower function call overhead.
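For reference, the per-run times and speedups quoted above follow from the totals by simple division. A minimal sketch of such a measurement, assuming a wall-clock timer is good enough here; `time_repeated` and `speedup` are hypothetical helper names and not part of this PR:

```python
import time


def time_repeated(fn, repeats=5):
    """Call fn() `repeats` times; return (total seconds, seconds per run)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    total = time.perf_counter() - start
    return total, total / repeats


def speedup(baseline_total, optimized_total):
    """Ratio of baseline runtime to optimized runtime."""
    return baseline_total / optimized_total


# Totals reported in this comment (5 runs each):
print(round(speedup(237.41, 2.856)))   # innermost dimension 64
print(round(speedup(232.201, 0.115)))  # innermost dimension 2048
```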

In the future I would like to expand this to all DataTypes and try to speed up the C implementation a bit more as well, but for now I don't think that further speedup is strictly necessary.

@bwintermann
Author

I actually just ran into an issue when executing the whole build flow, getting an OSError on imports from other modules. Currently looking into it.

Signed-off-by: bwintermann <[email protected]>
@bwintermann
Author

Fixed the bug. It was apparently caused by the multithreaded IPGen step, in which every thread reloaded the module and tried to access the same library. Loading is now guarded by a singleton function so that the library is only loaded once.
Running this on my ResNet50 model reduced the code generation time from ~16min to ~1min 45s. 99% of the remaining time is spent on a call that uses INT8, which still uses the Python implementation.
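A load-once guard of this kind could look roughly like the sketch below. It is an illustration rather than the code in this PR: `get_library` and its `loader` argument are hypothetical, and in a real setup the loader would be something like `lambda: ctypes.CDLL(path_to_shared_object)`.

```python
import threading

_lib = None                    # cached library handle, shared by all threads
_lib_lock = threading.Lock()   # serializes the first load


def get_library(loader):
    """Return the shared handle, calling loader() at most once.

    `loader` is a zero-argument callable that performs the actual load.
    """
    global _lib
    if _lib is None:              # fast path once the handle exists
        with _lib_lock:
            if _lib is None:      # re-check: another thread may have loaded it
                _lib = loader()
    return _lib
```

The second check inside the lock matters: without it, two threads that both saw `None` before acquiring the lock would each call the loader.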
