Parallelization over last two dims for tilize/untilize with padding #17790

Open
wants to merge 20 commits into
base: main

Contributor

@nardoTT nardoTT commented Feb 10, 2025

Ticket

Link to Github Issue #15136

Problem description

Tilize/untilize are parallelized along only one dimension, which limits performance.

What's changed

This PR adds parallelization along the last two dims for tilize/untilize with padding. For large tensors, the operations now use more cores, which improves performance by around 30x on the tests added. Average device samples/s is more than 3.5 times higher for models such as vgg11 and vgg16, and it also improves for some Bert tiny tests.
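To illustrate why splitting over both of the last two dims can engage more cores, here is a minimal sketch. It is not the kernel code in this PR. The 32x32 tile size, the `block_w_tiles` parameter, the 64-core count, and all function names are assumptions made for illustration only.

```python
import math

TILE = 32  # assumed tile edge length (illustrative)


def tiles_along(n, tile=TILE):
    """Number of tiles needed to cover n elements after padding up to a tile multiple."""
    return math.ceil(n / tile)


def cores_used(work_units, num_cores):
    """Cores that receive work when units are split as evenly as possible."""
    per_core = math.ceil(work_units / num_cores)
    return math.ceil(work_units / per_core)


def cores_used_1d(height, width, num_cores):
    """Hypothetical row-only split: one work unit per row of tiles."""
    return cores_used(tiles_along(height), num_cores)


def cores_used_2d(height, width, num_cores, block_w_tiles):
    """Hypothetical split over both dims: each tile row is also cut into column blocks."""
    col_blocks = math.ceil(tiles_along(width) / block_w_tiles)
    return cores_used(tiles_along(height) * col_blocks, num_cores)


# A short, wide tensor: one tile row, 256 tile columns.
one_dim = cores_used_1d(32, 8192, num_cores=64)
two_dim = cores_used_2d(32, 8192, num_cores=64, block_w_tiles=4)
print(one_dim, two_dim)
```

In this sketch the row-only split leaves a single core doing all the work for a short, wide tensor, while the two-dim split spreads the same tiles over every available core.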

Checklist

Contributor

@ntarafdar ntarafdar left a comment

This PR is amazing. Great job @nardoTT

@nardoTT nardoTT requested a review from uaydonat as a code owner February 10, 2025 16:33
@@ -126,10 +126,10 @@ def test_perf_device_bare_metal_vgg(batch_size, model_name):
     margin = 0.03

     if model_name == "ttnn_vgg11":
-        expected_perf = 39 if is_grayskull() else 114
+        expected_perf = 150 if is_grayskull() else 288
Contributor

This is nearly 4X. This is amazing.
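For reference, a quick check of the ratios implied by the two `expected_perf` lines in the hunk above, assuming the first line (39 / 114) is the old value and the second (150 / 288) is the new one. The dictionary keys are illustrative labels, not names from the test file.

```python
# Expected perf values copied from the diff hunk.
# Which line is old and which is new is an assumption.
old_perf = {"grayskull": 39, "non_grayskull": 114}
new_perf = {"grayskull": 150, "non_grayskull": 288}

# Ratio of new to old expected perf for each device class.
speedup = {name: new_perf[name] / old_perf[name] for name in old_perf}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

Under that assumption the Grayskull target rises by about 3.85x and the non-Grayskull target by about 2.53x, so the "nearly 4X" remark matches the Grayskull numbers.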

4 participants