Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCLomatic-test] Block store test #725

Open
wants to merge 15 commits into
base: SYCLomatic
Choose a base branch
from

Conversation

abhilash1910
Copy link
Contributor

@abhilash1910 abhilash1910 commented Jun 5, 2024

In line with block store PR (oneapi-src/SYCLomatic#1819), this tests only store api.
WIP. Linked to composite PR : #680

cc @danhoeflinger @mmichel11 @yihanwg

@abhilash1910 abhilash1910 requested a review from a team as a code owner June 5, 2024 09:55
@abhilash1910 abhilash1910 marked this pull request as draft June 5, 2024 09:55
@abhilash1910 abhilash1910 marked this pull request as ready for review June 6, 2024 04:19
thread_data[i] = dacc_read[global_index * 4 + i];
}

auto *d_r =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify why we are storing to buffer here and then storing the thread_data to buffer_out? To me, it looks like we are not verifying the group_store operation since buffer_out is being passed to the verification routine which is just the value of thread_data .

item.get_local_id(2); // Each thread_data has 4 elements
#pragma unroll
for (int i = 0; i < 4; ++i) {
dacc_write[global_index * 4 + i] = thread_data[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question to above about the purpose of this and how we are verifying the store routines at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the recent changes, we are now doing group_store(tmp).store(item, d_w, thread_data); where d_w is referring to buffer_out. This part seems correct, but my comment about these lines still apply.

It seems like the contents of buffer_out after the group_store operation are being overwritten by the writes in this loop. Can we just remove this entire loop? I am not sure why it is here unless I misunderstand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes modified to prevent the overwrite, could you please take a look . Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks to resolve the issue

@abhilash1910 abhilash1910 marked this pull request as draft August 2, 2024 13:14
@abhilash1910 abhilash1910 marked this pull request as ready for review August 2, 2024 13:24
Comment on lines +48 to +58
int expected[512];
int num_threads = 128;
int items_per_thread = 4;
uint32_t sg_sz_val = *sg_sz;
for (int i = 0; i < num_threads; ++i) {
for (int j = 0; j < items_per_thread; ++j) {
expected[items_per_thread * i + j] =
(i / sg_sz_val) * sg_sz_val * items_per_thread + sg_sz_val * j +
i % sg_sz_val;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are setting up expected and then never using it. Am I missing something?

}

for (int i = 0; i < 512; ++i) {
if (ptr[i] != i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the actual check is just against the index i.
That also seems to be the source data for the input buffer, does this pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the test is passing since we are overwriting the results of the subgroup store operation in test_store_subgroup_striped_standalone

sg_sz_acc[0] = item.get_sub_group().get_local_linear_range();
}
dpct::group::store_subgroup_striped<4, int>(item, d_w, thread_data);
// reapply global mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are overwriting the result of the sub-group store here from lines 173 to 179. If we remove this the test fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... seems like this should not be here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants