
profiling execute_multipass issues #2238

Open
rjodinchr opened this issue Jan 22, 2025 · 0 comments · May be fixed by #2239

@rjodinchr
Contributor

The profiling execute_multipass test is failing with clvk on various Vulkan drivers (llvmpipe, SwiftShader, Mesa-based drivers).

I am seeing multiple issues in the test itself:

  1. err = clGetDeviceInfo( device, CL_DEVICE_MAX_WORK_ITEM_SIZES, sizeof( cl_uint ), (size_t*)localThreads, NULL );

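    localThreads has type size_t[3], so the size passed is wrong. A conforming implementation should return CL_INVALID_VALUE here, because param_value_size is smaller than the size of the size_t array the query returns; at the very least localThreads[1] and localThreads[2] end up with undefined values, and most probably localThreads[0] as well, since cl_uint may be smaller than size_t.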

  2. Each element of localThreads is clamped to its maximum value, except for localThreads[2]. I'm not sure I understand why, and would clamp it as well.

  3. We end up with a work-group that uses the maximum work-item size in each dimension (CL_DEVICE_MAX_WORK_ITEM_SIZES). Their product can exceed CL_DEVICE_MAX_WORK_GROUP_SIZE on some devices, so we also need to make sure the total work-group size stays within that limit.

  4. The host input and output buffers are declared as cl_uchar[w * h * d * nChannels], but:
    4.1.

    sizeof(cl_float) * w * h * d * nChannels, NULL, &err);

    The output CL buffer is created with a larger size (multiplied by sizeof(cl_float)).
    4.2.
    err = clEnqueueReadBuffer(queue, memobjs[1], CL_TRUE, 0, w*h*d*nChannels*4, outptr, 0, NULL, NULL);

    The output is read back with a larger size (multiplied by 4).

This triggers a segfault, because the host buffer that receives the output is smaller than the amount of data read from the CL buffer.

rjodinchr added a commit to rjodinchr/OpenCL-CTS that referenced this issue Jan 22, 2025
- fix clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES) by using the
proper size

- clamp localThreads[2] as for localThreads[0] and localThreads[1]

- clamp all localThreads elements with regard to CL_DEVICE_MAX_WORK_GROUP_SIZE

- fix the size used to create/read the output buffer

Fix KhronosGroup#2238
rjodinchr linked a pull request Jan 22, 2025 that will close this issue
rjodinchr added a commit to rjodinchr/OpenCL-CTS that referenced this issue Jan 22, 2025
- fix clGetDeviceInfo(CL_DEVICE_MAX_WORK_ITEM_SIZES) by using the
proper size

- clamp all localThreads elements with regard to CL_DEVICE_MAX_WORK_GROUP_SIZE

- fix the size used to create/read the output buffer

Fix KhronosGroup#2238