[SYCL][Graph] Support for native-command #383

Draft: wants to merge 4 commits into base: sycl
10 changes: 8 additions & 2 deletions clang/lib/Driver/ToolChains/SYCL.cpp
@@ -611,7 +611,13 @@ SYCL::getDeviceLibraries(const Compilation &C, const llvm::Triple &TargetTriple,
{"libsycl-asan-cpu", "internal"},
{"libsycl-asan-dg2", "internal"},
{"libsycl-asan-pvc", "internal"}};
const SYCLDeviceLibsList SYCLDeviceMsanLibs = {{"libsycl-msan", "internal"}};
const SYCLDeviceLibsList SYCLDeviceMsanLibs = {
{"libsycl-msan", "internal"},
{"libsycl-msan-cpu", "internal"},
// Currently, we only provide aot msan libdevice for PVC and CPU.
// For DG2, we just use libsycl-msan as placeholder.
{"libsycl-msan", "internal"},
{"libsycl-msan-pvc", "internal"}};
#endif

const SYCLDeviceLibsList SYCLNativeCpuDeviceLibs = {
@@ -769,7 +775,7 @@ SYCL::getDeviceLibraries(const Compilation &C, const llvm::Triple &TargetTriple,
if (SanitizeVal == "address")
addSingleLibrary(SYCLDeviceAsanLibs[sanitizer_lib_idx]);
else if (SanitizeVal == "memory")
addLibraries(SYCLDeviceMsanLibs);
addSingleLibrary(SYCLDeviceMsanLibs[sanitizer_lib_idx]);
#endif

if (isNativeCPU)
23 changes: 23 additions & 0 deletions clang/test/Driver/sycl-device-lib-old-model.cpp
@@ -355,3 +355,26 @@
// SYCL_DEVICE_MSAN_MACRO-SAME: "USE_SYCL_DEVICE_MSAN"
// SYCL_DEVICE_MSAN_MACRO: llvm-link{{.*}} "-only-needed"
// SYCL_DEVICE_MSAN_MACRO-SAME: "{{.*}}libsycl-msan.bc"

/// ###########################################################################
/// test behavior of linking libsycl-msan-pvc for PVC target AOT compilation when msan flag is applied.
// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc --no-offload-new-driver %s --sysroot=%S/Inputs/SYCL \
// RUN: -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// RUN: %clangxx -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend "-device pvc" --no-offload-new-driver %s \
// RUN: --sysroot=%S/Inputs/SYCL -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// RUN: %clangxx -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device pvc" --no-offload-new-driver %s \
// RUN: --sysroot=%S/Inputs/SYCL -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// RUN: %clangxx -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend "-device 12.60.7" --no-offload-new-driver %s \
// RUN: --sysroot=%S/Inputs/SYCL -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// RUN: %clangxx -fsycl -fsycl-targets=spir64_gen -Xs "-device 12.60.7" --no-offload-new-driver %s --sysroot=%S/Inputs/SYCL \
// RUN: -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// SYCL_DEVICE_LIB_MSAN_PVC: llvm-link{{.*}} "{{.*}}libsycl-crt.bc"
// SYCL_DEVICE_LIB_MSAN_PVC-SAME: "{{.*}}libsycl-msan-pvc.bc"


/// ###########################################################################
/// test behavior of linking libsycl-msan-cpu for CPU target AOT compilation when msan flag is applied.
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --no-offload-new-driver %s --sysroot=%S/Inputs/SYCL \
// RUN: -Xarch_device -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_CPU
// SYCL_DEVICE_LIB_MSAN_CPU: llvm-link{{.*}} "{{.*}}libsycl-crt.bc"
// SYCL_DEVICE_LIB_MSAN_CPU-SAME: "{{.*}}libsycl-msan-cpu.bc"
13 changes: 13 additions & 0 deletions clang/test/Driver/sycl-device-lib.cpp
@@ -352,3 +352,16 @@
// SYCL_DEVICE_MSAN_MACRO: "-cc1"
// SYCL_DEVICE_MSAN_MACRO-SAME: "USE_SYCL_DEVICE_MSAN"
// SYCL_DEVICE_MSAN_MACRO: libsycl-msan.new.o

/// test behavior of msan libdevice linking when -fsanitize=memory is available for AOT targets
// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc --offload-new-driver %s --sysroot=%S/Inputs/SYCL \
// RUN: -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_PVC
// SYCL_DEVICE_LIB_MSAN_PVC: clang-linker-wrapper{{.*}} "-sycl-device-libraries
// SYCL_DEVICE_LIB_MSAN_PVC-SAME: {{.*}}libsycl-msan-pvc.new.o

/// test behavior of msan libdevice linking when -fsanitize=memory is available for AOT targets
// RUN: %clangxx -fsycl -fsycl-targets=spir64_x86_64 --offload-new-driver %s --sysroot=%S/Inputs/SYCL \
// RUN: -fsanitize=memory -### 2>&1 | FileCheck %s -check-prefix=SYCL_DEVICE_LIB_MSAN_CPU
// SYCL_DEVICE_LIB_MSAN_CPU: clang-linker-wrapper{{.*}} "-sycl-device-libraries
// SYCL_DEVICE_LIB_MSAN_CPU-SAME: {{.*}}libsycl-msan-cpu.new.o

1 change: 1 addition & 0 deletions libclc/libspirv/include/libspirv/spirv.h
@@ -58,6 +58,7 @@
#include <libspirv/workitem/get_global_size.h>
#include <libspirv/workitem/get_group_id.h>
#include <libspirv/workitem/get_local_id.h>
#include <libspirv/workitem/get_local_linear_id.h>
#include <libspirv/workitem/get_local_size.h>
#include <libspirv/workitem/get_max_sub_group_size.h>
#include <libspirv/workitem/get_num_groups.h>
@@ -0,0 +1,9 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

_CLC_DECL _CLC_OVERLOAD size_t __spirv_LocalInvocationIndex();
13 changes: 1 addition & 12 deletions libclc/libspirv/lib/amdgcn-amdhsa/group/collectives.cl
@@ -365,17 +365,6 @@ __CLC_GROUP_COLLECTIVE__DF16(_Z20__spirv_GroupFMulKHRjjDF16_,
#undef __CLC_ADD
#undef __CLC_MUL

long __clc__get_linear_local_id() {
size_t id_x = __spirv_LocalInvocationId_x();
size_t id_y = __spirv_LocalInvocationId_y();
size_t id_z = __spirv_LocalInvocationId_z();
size_t size_x = __spirv_WorkgroupSize_x();
size_t size_y = __spirv_WorkgroupSize_y();
size_t size_z = __spirv_WorkgroupSize_z();
uint sg_size = __spirv_SubgroupMaxSize();
return (id_z * size_y * size_x + id_y * size_x + id_x);
}

long __clc__2d_to_linear_local_id(ulong2 id) {
size_t size_x = __spirv_WorkgroupSize_x();
size_t size_y = __spirv_WorkgroupSize_y();
@@ -396,7 +385,7 @@ long __clc__3d_to_linear_local_id(ulong3 id) {
return _Z28__spirv_SubgroupShuffleINTELI##TYPE_MANGLED##ET_S0_j( \
x, local_id); \
} \
bool source = (__clc__get_linear_local_id() == local_id); \
bool source = (__spirv_LocalInvocationIndex() == local_id); \
__local TYPE *scratch = __CLC_APPEND(__clc__get_group_scratch_, TYPE)(); \
if (source) { \
*scratch = x; \
1 change: 1 addition & 0 deletions libclc/libspirv/lib/generic/SOURCES
@@ -205,5 +205,6 @@ shared/vload.cl
shared/vstore.cl
workitem/get_global_id.cl
workitem/get_global_size.cl
workitem/get_local_linear_id.cl
workitem/get_num_sub_groups.cl
workitem/get_sub_group_size.cl
16 changes: 16 additions & 0 deletions libclc/libspirv/lib/generic/workitem/get_local_linear_id.cl
@@ -0,0 +1,16 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include <libspirv/spirv.h>

_CLC_DEF _CLC_OVERLOAD size_t __spirv_LocalInvocationIndex() {
return __spirv_LocalInvocationId_z() * __spirv_WorkgroupSize_y() *
__spirv_WorkgroupSize_x() +
__spirv_LocalInvocationId_y() * __spirv_WorkgroupSize_x() +
__spirv_LocalInvocationId_x();
}
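
The linearization used by `__spirv_LocalInvocationIndex()` above can be checked on the host with a small stand-alone sketch. This is only an illustration of the arithmetic, not libclc code: `Dim3` and `local_linear_id` are hypothetical names introduced here, and the work-group size `{4, 5, 6}` is an arbitrary assumption.

```c++
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for a work-item's 3D local id or the
// work-group's 3D size; this is not a libclc type.
struct Dim3 {
  std::size_t x, y, z;
};

// Mirrors the arithmetic of __spirv_LocalInvocationIndex() in the diff above:
// x varies fastest, then y, then z.
constexpr std::size_t local_linear_id(Dim3 id, Dim3 size) {
  return id.z * size.y * size.x + id.y * size.x + id.x;
}

// The first work-item maps to 0 and the last one to (total size - 1).
static_assert(local_linear_id({0, 0, 0}, {4, 5, 6}) == 0, "first work-item");
static_assert(local_linear_id({3, 4, 5}, {4, 5, 6}) == 4 * 5 * 6 - 1,
              "last work-item");

// Runtime spot check for an interior work-item: 3*5*4 + 2*4 + 1.
inline void check_local_linear_id() {
  assert(local_linear_id({1, 2, 3}, {4, 5, 6}) == 69);
}
```

Note that the z extent of the work-group never enters the formula, which is consistent with the removed `__clc__get_linear_local_id()` helpers reading `size_z` and `sg_size` without using them.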
13 changes: 1 addition & 12 deletions libclc/libspirv/lib/ptx-nvidiacl/group/collectives.cl
@@ -624,17 +624,6 @@ __CLC_GROUP_COLLECTIVE__DF16(_Z20__spirv_GroupFMulKHRjjDF16_,
#undef __CLC_ADD
#undef __CLC_MUL

long __clc__get_linear_local_id() {
size_t id_x = __spirv_LocalInvocationId_x();
size_t id_y = __spirv_LocalInvocationId_y();
size_t id_z = __spirv_LocalInvocationId_z();
size_t size_x = __spirv_WorkgroupSize_x();
size_t size_y = __spirv_WorkgroupSize_y();
size_t size_z = __spirv_WorkgroupSize_z();
uint sg_size = __spirv_SubgroupMaxSize();
return (id_z * size_y * size_x + id_y * size_x + id_x);
}

long __clc__2d_to_linear_local_id(ulong2 id) {
size_t size_x = __spirv_WorkgroupSize_x();
size_t size_y = __spirv_WorkgroupSize_y();
@@ -654,7 +643,7 @@ long __clc__3d_to_linear_local_id(ulong3 id) {
if (scope == Subgroup) { \
return __clc__SubgroupShuffle(x, local_id); \
} \
bool source = (__clc__get_linear_local_id() == local_id); \
bool source = (__spirv_LocalInvocationIndex() == local_id); \
__local TYPE *scratch = __CLC_APPEND(__clc__get_group_scratch_, TYPE)(); \
if (source) { \
*scratch = x; \
37 changes: 25 additions & 12 deletions libdevice/cmake/modules/SYCLLibdevice.cmake
@@ -248,46 +248,46 @@ if (NOT MSVC AND UR_SANITIZER_INCLUDE_DIR)
-I${UR_SANITIZER_INCLUDE_DIR}
-I${CMAKE_CURRENT_SOURCE_DIR})

set(asan_pvc_compile_opts_obj -fsycl -c
set(sanitizer_pvc_compile_opts_obj -fsycl -c
${sanitizer_generic_compile_opts}
${sycl_pvc_target_opt}
-D__LIBDEVICE_PVC__)

set(asan_cpu_compile_opts_obj -fsycl -c
set(sanitizer_cpu_compile_opts_obj -fsycl -c
${sanitizer_generic_compile_opts}
${sycl_cpu_target_opt}
-D__LIBDEVICE_CPU__)

set(asan_dg2_compile_opts_obj -fsycl -c
set(sanitizer_dg2_compile_opts_obj -fsycl -c
${sanitizer_generic_compile_opts}
${sycl_dg2_target_opt}
-D__LIBDEVICE_DG2__)

set(asan_pvc_compile_opts_bc ${bc_device_compile_opts}
set(sanitizer_pvc_compile_opts_bc ${bc_device_compile_opts}
${sanitizer_generic_compile_opts}
-D__LIBDEVICE_PVC__)

set(asan_cpu_compile_opts_bc ${bc_device_compile_opts}
set(sanitizer_cpu_compile_opts_bc ${bc_device_compile_opts}
${sanitizer_generic_compile_opts}
-D__LIBDEVICE_CPU__)

set(asan_dg2_compile_opts_bc ${bc_device_compile_opts}
set(sanitizer_dg2_compile_opts_bc ${bc_device_compile_opts}
${sanitizer_generic_compile_opts}
-D__LIBDEVICE_DG2__)

set(asan_pvc_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
set(sanitizer_pvc_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
-foffload-lto=thin
${sanitizer_generic_compile_opts}
${sycl_pvc_target_opt}
-D__LIBDEVICE_PVC__)

set(asan_cpu_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
set(sanitizer_cpu_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
-foffload-lto=thin
${sanitizer_generic_compile_opts}
${sycl_cpu_target_opt}
-D__LIBDEVICE_CPU__)

set(asan_dg2_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
set(sanitizer_dg2_compile_opts_obj-new-offload -fsycl -c --offload-new-driver
-foffload-lto=thin
${sanitizer_generic_compile_opts}
${sycl_dg2_target_opt}
@@ -373,16 +373,16 @@ else()
-I${CMAKE_CURRENT_SOURCE_DIR})

# asan aot
set(asan_filetypes obj obj-new-offload bc)
set(sanitizer_filetypes obj obj-new-offload bc)
set(asan_devicetypes pvc cpu dg2)

foreach(asan_ft IN LISTS asan_filetypes)
foreach(asan_ft IN LISTS sanitizer_filetypes)
foreach(asan_device IN LISTS asan_devicetypes)
compile_lib_ext(libsycl-asan-${asan_device}
SRC sanitizer/asan_rtl.cpp
FILETYPE ${asan_ft}
DEPENDENCIES ${asan_obj_deps}
OPTS ${asan_${asan_device}_compile_opts_${asan_ft}})
OPTS ${sanitizer_${asan_device}_compile_opts_${asan_ft}})
endforeach()
endforeach()

@@ -393,6 +393,19 @@ else()
EXTRA_OPTS -fno-sycl-instrument-device-code
-I${UR_SANITIZER_INCLUDE_DIR}
-I${CMAKE_CURRENT_SOURCE_DIR})

set(msan_devicetypes pvc cpu)

foreach(msan_ft IN LISTS sanitizer_filetypes)
foreach(msan_device IN LISTS msan_devicetypes)
compile_lib_ext(libsycl-msan-${msan_device}
SRC sanitizer/msan_rtl.cpp
FILETYPE ${msan_ft}
DEPENDENCIES ${msan_obj_deps}
OPTS ${sanitizer_${msan_device}_compile_opts_${msan_ft}})
endforeach()
endforeach()

endif()
endif()

6 changes: 6 additions & 0 deletions libdevice/sanitizer/msan_rtl.cpp
@@ -201,6 +201,11 @@ DEVICE_EXTERN_C_NOINLINE uptr __msan_get_shadow(uptr addr, uint32_t as) {
MSAN_DEBUG(__spirv_ocl_printf(__msan_print_launchinfo, (void *)launch_info,
launch_info->GlobalShadowOffset));

#if defined(__LIBDEVICE_PVC__)
shadow_ptr = __msan_get_shadow_pvc(addr, as);
#elif defined(__LIBDEVICE_CPU__)
shadow_ptr = __msan_get_shadow_cpu(addr);
#else
if (LIKELY(launch_info->DeviceTy == DeviceType::CPU)) {
shadow_ptr = __msan_get_shadow_cpu(addr);
} else if (launch_info->DeviceTy == DeviceType::GPU_PVC) {
@@ -209,6 +214,7 @@ DEVICE_EXTERN_C_NOINLINE uptr __msan_get_shadow(uptr addr, uint32_t as) {
MSAN_DEBUG(__spirv_ocl_printf(__msan_print_unsupport_device_type,
launch_info->DeviceTy));
}
#endif

MSAN_DEBUG(__spirv_ocl_printf(__msan_print_shadow, (void *)addr, as,
(void *)shadow_ptr, *(u8 *)shadow_ptr));
2 changes: 1 addition & 1 deletion sycl/cmake/modules/FetchUnifiedRuntime.cmake
@@ -116,7 +116,7 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
set(UNIFIED_RUNTIME_REPO "https://github.com/Bensuo/unified-runtime.git")
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules/UnifiedRuntimeTag.cmake)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
12 changes: 6 additions & 6 deletions sycl/cmake/modules/UnifiedRuntimeTag.cmake
@@ -1,7 +1,7 @@
# commit 08d36b76a5b1c4f080e3301507a39525ab5ab365
# Merge: 4c504dbc e6b61c67
# commit f07688dbc20c73d7e480cb62d7dc0ce7dc822bd3
# Merge: 7d864b6c 3dbf8b24
# Author: Kenneth Benzie (Benie) <[email protected]>
# Date: Tue Feb 4 13:14:19 2025 +0000
# Merge pull request #2614 from kurapov-peter/spills
# Add UR_KERNEL_INFO_SPILL_MEM_SIZE kernel info prop
set(UNIFIED_RUNTIME_TAG 08d36b76a5b1c4f080e3301507a39525ab5ab365)
# Date: Tue Feb 4 15:45:49 2025 +0000
# Merge pull request #2618 from winstonzhang-intel/max_eu_count_calculation
# [L0] MAX_COMPUTE_UNITS using ze_eu_count_ext_t
set(UNIFIED_RUNTIME_TAG "ewan/native_command")
@@ -173,12 +173,82 @@ dependencies are satisfied.
The SYCL command described above completes once all of the native asynchronous
tasks it contains have completed.

The call to `interopCallable` must not submit any synchronous tasks to the
The call to `interopCallable` should not submit any synchronous tasks to the
native backend object, and it must not block waiting for any tasks to complete.
The call also must not add tasks to backend objects that underlie any other
queue, aside from the queue that is associated with this handler. If it does
any of these things, the behavior is undefined.

=== sycl_ext_oneapi_graph Interaction

`ext_codeplay_enqueue_native_command` can be used in the
link:../experimental/sycl_ext_oneapi_graph.asciidoc[sycl_ext_oneapi_graph]
extension as a graph node. The `interopCallable` object is invoked
during `command_graph::finalize()`, when the backend object for the graph
is available to give to the user as a handle. The user may then
add nodes to this backend graph object using native APIs. Note that this
involves a synchronous API call to a native backend object, which is an
exception to the earlier advice about submitting synchronous tasks to native
backend objects inside `interopCallable`.

The runtime will schedule the dependencies of the user-added nodes such
that they respect the graph node edges.

=== Additions to the interop_handler class

TODO: Document backend return types, Move this info to main graphs spec,
defin interaction with host-task.
Review comment from Owner. Suggested change: replace `defin interaction with host-task.` with `define interaction with host-task.`


* CUgraph
* hipGraph_t
* ze_command_list_handle_t
* cl_command_buffer_khr



```c++
using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;

class interop_handle {
bool ext_oneapi_has_graph() const;

template <backend Backend>
backend_return_t<Backend, graph> ext_oneapi_get_native_graph() const;

};
```

Table {counter: tableNumber}. Additional member functions of the `sycl::interop_handle` class.
[cols="2a,a"]
|===
|Member function|Description

|
[source,c++]
----
bool interop_handle::ext_oneapi_has_graph() const;
----

| Query if the `interop_handle` object has a native graph object available.

|
[source,c++]
----
template <backend Backend>
backend_return_t<Backend, graph> interop_handle::ext_oneapi_get_native_graph() const;
----

| Return the native graph object associated with the `interop_handle`.

Exceptions:

* Throws with error code `invalid` if there is no native graph object
associated with the interop handle.

|===


== Example

This example demonstrates how to use this extension to enqueue asynchronous
@@ -206,12 +276,3 @@ q.submit([&](sycl::handler &cgh) {
});
q.wait();
```

== Issues

=== sycl_ext_oneapi_graph

`ext_codeplay_enqueue_native_command`
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.
