Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT - [SYCL][Graph] Support for native-command #16871

Draft
wants to merge 1 commit into
base: sycl
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion sycl/cmake/modules/FetchUnifiedRuntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
set(UNIFIED_RUNTIME_REPO "https://github.com/Bensuo/unified-runtime.git")
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules/UnifiedRuntimeTag.cmake)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
Expand Down
2 changes: 1 addition & 1 deletion sycl/cmake/modules/UnifiedRuntimeTag.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -4,4 +4,4 @@
# Date: Tue Feb 4 15:45:49 2025 +0000
# Merge pull request #2618 from winstonzhang-intel/max_eu_count_calculation
# [L0] MAX_COMPUTE_UNITS using ze_eu_count_ext_t
set(UNIFIED_RUNTIME_TAG f07688dbc20c73d7e480cb62d7dc0ce7dc822bd3)
set(UNIFIED_RUNTIME_TAG "ewan/native_command")
Original file line number Diff line number Diff line change
Expand Up @@ -173,12 +173,82 @@ dependencies are satisfied.
The SYCL command described above completes once all of the native asynchronous
tasks it contains have completed.

The call to `interopCallable` must not submit any synchronous tasks to the
The call to `interopCallable` should not submit any synchronous tasks to the
native backend object, and it must not block waiting for any tasks to complete.
The call also must not add tasks to backend objects that underly any other
queue, aside from the queue that is associated with this handler. If it does
any of these things, the behavior is undefined.

=== sycl_ext_oneapi_graph Interaction

`ext_codeplay_enqueue_native_command` can be used in the
link:../experimental/sycl_ext_oneapi_graph.asciidoc[sycl_ext_oneapi_graph]
extension as a graph node. The `interopCallable` object will be invoked
during `command_graph::finalize()` when the backend object for the graph
is available to give to the user as a handle. The user then may
add nodes to this backend graph objects using native APIs. Note that this
involves a synchronous API call to a native backend object, which is an
exception to earlier advice about submitting synchronous task to native
backend objects inside `interopCallable`.

The runtime will schedule the dependencies of the user added nodes such
that they respect the graph node edges.

=== Additions to the interop_handler class

TODO: Document backend return types, Move this info to main graphs spec,
defin interaction with host-task.

* CUGraph
* hipGraph_t
* ze_command_list_handle_t
* cl_command_buffer_khr



```c++
using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;

class interop_handle {
bool ext_oneapi_has_graph() const;

template <backend Backend>
backend_return_t<Backend, graph> ext_oneapi_get_native_graph() const;

};
```

Table {counter: tableNumber}. Additional member functions of the `sycl::interop_handle` class.
[cols="2a,a"]
|===
|Member function|Description

|
[source,c++]
----
bool interop_handle::ext_oneapi_has_graph() const;
----

| Query if the `interop_handle` object has a native graph object available.

|
[source,c++]
----
template <backend Backend>
backend_return_t<Backend, graph> interop_handle::ext_oneapi_get_native_graph() const;
----

| Return the native graph object associated with the `interop_handle`.

Exceptions:

* Throws with error code `invalid` if there is no native graph object
associated with the interop handle.

|===


== Example

This example demonstrates how to use this extension to enqueue asynchronous
Expand Down Expand Up @@ -206,12 +276,3 @@ q.submit([&](sycl::handler &cgh) {
});
q.wait();
```

== Issues

=== sycl_ext_oneapi_graph

`ext_codeplay_enqueue_native_command`
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Original file line number Diff line number Diff line change
Expand Up @@ -2077,13 +2077,9 @@ extensions.

==== sycl_ext_codeplay_enqueue_native_command

`ext_codeplay_enqueue_native_command`, defined in
`ext_codeplay_enqueue_native_command` commands, defined in
link:../experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc[sycl_ext_codeplay_enqueue_native_command]
cannot be used in graph nodes. A synchronous exception will be thrown with error
code `invalid` if a user tries to add them to a graph.

Removing this restriction is something we may look at for future revisions of
`sycl_ext_oneapi_graph`.
can be used in graph nodes. See the section on `sycl_ext_oneapi_graph`.

==== sycl_ext_intel_queue_index

Expand Down
12 changes: 12 additions & 0 deletions sycl/include/sycl/detail/backend_traits_cuda.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,15 @@
#include <sycl/detail/backend_traits.hpp>
#include <sycl/device.hpp>
#include <sycl/event.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/queue.hpp>

typedef int CUdevice;
typedef struct CUctx_st *CUcontext;
typedef struct CUstream_st *CUstream;
typedef struct CUevent_st *CUevent;
typedef struct CUmod_st *CUmodule;
typedef struct CUgraph_st *CUgraph;

// As defined in the CUDA 10.1 header file. This requires CUDA version > 3.2
#if defined(_WIN64) || defined(__LP64__)
Expand Down Expand Up @@ -102,6 +104,16 @@ template <> struct BackendReturn<backend::ext_oneapi_cuda, queue> {
using type = CUstream;
};

using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;
template <> struct BackendInput<backend::ext_oneapi_cuda, graph> {
using type = CUgraph;
};

template <> struct BackendReturn<backend::ext_oneapi_cuda, graph> {
using type = CUgraph;
};

} // namespace detail
} // namespace _V1
} // namespace sycl
13 changes: 13 additions & 0 deletions sycl/include/sycl/detail/backend_traits_hip.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include <sycl/detail/backend_traits.hpp>
#include <sycl/device.hpp>
#include <sycl/event.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/queue.hpp>

typedef int HIPdevice;
Expand All @@ -25,6 +26,8 @@ typedef struct ihipStream_t *HIPstream;
typedef struct ihipEvent_t *HIPevent;
typedef struct ihipModule_t *HIPmodule;
typedef void *HIPdeviceptr;
typedef struct ihipGraph *HIPGraph;
typedef struct hipGraphNode *HIPGraphNode;

namespace sycl {
inline namespace _V1 {
Expand Down Expand Up @@ -96,6 +99,16 @@ template <> struct BackendReturn<backend::ext_oneapi_hip, queue> {
using type = HIPstream;
};

using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;
template <> struct BackendInput<backend::ext_oneapi_hip, graph> {
using type = HIPGraph;
};

template <> struct BackendReturn<backend::ext_oneapi_hip, graph> {
using type = HIPGraph;
};

template <> struct InteropFeatureSupportMap<backend::ext_oneapi_hip> {
static constexpr bool MakePlatform = false;
static constexpr bool MakeDevice = true;
Expand Down
25 changes: 18 additions & 7 deletions sycl/include/sycl/detail/backend_traits_level_zero.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,13 +20,14 @@
#include <sycl/device.hpp> // for device
#include <sycl/event.hpp> // for event
#include <sycl/ext/oneapi/backend/level_zero_ownership.hpp> // for ownership
#include <sycl/image.hpp> // for image
#include <sycl/kernel.hpp> // for kernel
#include <sycl/kernel_bundle.hpp> // for kernel_b...
#include <sycl/kernel_bundle_enums.hpp> // for bundle_s...
#include <sycl/platform.hpp> // for platform
#include <sycl/property_list.hpp> // for property...
#include <sycl/range.hpp> // for range
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/image.hpp> // for image
#include <sycl/kernel.hpp> // for kernel
#include <sycl/kernel_bundle.hpp> // for kernel_b...
#include <sycl/kernel_bundle_enums.hpp> // for bundle_s...
#include <sycl/platform.hpp> // for platform
#include <sycl/property_list.hpp> // for property...
#include <sycl/range.hpp> // for range

#include <variant> // for variant
#include <vector> // for vector
Expand Down Expand Up @@ -207,6 +208,16 @@ template <> struct BackendReturn<backend::ext_oneapi_level_zero, kernel> {
using type = ze_kernel_handle_t;
};

using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;
template <> struct BackendInput<backend::ext_oneapi_level_zero, graph> {
using type = ze_command_list_handle_t;
};

template <> struct BackendReturn<backend::ext_oneapi_level_zero, graph> {
using type = ze_command_list_handle_t;
};

template <> struct InteropFeatureSupportMap<backend::ext_oneapi_level_zero> {
static constexpr bool MakePlatform = true;
static constexpr bool MakeDevice = true;
Expand Down
17 changes: 14 additions & 3 deletions sycl/include/sycl/detail/backend_traits_opencl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,10 @@
#include <sycl/detail/ur.hpp> // for assertion and ur handles
#include <sycl/device.hpp> // for device
#include <sycl/event.hpp> // for event
#include <sycl/kernel.hpp> // for kernel
#include <sycl/kernel_bundle_enums.hpp> // for bundle_state
#include <sycl/platform.hpp> // for platform
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/kernel.hpp> // for kernel
#include <sycl/kernel_bundle_enums.hpp> // for bundle_state
#include <sycl/platform.hpp> // for platform

#include <vector> // for vector

Expand Down Expand Up @@ -132,6 +133,16 @@ template <> struct BackendReturn<backend::opencl, kernel> {
using type = cl_kernel;
};

using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;
template <> struct BackendInput<backend::opencl, graph> {
using type = cl_command_buffer_khr;
};

template <> struct BackendReturn<backend::opencl, graph> {
using type = cl_command_buffer_khr;
};

template <> struct InteropFeatureSupportMap<backend::opencl> {
static constexpr bool MakePlatform = true;
static constexpr bool MakeDevice = true;
Expand Down
3 changes: 2 additions & 1 deletion sycl/include/sycl/ext/oneapi/experimental/graph.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,8 @@ enum class node_type {
prefetch = 6,
memadvise = 7,
ext_oneapi_barrier = 8,
host_task = 9
host_task = 9,
native_command = 10
};

/// Class representing a node in the graph, returned by command_graph::add().
Expand Down
3 changes: 0 additions & 3 deletions sycl/include/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1934,9 +1934,6 @@ class __SYCL_EXPORT handler {
void(interop_handle)>::value>
ext_codeplay_enqueue_native_command([[maybe_unused]] FuncT &&Func) {
#ifndef __SYCL_DEVICE_ONLY__
throwIfGraphAssociated<
ext::oneapi::experimental::detail::UnsupportedGraphFeatures::
sycl_ext_codeplay_enqueue_native_command>();
ext_codeplay_enqueue_native_command_impl(Func);
#endif
}
Expand Down
31 changes: 29 additions & 2 deletions sycl/include/sycl/interop_handle.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include <sycl/detail/impl_utils.hpp> // for getSyclObjImpl
#include <sycl/exception.hpp>
#include <sycl/ext/oneapi/accessor_property_list.hpp> // for accessor_property_list
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/image.hpp> // for image
#include <ur_api.h> // for ur_mem_handle_t, ur...

Expand Down Expand Up @@ -49,6 +50,9 @@ class interop_handle {
/// interop_handle.
__SYCL_EXPORT backend get_backend() const noexcept;

/// Returns true if command-group is being added to a graph as a node
__SYCL_EXPORT bool ext_oneapi_has_graph() const noexcept;

/// Receives a SYCL accessor that has been defined as a requirement for the
/// command group, and returns the underlying OpenCL memory object that is
/// used by the SYCL runtime. If the accessor passed as parameter is not part
Expand Down Expand Up @@ -134,6 +138,26 @@ class interop_handle {
#endif
}

using graph = ext::oneapi::experimental::command_graph<
ext::oneapi::experimental::graph_state::executable>;
template <backend Backend = backend::opencl>
backend_return_t<Backend, graph> ext_oneapi_get_native_graph() const {
#ifndef __SYCL_DEVICE_ONLY__
// TODO: replace the exception thrown below with the SYCL 2020 exception
// with the error code 'errc::backend_mismatch' when those new exceptions
// are ready to be used.
if (Backend != get_backend())
throw exception(make_error_code(errc::invalid),
"Incorrect backend argument was passed");

// C-style cast required to allow various native types
return (backend_return_t<Backend, graph>)getNativeGraph();
#else
// we believe this won't be ever called on device side
return 0;
#endif
}

/// Returns the SYCL application interoperability native backend object
/// associated with the device associated with the SYCL queue that the host
/// task was submitted to. The native backend object returned must be in
Expand Down Expand Up @@ -186,8 +210,9 @@ class interop_handle {
interop_handle(std::vector<ReqToMem> MemObjs,
const std::shared_ptr<detail::queue_impl> &Queue,
const std::shared_ptr<detail::device_impl> &Device,
const std::shared_ptr<detail::context_impl> &Context)
: MQueue(Queue), MDevice(Device), MContext(Context),
const std::shared_ptr<detail::context_impl> &Context,
const ur_exp_command_buffer_handle_t &Graph)
: MQueue(Queue), MDevice(Device), MContext(Context), MGraph(Graph),
MMemObjs(std::move(MemObjs)) {}

template <backend Backend, typename DataT, int Dims>
Expand All @@ -211,10 +236,12 @@ class interop_handle {
getNativeQueue(int32_t &NativeHandleDesc) const;
__SYCL_EXPORT ur_native_handle_t getNativeDevice() const;
__SYCL_EXPORT ur_native_handle_t getNativeContext() const;
__SYCL_EXPORT ur_native_handle_t getNativeGraph() const;

std::shared_ptr<detail::queue_impl> MQueue;
std::shared_ptr<detail::device_impl> MDevice;
std::shared_ptr<detail::context_impl> MContext;
ur_exp_command_buffer_handle_t MGraph;

std::vector<ReqToMem> MMemObjs;
};
Expand Down
2 changes: 2 additions & 0 deletions sycl/source/detail/graph_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -824,6 +824,8 @@ ur_exp_command_buffer_sync_point_t exec_graph_impl::enqueueNode(
std::shared_ptr<node_impl> Node) {

// Queue which will be used for allocation operations for accessors.
// Will also be used in native commands to return to the user in
// `interop_handler::get_native_queue()` calls
auto AllocaQueue = std::make_shared<sycl::detail::queue_impl>(
DeviceImpl, sycl::detail::getSyclObjImpl(Ctx), sycl::async_handler{},
sycl::property_list{});
Expand Down
2 changes: 2 additions & 0 deletions sycl/source/detail/graph_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,8 @@ inline node_type getNodeTypeFromCG(sycl::detail::CGType CGType) {
return node_type::host_task;
case sycl::detail::CGType::ExecCommandBuffer:
return node_type::subgraph;
case sycl::detail::CGType::EnqueueNativeCommand:
return node_type::native_command;
default:
assert(false && "Invalid Graph Node Type");
return node_type::empty;
Expand Down
Loading
Loading