-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36103: [C++] Initial device sync API #37040
Conversation
CC @kkraus14 |
|
8cd91ac
to
8a6148b
Compare
} | ||
|
||
static ArrayFactory JSONArrayFactory(const std::shared_ptr<MemoryManager>& mm, | ||
std::shared_ptr<DataType> type, const char* json) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a const char*
here can we take a string_view
? Not sure how we know what length the json is currently (unless it's assumed to be null terminated?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea it's assumed to be null terminated for these tests
return nullptr; | ||
// auto ev = reinterpret_cast<CUstream*>(sync_event); | ||
// return std::make_shared<CudaDeviceSync>(ev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, in the next PR I'll implement the Cuda implementations ie: a CudaDeviceSync
object etc.
cpp/src/arrow/device.h
Outdated
class ARROW_EXPORT DeviceSync { | ||
public: | ||
explicit DeviceSync(void* sync_event) : sync_event_{sync_event}, owns_event_{false} {} | ||
virtual ~DeviceSync() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler-generated default constructor won't call release_event; I think this destructor should probably do that
virtual ~DeviceSync() = default; | |
virtual ~DeviceSync() { | |
if (owns_event_) { | |
release_event(sync_event_); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that release_event
is a virtual function which is a bad idea to call from a destructor, so the current semantics require explicitly calling release_event
.
cpp/src/arrow/device.h
Outdated
return sync_event_; | ||
} | ||
|
||
void clear_event() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be used (instead of just destroying the DeviceSync)? I think it'd be best to keep the interface minimal until we can document the usage which each function anticipates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this has to be used instead of just destroying the DeviceSync for the same reason as above, it's not safe to call a polymorphic virtual function from a destructor.
cpp/src/arrow/device.h
Outdated
/// Tells the provided stream that it should wait until the | ||
/// synchronization event is completed without blocking the CPU. | ||
/// @param stream Should be appropriate for the underlying device | ||
virtual Status stream_wait(void* stream) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly: it's not clear when this would be used instead of set_stream() then wait()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait
specifies that it blocks until the internal event is completed. stream_wait
places a wait into the stream queue and will not block the CPU.
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
I posted a couple comments but we can probably go with this regardless. We'll probably end up changing those APIs a bit once comes the time to use them concretely. |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit d062c89. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change Building on the `ArrowDeviceArray` we need to expand the abstractions for handling events and stream synchronization for devices. ### What changes are included in this PR? Initial Abstract implementations for the new DeviceSync API and a CPU implementation. This will be followed up by a CUDA implementation in a subsequent PR. ### Are these changes tested? Yes, tests are added for Import/Export DeviceArrays using the DeviceSync handling. * Closes: apache#36103 Lead-authored-by: Matt Topol <[email protected]> Co-authored-by: Benjamin Kietzman <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Matt Topol <[email protected]>
Rationale for this change
Building on the
ArrowDeviceArray
we need to expand the abstractions for handling events and stream synchronization for devices.What changes are included in this PR?
Initial Abstract implementations for the new DeviceSync API and a CPU implementation. This will be followed up by a CUDA implementation in a subsequent PR.
Are these changes tested?
Yes, tests are added for Import/Export DeviceArrays using the DeviceSync handling.