[WIP] Fix default allocator #1127
base: master
Conversation
```diff
 cudaStream_t GetCudaStream() const override {
-  return g_stream_override.OverrideStream(stream_);
+#ifdef K2_WITH_CUDA
+  return g_stream_override.OverrideStream(0);
```
Here, in Pytorch_Context.cu, we get the stream via c10::cuda::getCurrentCUDAStream. I read the PyTorch source code: if we never call setCurrentCUDAStream, the returned stream is always the default stream (i.e., stream 0). So, can we use the default stream 0 here? (A bare 0 might not be good here; we could add a macro, something like kDefaultStream.)
```cpp
#ifdef K2_WITH_CUDA
  DeviceGuard guard(gpu_id_);
  // the default stream is 0
  auto ret = allocator_->DeviceAllocate(&p, bytes);
```
Here, the allocator allocates memory on the default stream 0; we could pass a stream to it as well.
What is the use case? That is, how will we use k2 with CUDA but without PyTorch?
For decoding purposes, if we use onnxruntime or TensorRT as the inference engine, I don't think we should depend on PyTorch at that point. The inference engine might have its own allocator, though I am not sure.
In that case, I would suggest creating another context that wraps the allocator from the given framework.
Yes, we will see, just to fix the
The purpose of this PR is to fix default_context.cu so that we can use k2 without PyTorch. There are still some issues to be fixed:

1. NvToolExt.h not found error (if K2_ENABLE_NVTX=ON). I found that the CUDA environment was handled by PyTorch before; I am reading FindCUDA.cmake in PyTorch and trying to extract some scripts from there.
2. Should each Context have its own allocator instance, or should they share one? Which is better?

Even though this is a small fix, the allocator is a fundamental part, and I may have missed something important, so I hope for your suggestions.