[WIP] Fix default allocator #1127
base: master
Conversation
```diff
 cudaStream_t GetCudaStream() const override {
-  return g_stream_override.OverrideStream(stream_);
+#ifdef K2_WITH_CUDA
+  return g_stream_override.OverrideStream(0);
```
Here, in Pytorch_Context.cu, we get the stream via c10::cuda::getCurrentCUDAStream. I read the PyTorch source code: if we never call setCurrentCUDAStream, the returned stream is always the default stream (i.e., stream 0). So, can we use the default stream 0 here? (A bare 0 might not be good here; we could add a macro, something like kDefaultStream.)
```cpp
#ifdef K2_WITH_CUDA
  DeviceGuard guard(gpu_id_);
  // the default stream is 0
  auto ret = allocator_->DeviceAllocate(&p, bytes);
```
Here, the allocator allocates memory on the default stream 0; we could pass a stream to it as well.
What is the use case? That is, how will we use k2 with CUDA but without PyTorch?
For decoding purposes, if we use onnxruntime or TensorRT as the inference engine, I don't think we should depend on PyTorch at that point. The inference engine might have its own allocator, though I am not sure.
In that case, I would suggest creating another context that wraps the allocator from the given framework.
Yes, we will see, just to fix the
The purpose of this PR is to fix default_context.cu so that we can use k2 without PyTorch. There are still some issues to be fixed:

1. NvToolExt.h not found error (if K2_ENABLE_NVTX=ON). I found that the CUDA environment was handled by PyTorch before; I am reading FindCUDA.cmake in PyTorch and trying to extract some scripts from there.
2. Should each Context have its own allocator instance, or should they share one? Which is better?

Even though this is a small fix, the allocator is a fundamental part, and I may have missed something important, so I hope for your suggestions.