cudaMemcpy consuming CPU resources? #58

Open · lix19937 opened this issue Nov 8, 2024 · 1 comment

lix19937 commented Nov 8, 2024

NVIDIA/nccl#688

lix19937 commented Nov 9, 2024

With cudaMemcpy the CUDA driver detects that you are copying from a host pointer to a host pointer and the copy is done on the CPU. You can of course use memcpy on the CPU yourself if you prefer.
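A minimal sketch of the two options, assuming pinned host buffers and omitting error checking (buffer names and sizes are made up for illustration); both paths end up as a CPU copy when source and destination are host pointers:

```cpp
#include <cuda_runtime.h>
#include <cstring>

int main() {
    const size_t SIZE = 64 << 20;              // 64 MiB, arbitrary for illustration
    char *src = nullptr, *dst = nullptr;
    cudaMallocHost((void**)&src, SIZE);        // pinned host memory
    cudaMallocHost((void**)&dst, SIZE);

    // Option 1: go through the runtime; the driver sees host -> host
    // and performs the copy on the CPU.
    cudaMemcpy(dst, src, SIZE, cudaMemcpyHostToHost);

    // Option 2: do the copy yourself on the CPU; no CUDA runtime involvement.
    memcpy(dst, src, SIZE);

    cudaFreeHost(src);
    cudaFreeHost(dst);
    return 0;
}
```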

If you use cudaMemcpy, there may be an extra stream synchronize performed before doing the copy (which you may see in the profiler, but I'm guessing there—test and see).
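One hypothetical way to test that guess (not from the issue itself): launch a long-running kernel on the default stream, then call the synchronous cudaMemcpy. If the runtime synchronizes first, the host-to-host copy cannot start until the kernel finishes, which should be visible in a profiler such as Nsight Systems, whereas a plain memcpy in the same spot would not wait.

```cpp
#include <cuda_runtime.h>
#include <cstring>

__global__ void busy_kernel(float *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = p[i];
        for (int k = 0; k < 10000; ++k) v = v * 1.000001f + 0.5f;  // burn some time
        p[i] = v;
    }
}

int main() {
    const int    N     = 1 << 20;
    const size_t BYTES = 16 << 20;
    float *d = nullptr;
    char  *h_src = nullptr, *h_dst = nullptr;
    cudaMalloc(&d, N * sizeof(float));
    cudaMallocHost((void**)&h_src, BYTES);     // pinned host buffers
    cudaMallocHost((void**)&h_dst, BYTES);

    busy_kernel<<<(N + 255) / 256, 256>>>(d, N);   // launched asynchronously, still running

    // Synchronous cudaMemcpy: it may first wait for the default stream to drain,
    // then do the copy on the CPU. Compare against memcpy(h_dst, h_src, BYTES)
    // under a profiler to see whether the extra wait shows up.
    cudaMemcpy(h_dst, h_src, BYTES, cudaMemcpyHostToHost);

    cudaDeviceSynchronize();
    cudaFree(d);
    cudaFreeHost(h_src);
    cudaFreeHost(h_dst);
    return 0;
}
```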

On a UVA system you can just use cudaMemcpyDefault as talonmies says in his answer. But if you don’t have UVA (sm_20+ and 64-bit OS), then you have to call the right copy (e.g. cudaMemcpyDeviceToDevice). If you cudaHostRegister() everything you are interested in then cudaMemcpyDeviceToDevice will end up doing the following depending on where the memory is located:

Host   <-> Host  : performed by the CPU (memcpy)
Host   <-> Device: DMA (device copy engine)
Device <-> Device: Memcpy CUDA kernel (runs on the SMs, launched by driver)

https://stackoverflow.com/questions/12453677/better-or-the-same-cpu-memcpy-vs-device-cudamemcpy-on-pinned-mapped-memory
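A sketch of the UVA case (assuming a UVA-capable system; pointer names and sizes are made up): once the ordinary host allocations are page-locked with cudaHostRegister(), the driver can tell from the pointers alone where each buffer lives, so a single cudaMemcpyDefault call takes whichever path in the table above applies.

```cpp
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t SIZE = 1 << 20;
    char *h_a = (char*)malloc(SIZE);
    char *h_b = (char*)malloc(SIZE);
    char *d_a = nullptr;
    cudaMalloc(&d_a, SIZE);

    // Page-lock the ordinary host allocations so the driver knows about them.
    cudaHostRegister(h_a, SIZE, cudaHostRegisterDefault);
    cudaHostRegister(h_b, SIZE, cudaHostRegisterDefault);

    cudaMemcpy(h_b, h_a, SIZE, cudaMemcpyDefault);  // host   -> host  : CPU memcpy
    cudaMemcpy(d_a, h_a, SIZE, cudaMemcpyDefault);  // host   -> device: DMA copy engine
    cudaMemcpy(h_b, d_a, SIZE, cudaMemcpyDefault);  // device -> host  : DMA copy engine

    cudaHostUnregister(h_a);
    cudaHostUnregister(h_b);
    cudaFree(d_a);
    free(h_a);
    free(h_b);
    return 0;
}
```

Without UVA, the same calls would instead need the explicit copy kinds (cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost), as noted above.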
