Comparison of GPUDirect (GPU memory -> NIC) vs. staged path (GPU memory -> host memory -> NIC) #297

alokprasad · 2024-11-13T06:17:19Z

Currently perftest supports GPUDirect, where the NIC accesses GPU memory directly, but it would be good to compare that against a path without GPUDirect, i.e. GPU memory -> copied to host memory -> NIC. Can someone give pointers on how to make this change?
What I think we need is to allocate host memory, copy the GPU memory into it using cuMemcpyDtoH, and then use this host memory for the MR?
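
For illustration, the order of operations described above (allocate a host buffer, copy from the GPU, register and send from host memory) could be sketched as follows. This is a host-only mock, not perftest code: `mock_gpu_buf`, `mock_host_mr`, and all four functions are hypothetical names, and plain `memcpy` stands in for the real calls, which are named in the comments (`cuMemAllocHost`, `cuMemcpyDtoH`, `ibv_reg_mr`, `ibv_post_send`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GPU memory (really a CUdeviceptr from cuMemAlloc). */
typedef struct {
    unsigned char *bytes;
    size_t len;
} mock_gpu_buf;

/* Hypothetical stand-in for a registered host buffer (really a pinned
 * allocation plus the struct ibv_mr * returned by ibv_reg_mr). */
typedef struct {
    unsigned char *addr;
    size_t len;
    int registered;
} mock_host_mr;

/* One-time setup, outside the timed loop.
 * Real code: cuMemAllocHost(), then ibv_reg_mr() on the host address. */
int staging_init(mock_host_mr *mr, size_t len)
{
    assert(mr != NULL);
    mr->addr = malloc(len);
    if (mr->addr == NULL)
        return -1;
    mr->len = len;
    mr->registered = 1;
    return 0;
}

/* The extra hop: device -> host.
 * Real code: cuMemcpyDtoH(mr->addr, device_ptr, len). */
int stage_from_gpu(mock_host_mr *mr, const mock_gpu_buf *src, size_t len)
{
    if (!mr->registered || len > mr->len || len > src->len)
        return -1;
    memcpy(mr->addr, src->bytes, len);
    return 0;
}

/* The NIC reads host memory instead of GPU memory.
 * Real code: ibv_post_send() with the SGE pointing at mr->addr and using
 * the MR's lkey. Here "wire" models the peer's receive buffer. */
int send_from_host(const mock_host_mr *mr, unsigned char *wire, size_t len)
{
    if (!mr->registered || len > mr->len)
        return -1;
    memcpy(wire, mr->addr, len);
    return 0;
}

/* Teardown. Real code: ibv_dereg_mr(), then cuMemFreeHost(). */
void staging_destroy(mock_host_mr *mr)
{
    free(mr->addr);
    mr->addr = NULL;
    mr->len = 0;
    mr->registered = 0;
}
```

A caller would run `staging_init` once, then `stage_from_gpu` followed by `send_from_host` for each message. Whether `stage_from_gpu` sits inside or outside the timed loop is exactly the design question this thread goes on to discuss.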

sshaulnv · 2024-11-14T13:39:56Z

Hi @alokprasad,
do you mean to perform the copies in the datapath?

alokprasad · 2024-11-14T14:00:03Z

@sshaulnv Yes, that would give insight into the improvement achieved by GPUDirect.

sshaulnv · 2024-11-14T14:10:10Z

To ensure optimal bandwidth, we generally avoid performing intensive operations within the datapath.
Assuming GPUDirect is unavailable and we need to send a message from GPU memory, we would probably first copy the buffer to host memory before entering the datapath.

alokprasad · 2024-11-15T05:09:38Z

@sshaulnv I agree that's a good solution if we have constant data. But consider a scenario where Host 1 sends GPU data to Host 2, Host 2 does some processing, and Host 1 receives the result back: then we need to copy between host memory and GPU memory constantly in the data path.
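
To make the per-message cost in that scenario concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes the staged path is fully serialized: a device-to-host copy on the sender, the wire transfer, then a host-to-device copy on the receiver, with no pipelining or overlap and no fixed latencies. The function names and parameters are hypothetical; bandwidths are in bytes per second.

```c
#include <assert.h>

/* Time to move one message in one direction.
 * staged == 0: the NIC reads and writes GPU memory directly (GPUDirect).
 * staged != 0: adds a device-to-host copy on the sender and a
 *              host-to-device copy on the receiver, both at copy_bw.
 * Assumption: the three stages run back to back with no overlap. */
double one_way_seconds(double msg_bytes, double copy_bw, double nic_bw,
                       int staged)
{
    assert(msg_bytes >= 0.0 && copy_bw > 0.0 && nic_bw > 0.0);
    double t = msg_bytes / nic_bw;
    if (staged)
        t += 2.0 * (msg_bytes / copy_bw);
    return t;
}

/* Bandwidth seen by the application for a single one-way transfer. */
double effective_bandwidth(double msg_bytes, double copy_bw, double nic_bw,
                           int staged)
{
    assert(msg_bytes > 0.0);
    return msg_bytes / one_way_seconds(msg_bytes, copy_bw, nic_bw, staged);
}
```

In this model, if the copy rate is twice the link rate, a staged transfer takes twice as long as a direct one. Real staging implementations usually pipeline the copies in chunks, which narrows that gap, so the model is closer to an upper bound on the penalty than a prediction.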

drossetti · 2025-01-01T16:17:27Z

@alokprasad IMO implementing a staging data path in perftest does not make much sense. It is much easier and more meaningful to leverage UCX or a GPU-aware MPI library + the OSU MPI benchmarks.
Btw a number of papers have done that already.

alokprasad · 2025-01-03T04:39:01Z

@drossetti I get the point. Can you please point me to the papers, hopefully with some GitHub links to check out the code?
