
Enable DMABUF support for ROCm #309

Open · wants to merge 1 commit into master from master-dmabuf-rocm
Conversation

@paklui commented Jan 16, 2025

Enable DMA-BUF support in perftest for AMD GPUs with ROCm.

To build with ROCm DMA-BUF support, use:
./configure --enable-rocm --with-rocm=/opt/rocm --enable-rocm-dmabuf

To run with ROCm DMA-BUF enabled, use:
ib_write_bw --use_rocm_dmabuf --use_rocm=<rocm device id>
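As the log lines further down suggest ("dmabuf export addr ... to dmabuf fd" followed by "Calling ibv_reg_dmabuf_mr(...)"), the DMA-BUF path exports the ROCm allocation as a dmabuf file descriptor and registers that fd with the RDMA device. A minimal sketch of that flow, assuming a ROCm install (hsa_ext_amd.h) and rdma-core (infiniband/verbs.h), with error handling trimmed; this is an illustration of the call sequence, not the PR's exact code:

```c
#include <stdint.h>
#include <stddef.h>
#include <hsa/hsa_ext_amd.h>
#include <infiniband/verbs.h>

/* Export a GPU buffer as a dmabuf fd and register it for RDMA. */
static struct ibv_mr *register_gpu_dmabuf(struct ibv_pd *pd,
                                          void *gpu_addr, size_t size)
{
    int dmabuf_fd = -1;
    uint64_t offset = 0;

    /* Ask the HSA runtime to export the GPU allocation as a dmabuf fd. */
    hsa_status_t st = hsa_amd_portable_export_dmabuf(gpu_addr, size,
                                                     &dmabuf_fd, &offset);
    if (st != HSA_STATUS_SUCCESS)
        return NULL;

    /* Register the dmabuf with the HCA. Passing the GPU virtual address
     * as the iova keeps remote addressing identical to the peermem path. */
    return ibv_reg_dmabuf_mr(pd, offset, size,
                             (uint64_t)gpu_addr, dmabuf_fd,
                             IBV_ACCESS_LOCAL_WRITE |
                             IBV_ACCESS_REMOTE_WRITE |
                             IBV_ACCESS_REMOTE_READ);
}
```

Building this sketch requires ROCm and RDMA hardware/libraries, so it is compile-time illustrative only; the registered MR is then used for QP setup exactly as in the non-dmabuf path.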

An example of running ib_write_bw across two nodes using DMA-BUF:

# run server and client using --use_rocm_dmabuf --use_rocm=0:
$ ./ib_write_bw --use_rocm=0 --use_rocm_dmabuf -x 3 -a -F -q 2 --report_gbits -d bnxt_re0 -i 1
$ ./ib_write_bw --use_rocm=0 --use_rocm_dmabuf -x 3 -a -F -q 2 --report_gbits -d bnxt_re0 -i 1 <server>

************************************
* Waiting for client to connect... *
************************************
Using ROCm Device with ID: 0, Name: AMD Instinct MI300X, PCI Bus ID: 0x9, GCN Arch: gfx942:sramecc+:xnack-
using DMA-BUF for GPU buffer address at 0x7f0b71000000 aligned at 0x7f0b71000000 with aligned size 33554432
dmabuf export addr 0x7f0b71000000 33554432 to dmabuf fd 8 offset 0
allocated 33554432 bytes of GPU buffer at 0x7f0b71000000
Calling ibv_reg_dmabuf_mr(offset=0, size=33554432, addr=0x7f0b71000000, fd=8) for QP #0
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : bnxt_re0
 Number of qps   : 2            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : OFF          Using DDP      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Use ROCm memory : ON
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x2c24 PSN 0x176c1e RKey 0x200ee18 VAddr 0x007f0b72000000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:11
 local address: LID 0000 QPN 0x2c27 PSN 0x8b66b0 RKey 0x200ee18 VAddr 0x007f0b72800000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:11
 remote address: LID 0000 QPN 0x2c6c PSN 0x7f0c60 RKey 0x200e718 VAddr 0x007f1ffae00000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:12
 remote address: LID 0000 QPN 0x2c6a PSN 0x7deeca RKey 0x200e718 VAddr 0x007f1ffb600000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:12
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 8388608    10000            392.33             392.33               0.005846
---------------------------------------------------------------------------------------
deallocating GPU buffer 0x7f0b71000000

To run with the previous/existing peermem support instead, use --use_rocm=<rocm device id> without the --use_rocm_dmabuf flag:

# for the previous/existing peermem support, run server and client using --use_rocm=0:
$ ./ib_write_bw --use_rocm=0 -x 3 -a -F -q 2 --report_gbits -d bnxt_re0 -i 1
$ ./ib_write_bw --use_rocm=0 -x 3 -a -F -q 2 --report_gbits -d bnxt_re0 -i 1 <server>
************************************
* Waiting for client to connect... *
************************************
Using ROCm Device with ID: 0, Name: AMD Instinct MI300X, PCI Bus ID: 0x9, GCN Arch: gfx942:sramecc+:xnack-
allocated 33554432 bytes of GPU buffer at 0x7ecda9000000
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : bnxt_re0
 Number of qps   : 2            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : OFF          Using DDP      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Use ROCm memory : ON
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x2c24 PSN 0x42d66 RKey 0x2011819 VAddr 0x007ecdaa000000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:11
 local address: LID 0000 QPN 0x2c27 PSN 0x606258 RKey 0x2011819 VAddr 0x007ecdaa800000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:11
 remote address: LID 0000 QPN 0x2c6c PSN 0x1ba0b6 RKey 0x200771b VAddr 0x007ec722e00000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:12
 remote address: LID 0000 QPN 0x2c6a PSN 0xb65b68 RKey 0x200771b VAddr 0x007ec723600000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:07:12
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 8388608    10000            392.34             392.34               0.005846
---------------------------------------------------------------------------------------
deallocating GPU buffer 0x7ecda9000000

Enable DMABUF to ROCm using HIP/HSA as interface
@paklui force-pushed the master-dmabuf-rocm branch from 7b335bf to f5c3035 on January 20, 2025 at 22:31
@paklui (Author) commented Jan 20, 2025

Hi @mrgolin, @HassanKhadour, @sshaulnv, would you be able to review this PR? Thanks in advance.

@ahubbe-amd left a comment

Reviewed-by: Allen Hubbe [email protected]
