
hmem: Define ofi_hmem_put_dmabuf_fd #10716

Merged: 7 commits merged into ofiwg:main on Jan 23, 2025
Conversation

@iziemba (Contributor) commented Jan 22, 2025

For some HMEM ifaces, ofi_hmem_get_dmabuf_fd() may result in a new FD being allocated. Define ofi_hmem_put_dmabuf_fd() to close the FD.

Support ofi_hmem_get_dmabuf_fd() with ROCR.

Update CXI provider accordingly.

For some HMEM ifaces, ofi_hmem_get_dmabuf_fd() may result in a new FD
being allocated. Define ofi_hmem_put_dmabuf_fd() to close FD.

Signed-off-by: Ian Ziemba <[email protected]>
With ROCR, callers of ofi_hmem_get_dmabuf_fd() should call
ofi_hmem_put_dmabuf_fd() once the DMA buf region is no longer used.

Signed-off-by: Ian Ziemba <[email protected]>
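To make the intended pairing concrete, here is a minimal caller-side sketch. The ofi_hmem_get_dmabuf_fd()/ofi_hmem_put_dmabuf_fd() signatures follow the diffs in this PR; the function name, header include, and surrounding error handling are illustrative assumptions rather than actual libfabric code.

#include "ofi_hmem.h"	/* assumed header exposing the ofi_hmem_* helpers */

/* Hypothetical caller: obtain a dmabuf FD for a device region, use it,
 * then put it back. For ifaces such as ROCR and CUDA the get may open a
 * new FD, so skipping the put leaks descriptors; for ifaces that hand
 * back a persistent FD, the iface-specific put handles it appropriately. */
static int use_dmabuf_region(enum fi_hmem_iface iface, const void *addr,
			     uint64_t len)
{
	uint64_t offset;
	int fd;
	int ret;

	ret = ofi_hmem_get_dmabuf_fd(iface, addr, len, &fd, &offset);
	if (ret)
		return ret;

	/* ... register (fd, offset, len) with the NIC and use the MR ... */

	/* Release the FD once the DMA buf region is no longer used. */
	ofi_hmem_put_dmabuf_fd(iface, fd);
	return 0;
}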
@iziemba requested review from swelch and j-xiong on January 22, 2025 04:32
@shijin-aws (Contributor) commented Jan 22, 2025

Would you mind implementing this for CUDA as well? It should be simply a close(fd), and we may want to use it in #10708.
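A minimal sketch of what that suggestion implies for the CUDA backend, assuming the ops-table hook shown in the diff below; the function name and error mapping are illustrative, not the merged implementation:

#include <unistd.h>
#include <errno.h>

/* Hypothetical CUDA hook: the dmabuf FD handed out by the CUDA get path is
 * a regular open file descriptor, so putting it back is just a close(). */
static int cuda_put_dmabuf_fd(int fd)
{
	if (close(fd))
		return -errno;

	return 0;
}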

@@ -131,6 +131,7 @@ struct ofi_hmem_ops {
 			const void *src, size_t size);
 	int (*get_dmabuf_fd)(const void *addr, uint64_t size, int *fd,
 			     uint64_t *offset);
+	int (*put_dmabuf_fd)(int fd);
Contributor: Why not just name it as close_dmabuf_fd?

iziemba (Contributor, Author): I chose put_dmabuf_fd to semantically align with get_dmabuf_fd.

Contributor: Is there any hmem type where get_dmabuf_fd does not implicitly increase the open count on the file? If not, I'd recommend going the other way and renaming get to open, so that it's clear it is not simply resolving an already-open file descriptor but is actually opening a resource that must eventually be closed.

@j-xiong (Contributor) commented Jan 23, 2025: Yes, the ZE hmem type always returns the same fd.
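For context, a sketch of how the top-level helper presumably dispatches through the new ops-table hook above; only the signatures come from this PR's diffs, and the hmem_ops[] array name and wrapper body are assumptions:

/* Sketch, not the merged libfabric source: route the put through the
 * per-iface ops table, mirroring how the existing get_dmabuf_fd hook
 * is wired. */
int ofi_hmem_put_dmabuf_fd(enum fi_hmem_iface iface, int fd)
{
	if (!hmem_ops[iface].put_dmabuf_fd)
		return -FI_ENOSYS;

	return hmem_ops[iface].put_dmabuf_fd(fd);
}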

With CUDA, callers of ofi_hmem_get_dmabuf_fd() should call
ofi_hmem_put_dmabuf_fd() once the DMA buf region is no longer used.

Signed-off-by: Ian Ziemba <[email protected]>
Performing multiple HSA allocations appears to result in a DMA buf
offset. Verify that the CXI provider can register a DMA buf offset
memory region.

Signed-off-by: Ian Ziemba <[email protected]>
When a MR is freed, the CXI provider should free the DMA buf FD used for
the ROCR region. Failing to do this will result in FDs being exhausted.

Signed-off-by: Ian Ziemba <[email protected]>
When a MR is freed, the CXI provider should free the DMA buf FD used for
the CUDA region. Failing to do this will result in FDs being exhausted.

Signed-off-by: Ian Ziemba <[email protected]>
@iziemba (Contributor, Author) commented Jan 22, 2025

@shijin-aws : I added CUDA support. If you could review, that would be great.

@swelch : If you could review CXI prov changes, that would be great.

Thanks!

@iziemba requested a review from shijin-aws on January 22, 2025 15:25
@shijin-aws (Contributor) left a review comment: Thanks!

@shijin-aws (Contributor): @j-xiong is the Intel CI failure related?

@j-xiong (Contributor) commented Jan 22, 2025: @shijin-aws We are having Slurm issues right now; the tests didn't actually run.

@shijin-aws (Contributor): @j-xiong is Intel CI still having issues?

@j-xiong (Contributor) commented Jan 23, 2025: There are two intermittent failures with oneCCL over the tcp provider. They are unrelated to this PR.

@j-xiong merged commit ba880cc into ofiwg:main on Jan 23, 2025 (12 of 13 checks passed)
@@ -575,6 +589,9 @@ static void cxip_unmap_nocache(struct cxip_md *md)
 {
 	int ret;

+	if (md->dmabuf_fd_valid)
+		ofi_hmem_put_dmabuf_fd(md->info.iface, md->dmabuf_fd);
Contributor: I'd be curious to know why CXI requires that the close of the dmabuf be deferred until the point of unmapping it. Why can't you just close it immediately after you do the map? I thought this was a property of the dmabuf kernel interface and not something that would differ between providers, but I don't know this code well and might be missing something.

Contributor: I assume the kernel driver would reference count properly, so closing right after mapping should work just fine. I don't see a drawback to doing it either way.
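For reference, a rough sketch of how the map side would pair with the cxip_unmap_nocache() hunk above. The md fields and the ofi_hmem_* call follow this PR's diffs; the function name and the surrounding registration logic are illustrative assumptions:

/* Illustrative only: obtain the dmabuf FD at registration time and record
 * it on the MD so the unmap path can put it back. Releasing the FD on MR
 * free is what prevents FD exhaustion when many MRs are created and freed. */
static int cxip_map_dmabuf_sketch(struct cxip_md *md, const void *buf,
				  size_t len)
{
	uint64_t offset;
	int fd;
	int ret;

	ret = ofi_hmem_get_dmabuf_fd(md->info.iface, buf, len, &fd, &offset);
	if (ret)
		return ret;

	/* ... register the dmabuf (fd, offset, len) with the NIC ... */

	md->dmabuf_fd = fd;
	md->dmabuf_fd_valid = true;
	return 0;
}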
