Add multi-threaded memcpy to kvikIO #456

Open
GregoryKimball opened this issue Sep 4, 2024 · 1 comment

GregoryKimball commented Sep 4, 2024

Currently kvikIO focuses on file IO, but sometimes we ingest pageable host buffers that hold the full byte range of a binary format such as Parquet, or text data in CSV/JSON format. In libcudf we ingest this data using a HOST_BUFFER data source and a single-threaded pageable HtoD memcpy. Instead, we should use a multithreaded memcpy that stages the data through small pinned host buffers.
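A minimal host-only sketch of the proposed pattern, under stated assumptions: the pageable source is split into fixed-size chunks, and each worker thread copies its chunks into its own small staging buffer before issuing the transfer. To keep the sketch runnable without a GPU, the staging buffers are ordinary `std::vector` storage rather than pinned memory (`cudaMallocHost`), and the hypothetical helper `stage_to_device` is a plain `memcpy` standing in for `cudaMemcpyAsync` on a per-thread stream followed by `cudaStreamSynchronize`. The names `threaded_bounce_copy` and `stage_to_device`, the round-robin chunk assignment, and the chunk size are illustrative only, not kvikIO or libcudf API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for the device transfer. In a CUDA build this would be
// cudaMemcpyAsync(dst, staging, n, cudaMemcpyHostToDevice, stream) followed by
// cudaStreamSynchronize(stream); here it is a host memcpy so the sketch runs anywhere.
static void stage_to_device(std::byte* dst, const std::byte* staging, std::size_t n) {
    std::memcpy(dst, staging, n);
}

// Copies `size` bytes from a pageable buffer `src` to `dst` with `num_threads` workers.
// Each worker owns one staging ("bounce") buffer of `chunk_bytes`; a real
// implementation would allocate it as pinned memory (for example with cudaMallocHost).
void threaded_bounce_copy(std::byte* dst, const std::byte* src, std::size_t size,
                          std::size_t chunk_bytes, unsigned num_threads) {
    if (size == 0 || chunk_bytes == 0) return;  // nothing to copy / invalid chunking
    num_threads = std::max(num_threads, 1u);
    const std::size_t num_chunks = (size + chunk_bytes - 1) / chunk_bytes;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([=] {
            std::vector<std::byte> staging(chunk_bytes);  // pinned memory in a real build
            // Round-robin assignment: worker t handles chunks t, t + num_threads, ...
            for (std::size_t c = t; c < num_chunks; c += num_threads) {
                const std::size_t offset = c * chunk_bytes;
                const std::size_t n = std::min(chunk_bytes, size - offset);
                std::memcpy(staging.data(), src + offset, n);      // pageable -> staging
                stage_to_device(dst + offset, staging.data(), n);  // staging -> "device"
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

With real pinned staging buffers, each worker could double-buffer (fill one staging buffer while the previous chunk is still in flight) so that CPU copies and PCIe transfers overlap; this sketch waits on every chunk for simplicity.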

We may add extensions to this feature to cover DtoH transfers as well.

When reading from a pageable host buffer data source on x86-H100, libcudf performs a single-threaded HtoD memcpy. This pattern yields poor PCIe RX utilization (~17%) and a throughput of only ~8 GB/s.

GregoryKimball (Author) commented

After more research, it appears that the best way to improve PCIe utilization for pageable memcpy is likely cudaHostRegister, or madvise with MADV_HUGEPAGE (link). I will close this issue for now.
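A minimal Linux-only sketch of the madvise route, assuming transparent huge pages (THP) are enabled in the kernel: allocate the pageable host buffer with `mmap` and apply the `MADV_HUGEPAGE` hint before filling it. The helper names `alloc_hugepage_hint` and `free_hugepage_hint` are hypothetical. `madvise` only gives advice and fails with `EINVAL` when the kernel was built without THP support, so the sketch reports the result instead of treating it as fatal. The cudaHostRegister route would instead pin the existing buffer with `cudaHostRegister(ptr, size, cudaHostRegisterDefault)` and release it with `cudaHostUnregister(ptr)` after the copy; it is not shown because it needs a CUDA toolkit.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Allocates an anonymous, pageable host buffer of `size` bytes and hints that the
// kernel should back it with transparent huge pages. Returns nullptr if mmap fails.
// `huge_hint_ok` is set to whether madvise accepted the hint; the buffer is usable
// either way because MADV_HUGEPAGE is advice, not a requirement.
void* alloc_hugepage_hint(std::size_t size, bool& huge_hint_ok) {
    huge_hint_ok = false;
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    // mmap returns a page-aligned address, which madvise requires.
    huge_hint_ok = (madvise(p, size, MADV_HUGEPAGE) == 0);
    return p;
}

// Releases a buffer obtained from alloc_hugepage_hint.
void free_hugepage_hint(void* p, std::size_t size) {
    if (p != nullptr) munmap(p, size);
}
```

The hint is applied before the buffer is first written because pages are typically faulted in on first touch; on x86-64, THP generally uses 2 MiB pages and only covers suitably aligned ranges inside the mapping, so larger buffers are the ones most likely to benefit.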
