Currently kvikIO is focused on file IO, but sometimes we ingest pageable host buffers that contain the full byte range of a binary format like Parquet, or text data in CSV/JSON format. In libcudf we ingest this data using a HOST_BUFFER data source and a single-threaded pageable HtoD memcpy. Instead, we should use a multithreaded memcpy staged through small pinned host buffers.
We may add extensions to this feature to cover DtoH transfers as well.
When reading from a pageable host buffer data source on x86-H100, libcudf performs a single-threaded memcpy HtoD. This pattern yields poor PCIe RX utilization of ~17% and throughput of only ~8 GB/s.
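For illustration, here is a minimal sketch of the proposed multithreaded copy through small pinned staging buffers, assuming a plain CUDA runtime environment. The function name `pageable_htod_copy`, the thread count, and the chunk size are placeholders for this sketch, not kvikIO or libcudf API.

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: copy `size` bytes from a pageable host buffer `src` to
// device memory `dst` using `num_threads` worker threads, each staging chunks
// through its own small pinned bounce buffer and CUDA stream.
void pageable_htod_copy(void* dst, const void* src, std::size_t size,
                        int num_threads = 8, std::size_t chunk = 4 << 20)
{
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      void* staging = nullptr;
      cudaMallocHost(&staging, chunk);  // small pinned bounce buffer
      cudaStream_t stream;
      cudaStreamCreate(&stream);

      // Threads process interleaved chunks: t, t+num_threads, t+2*num_threads, ...
      for (std::size_t off = static_cast<std::size_t>(t) * chunk; off < size;
           off += static_cast<std::size_t>(num_threads) * chunk) {
        std::size_t n = std::min(chunk, size - off);
        // Pageable -> pinned copy runs on the CPU in this worker thread.
        std::memcpy(staging, static_cast<const char*>(src) + off, n);
        // Pinned -> device copy can use DMA at full PCIe bandwidth.
        cudaMemcpyAsync(static_cast<char*>(dst) + off, staging, n,
                        cudaMemcpyHostToDevice, stream);
        // Wait before reusing the staging buffer for the next chunk.
        cudaStreamSynchronize(stream);
      }
      cudaStreamDestroy(stream);
      cudaFreeHost(staging);
    });
  }
  for (auto& w : workers) w.join();
}
```

The key idea is that each worker overlaps its CPU-side pageable-to-pinned copy with the DMA transfers issued by the other workers, instead of serializing everything through one pageable HtoD memcpy.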
After more research, it appears that the best way to improve PCIe utilization for pageable memcpy is likely cudaHostRegister or madvise with MADV_HUGEPAGE (link). I will close this issue for now.
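A minimal sketch of those two alternatives, assuming a Linux host and a page-aligned buffer; the helper functions below are hypothetical and not part of kvikIO, libcudf, or the CUDA runtime.

```cpp
#include <cuda_runtime.h>
#include <sys/mman.h>
#include <cstddef>

// Option 1 (sketch): pin an existing pageable buffer in place with
// cudaHostRegister so the HtoD copy can DMA directly from it, then unregister.
void copy_with_host_register(void* d_dst, void* h_src, std::size_t size)
{
  cudaHostRegister(h_src, size, cudaHostRegisterDefault);
  cudaMemcpy(d_dst, h_src, size, cudaMemcpyHostToDevice);
  cudaHostUnregister(h_src);
}

// Option 2 (sketch): ask the kernel to back the buffer with transparent huge
// pages, reducing per-page overhead on the pageable copy path.
// Linux-only; `h_src` must be page-aligned for madvise to succeed.
void advise_huge_pages(void* h_src, std::size_t size)
{
  madvise(h_src, size, MADV_HUGEPAGE);
}
```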