Currently kvikIO is focused on file IO, but sometimes we ingest pageable host buffers that contain the full byte range of a binary format like Parquet, or text data in CSV/JSON format. In libcudf we ingest this data using a HOST_BUFFER data source and a single-threaded pageable HtoD memcpy. Instead, we should use a multithreaded memcpy staged through small pinned host buffers.
We may add extensions to this feature to cover DtoH transfers as well.
When reading from a pageable host buffer data source on x86-H100, libcudf performs a single-threaded memcpy HtoD. This pattern yields poor PCIe RX utilization of ~17% and throughput of only ~8 GB/s.
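For illustration, here is a minimal sketch of the proposed multithreaded copy through small pinned staging buffers, assuming a plain CUDA runtime environment. The function name `pageable_htod_copy`, the thread count, and the chunk size are placeholders for this sketch, not kvikIO or libcudf API.

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: copy `size` bytes from a pageable host buffer `src` to
// device memory `dst` using `num_threads` worker threads, each staging chunks
// through its own small pinned bounce buffer and CUDA stream.
void pageable_htod_copy(void* dst, const void* src, std::size_t size,
                        int num_threads = 8, std::size_t chunk = 4 << 20)
{
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      void* staging = nullptr;
      cudaMallocHost(&staging, chunk);  // small pinned bounce buffer
      cudaStream_t stream;
      cudaStreamCreate(&stream);

      // Threads process interleaved chunks: t, t+num_threads, t+2*num_threads, ...
      for (std::size_t off = static_cast<std::size_t>(t) * chunk; off < size;
           off += static_cast<std::size_t>(num_threads) * chunk) {
        std::size_t n = std::min(chunk, size - off);
        // Pageable -> pinned copy runs on the CPU in this worker thread.
        std::memcpy(staging, static_cast<const char*>(src) + off, n);
        // Pinned -> device copy can use DMA at full PCIe bandwidth.
        cudaMemcpyAsync(static_cast<char*>(dst) + off, staging, n,
                        cudaMemcpyHostToDevice, stream);
        // Wait before reusing the staging buffer for the next chunk.
        cudaStreamSynchronize(stream);
      }
      cudaStreamDestroy(stream);
      cudaFreeHost(staging);
    });
  }
  for (auto& w : workers) w.join();
}
```

The key idea is that each worker overlaps its CPU-side pageable-to-pinned copy with the DMA transfers issued by the other workers, instead of serializing everything through one pageable HtoD memcpy.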
After more research, it appears that the best way to improve PCIe utilization for pageable memcpy is likely cudaHostRegister or madvise with MADV_HUGEPAGE (link). I will close this issue for now.
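A minimal sketch of those two alternatives, assuming a Linux host and a page-aligned buffer; the helper functions below are hypothetical and not part of kvikIO, libcudf, or the CUDA runtime.

```cpp
#include <cuda_runtime.h>
#include <sys/mman.h>
#include <cstddef>

// Option 1 (sketch): pin an existing pageable buffer in place with
// cudaHostRegister so the HtoD copy can DMA directly from it, then unregister.
void copy_with_host_register(void* d_dst, void* h_src, std::size_t size)
{
  cudaHostRegister(h_src, size, cudaHostRegisterDefault);
  cudaMemcpy(d_dst, h_src, size, cudaMemcpyHostToDevice);
  cudaHostUnregister(h_src);
}

// Option 2 (sketch): ask the kernel to back the buffer with transparent huge
// pages, reducing per-page overhead on the pageable copy path.
// Linux-only; `h_src` must be page-aligned for madvise to succeed.
void advise_huge_pages(void* h_src, std::size_t size)
{
  madvise(h_src, size, MADV_HUGEPAGE);
}
```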