
[WIP] introduce CUDA managed memory and use it for a matching function #157

Draft
wants to merge 4 commits into base: develop
Conversation

@griwodz griwodz (Member) commented Jul 28, 2024

Description

Introduce CUDA Managed Memory (CMM) and use it in a new feature matching function.

The main change of this PR adds the malloc_mgd and free_mgd functions and the malloc_mgdT template.
As a secondary change, it adds a new feature matching function that uses CMM.
The demo program popsift-match has been changed to use the new function, making it easy to print results on screen. A sketch of the wrappers follows below.
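A minimal sketch of what such allocation wrappers could look like. The actual signatures live in this PR's diff; the parameter names and the error handling here are assumptions for illustration only.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Allocate `sz` bytes of CUDA Managed Memory, visible to both CPU and GPU.
static void* malloc_mgd( size_t sz )
{
    void* ptr = nullptr;
    cudaError_t err = cudaMallocManaged( &ptr, sz, cudaMemAttachGlobal );
    if( err != cudaSuccess )
    {
        fprintf( stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString( err ) );
        exit( EXIT_FAILURE );
    }
    return ptr;
}

// Typed convenience template on top of malloc_mgd.
template<typename T>
static T* malloc_mgdT( size_t num )
{
    return static_cast<T*>( malloc_mgd( num * sizeof(T) ) );
}

// Release a managed allocation. cudaFree handles managed pointers as well.
static void free_mgd( void* ptr )
{
    cudaFree( ptr );
}
```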

Features list

  • add functions malloc_mgd and free_mgd
  • add template malloc_mgdT
  • use managed memory to store extracted features
  • add FeaturesDev::matchAndReturn, a new matching function that uses CMM
  • change demo program popsift-match to use FeaturesDev::matchAndReturn and print results on screen

Implementation remarks

CMM allows the programmer to allocate flat 1D memory that is accessible from both the CPU and the GPU. The CUDA device driver guesses on which side the memory is needed next and performs the transfer in the background. On devices where CPU and GPU share physical memory, this is even better because memory copies can be avoided altogether. Using CMM started to make sense with CUDA Compute Capability 6.0 ("Pascal").
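A self-contained illustration of the flat 1D model described above: the same pointer is written by a kernel and then read by the host without an explicit copy. The kernel and sizes are hypothetical, not part of the PR.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void fill( float* data, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i < n ) data[i] = 2.0f * i;
}

int main()
{
    const int n = 1024;
    float* data = nullptr;
    cudaMallocManaged( &data, n * sizeof(float) ); // one pointer for both sides

    fill<<<(n + 255) / 256, 256>>>( data, n );     // GPU writes
    cudaDeviceSynchronize();                       // hand control back to the CPU

    printf( "data[10] = %f\n", data[10] );         // CPU reads, no explicit memcpy
    cudaFree( data );
    return 0;
}
```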

Using CMM is purportedly safe, but in practice it is not. If the programmer doesn't keep track of which side controls the memory at any given time, race conditions will occur. On the NVIDIA Tegra, a shared-memory architecture, we have seen race conditions spanning several allocated memory regions when those regions are small enough to fit into the same memory page, e.g. control structures. The simple way of preventing race conditions is cudaDeviceSynchronize. It is more efficient to use cudaMemAdvise to tell the driver that the CPU uses the memory next (don't forget to unset the location hint after the CPU is finished with the memory!), and cudaMemPrefetchAsync to inform the driver about the specific stream on the GPU that will need the memory, so that the other streams don't have to wait for it.
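A hedged sketch of the advise/prefetch pattern described above. The buffer, sizes, stream, and the two helper functions are placeholders; the PR itself decides where these calls belong.

```cpp
#include <cuda_runtime.h>

void cpu_phase( float* buf, size_t bytes )
{
    // Hint that the CPU uses this memory next.
    cudaMemAdvise( buf, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId );

    // ... CPU reads and writes buf here ...

    // Unset the location hint once the CPU is done with the memory.
    cudaMemAdvise( buf, bytes, cudaMemAdviseUnsetPreferredLocation, cudaCpuDeviceId );
}

void gpu_phase( float* buf, size_t bytes, int device, cudaStream_t stream )
{
    // Migrate the pages to `device` on this stream only, so that other
    // streams never have to wait for the transfer.
    cudaMemPrefetchAsync( buf, bytes, device, stream );

    // ... launch kernels on `stream` that consume buf ...
}
```

Compared to a blanket cudaDeviceSynchronize, this keeps the synchronization scoped to the one stream that actually needs the pages.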

@griwodz griwodz self-assigned this Jul 28, 2024
@griwodz griwodz added the type:feature and cuda (issues related to cuda versions) labels Jul 28, 2024
@griwodz griwodz changed the title introduce CUDA managed memory and use it for a matching function [WIP] introduce CUDA managed memory and use it for a matching function Jul 28, 2024
@griwodz griwodz marked this pull request as draft August 12, 2024 10:57