
DynamicArray grow/shrink broken for GPU arrays #6

Open
mathijs727 opened this issue Mar 24, 2022 · 0 comments
mathijs727 commented Mar 24, 2022

Due to a bug in Memory::realloc_impl, a DynamicArray cannot grow or shrink when it points to GPU memory (either GPU_Malloc or GPU_Managed).
Growing or shrinking calls Memory::realloc_impl, which checks whether the allocation is CPU memory or GPU-accessible memory.
In the latter case, cuda_memcpy_impl is called with cudaMemcpyDefault as the cudaMemcpyKind:

    if (is_gpu_type(oldAlloc.type)) {
        ptr = malloc_impl(newSize, oldAlloc.name, oldAlloc.type);
        if (ptr)
            cuda_memcpy_impl(static_cast<uint8*>(ptr), static_cast<uint8*>(oldPtr), oldAlloc.size, cudaMemcpyDefault);
        free_impl(oldPtr);
    } else {
        ...
    }

However, cuda_memcpy_impl does not handle cudaMemcpyDefault: none of its branches match, so the copy is silently skipped and the DynamicArray ends up containing junk values (whatever malloc returned):

void Memory::cuda_memcpy_impl(uint8* dst, const uint8* src, uint64 size, cudaMemcpyKind memcpyKind)
{
    const auto BlockCopy = [&]() {
        const double Start = Utils::seconds();
        CUDA_CHECKED_CALL cudaMemcpy(dst, src, size, memcpyKind);
        const double End = Utils::seconds();

        return size / double(1u << 30) / (End - Start);
    };

    if (memcpyKind == cudaMemcpyDeviceToDevice) {
        PROFILE_SCOPEF("Memcpy HtH %fMB", size / double(1u << 20));
        [[maybe_unused]] const double Bandwidth = BlockCopy();
        ZONE_METADATA("%fGB/s", Bandwidth);
    } else if (memcpyKind == cudaMemcpyDeviceToHost) {
        PROFILE_SCOPEF("Memcpy DtH %fMB", size / double(1u << 20));
        [[maybe_unused]] const double Bandwidth = BlockCopy();
        ZONE_METADATA("%fGB/s", Bandwidth);
    } else if (memcpyKind == cudaMemcpyHostToDevice) {
        PROFILE_SCOPEF("Memcpy HtD %fMB", size / double(1u << 20));
        [[maybe_unused]] const double Bandwidth = BlockCopy();
        ZONE_METADATA("%fGB/s", Bandwidth);
    }
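    // Note: cudaMemcpyDefault matches none of the branches above, so the copy is never performed.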
}

This could be fixed by adding an extra else if branch to cuda_memcpy_impl that handles the cudaMemcpyDefault case, as sketched below.
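A minimal sketch of what such a branch could look like, assuming it can reuse the existing BlockCopy lambda and the PROFILE_SCOPEF / ZONE_METADATA macros (the "Default" label is only a placeholder, not an existing profiling category):

    } else if (memcpyKind == cudaMemcpyDefault) {
        // cudaMemcpyDefault lets the CUDA runtime infer the transfer direction
        // from the pointer values (requires unified virtual addressing).
        PROFILE_SCOPEF("Memcpy Default %fMB", size / double(1u << 20));
        [[maybe_unused]] const double Bandwidth = BlockCopy();
        ZONE_METADATA("%fGB/s", Bandwidth);
    }

Alternatively, realloc_impl could pass an explicit cudaMemcpyKind (e.g. cudaMemcpyDeviceToDevice) instead of cudaMemcpyDefault, so that one of the existing branches is taken.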
