
Add Concurrency, support long trajectories, Minor fixes #100

Open
wants to merge 104 commits into master
Changes from all commits · 104 commits
c3acc58
Working with CUDA11 without warnings
Jun 28, 2021
f471098
Free memory issues and update tests
Jun 29, 2021
63d3664
Add Clear Memory
Jun 30, 2021
ea1722e
Fix memory leaks for python operator
Jun 30, 2021
283ed59
Rename and do right DC
Jul 5, 2021
4a5596f
Minor version bump
Jul 5, 2021
71c2465
Fix memory
Nov 9, 2021
1bd12d9
Compile issue
Nov 9, 2021
5c50bf4
clean mem on gpu
Nov 9, 2021
7ffefeb
Remove
Nov 9, 2021
432a706
Version bump
Nov 9, 2021
723f8ee
Merge pull request #3 from chaithyagr/cpu_leaks
chaithyagr Nov 9, 2021
2e59fba
Fix 2D issues
Nov 15, 2021
6af4eaf
Merge pull request #4 from chaithyagr/cpu_leaks
chaithyagr Nov 15, 2021
471d4e8
Remove linking issues
Nov 15, 2021
9e2bd11
Version bump
Nov 15, 2021
a175ef8
Update in code, added concurency and async copy
Nov 16, 2021
91d8129
Update in code, added concurency and async copy
Nov 17, 2021
b718f00
Concurency working codes
Nov 17, 2021
27abe64
Fix leaks
Nov 17, 2021
8dd394a
Do a single cudaMalloc
Nov 18, 2021
07d0cab
New test
Nov 18, 2021
15db29d
Fixed all issues, no copies
Nov 18, 2021
99d03ad
FIx leaks
Nov 19, 2021
8d8f248
Fix mem leaks
Nov 19, 2021
10663bf
test file
Nov 25, 2021
d324975
Fix minute issues
Nov 25, 2021
ee3af68
Fix for 2D
Jan 11, 2022
bc16ccf
Fix
Apr 22, 2022
72bf537
fixed
Apr 25, 2022
afc995c
Merge branch 'master' into concurrency
chaithyagr Jul 20, 2023
263450f
Merge pull request #5 from chaithyagr/concurrency
chaithyagr Jul 20, 2023
23c43cf
Merge pull request #6 from chaithyagr/fix_long
chaithyagr Jul 20, 2023
f70430e
Merged to get 11.6 fixes
Jul 20, 2023
f8b2ccb
Fixes for missing directories, add CUDA DIR expliocitly
Jul 20, 2023
357e548
Update versioning to take concurrency into account
chaithyagr Jul 20, 2023
5ca38c9
fix: allow for non-integer osf.
paquiteau Nov 5, 2023
417e1c6
Add pinned memory stuff first code, with debug prints
Nov 15, 2023
e3cdb90
All setup!
Nov 15, 2023
31e44c9
Fix cmake link cuda
Nov 15, 2023
f524864
\n
Nov 15, 2023
4ff0c1a
mapped
Nov 15, 2023
1daf1c5
mapped
Nov 15, 2023
caf48ac
Fix for memory type
Nov 15, 2023
59545c9
Fix pointers pointers
Nov 15, 2023
dbca20e
Added pinned stuff
Nov 17, 2023
77c82e3
Merge pull request #9 from chaithyagr/check_pin
chaithyagr Nov 17, 2023
837bcd0
feat: add fully on-gpu density compensation estimation.
paquiteau Nov 20, 2023
ae7a1e3
feat: add power method estimation of the spectral radius.
paquiteau Nov 20, 2023
d2bcf7c
Working codes for mem allocations
Nov 20, 2023
11ef27d
Working added additional optional input
Nov 20, 2023
b7cf9b8
Completed coding the entire end to end
Nov 20, 2023
0d3dcf5
Completed coding the entire end to end
Nov 20, 2023
3f73176
Added codes
Nov 20, 2023
04b4207
Fixes
Nov 20, 2023
7d17703
fixes
Nov 20, 2023
08059e7
major fixes
Nov 20, 2023
10d4618
Fix density comp
Nov 20, 2023
e656fc4
All fixes
Nov 20, 2023
4d993b1
Fix free
Nov 20, 2023
57d0160
added tests
Nov 20, 2023
2124693
Temp fixes
Nov 21, 2023
e9c9fef
Test
Nov 21, 2023
38cefe9
Added memory warnings
Nov 21, 2023
56298f8
Added additional tests, just before removing all options!
Nov 21, 2023
436a875
Added additional tests, just before removing all options!
Nov 21, 2023
a554926
Working with warnings
Nov 21, 2023
c71792c
Commited
Nov 21, 2023
7cdeaff
Added codes
Nov 21, 2023
8e3bb20
Added to stderr
Nov 21, 2023
6b2ef10
Fixes for smaps
Nov 28, 2023
e31085f
Update gpuNUFFT_operator_python_factory.cpp
chaithyagr Nov 29, 2023
7bc02a4
Merge pull request #14 from chaithyagr/mem_efficient
chaithyagr Jan 5, 2024
4517fcb
Update setup.py
chaithyagr Jan 5, 2024
c096913
version bump
Jan 5, 2024
9441b4d
Merge branch 'master' into float-osf
paquiteau Jan 10, 2024
a8e2b25
Update setup.py
chaithyagr Jan 10, 2024
611790a
Fix setup
Jan 10, 2024
a923e5f
Merge pull request #11 from paquiteau/float-osf
chaithyagr Feb 12, 2024
81f66f8
Merge branch 'master' into spectral-rad
chaithyagr Feb 12, 2024
95560cf
Merge pull request #13 from paquiteau/spectral-rad
chaithyagr Feb 12, 2024
134406e
Merge branch 'master' into faster-density
chaithyagr Feb 12, 2024
3d0b2c8
Merge pull request #12 from paquiteau/faster-density
chaithyagr Feb 12, 2024
f38ee69
Working with fixed python Lib
Feb 12, 2024
06e10b4
FIXED Dens
Feb 12, 2024
5351ac7
Working built: GPU and CPU both present
Feb 13, 2024
9f31c78
Final fixes
Feb 15, 2024
33935fe
Merge pull request #17 from chaithyagr/gpu
chaithyagr Feb 15, 2024
0bb69c1
Add gpuNUFFT version pop[
Feb 15, 2024
06dd0b6
Fix cuRAND
Feb 15, 2024
611f56e
Fix issues
Feb 19, 2024
e86dbc6
Version bump
Feb 19, 2024
7d097e3
Fixes added
Feb 21, 2024
0c34013
Update with final fixes, v0.7.5
Feb 21, 2024
01b9cab
Autograd support added
chaithyagr Jun 3, 2024
69264c9
Merge pull request #20 from chaithyagr/isign
chaithyagr Jun 6, 2024
6b66ee0
Added support for set_pts
chaithyagr Jun 21, 2024
0f45d42
Merge pull request #22 from chaithyagr/set_pts
chaithyagr Jun 21, 2024
2fde394
commit
chaithyagr Aug 2, 2024
3d9809c
WIP debug
chaithyagr Aug 2, 2024
0dc9687
A bunch of fixes to support CUDA12.0
chaithyagr Aug 5, 2024
2de0511
Merge pull request #23 from chaithyagr/release_temp
chaithyagr Aug 5, 2024
5e267bb
Fix memory leak
chaithyagr Aug 29, 2024
a8c0838
Merge pull request #24 from chaithyagr/memleak
chaithyagr Sep 3, 2024
14 changes: 7 additions & 7 deletions CUDA/CMakeLists.txt
@@ -94,14 +94,15 @@ endif(GPU_DOUBLE_PREC)

SET(FERMI_GPU OFF CACHE BOOL "Enable build for (old) Fermi architectures (Compute capability 2.0)")


set(MY_NVCC_FLAGS -I${CUDA_INCLUDE_DIRS})
set(CMAKE_CXX_FLAGS -I${CUDA_INCLUDE_DIRS})
IF(FERMI_GPU)
set(MY_NVCC_FLAGS -gencode arch=compute_30,code=sm_30)
list(APPEND MY_NVCC_FLAGS -gencode arch=compute_30,code=sm_30)
list(APPEND MY_NVCC_FLAGS -gencode arch=compute_50,code=sm_50)
list(APPEND MY_NVCC_FLAGS -gencode=arch=compute_52,code=sm_52)
list(APPEND MY_NVCC_FLAGS -gencode=arch=compute_52,code=compute_52)
list(APPEND MY_NVCC_FLAGS -gencode arch=compute_50,code=sm_50)
ELSE(FERMI_GPU)
set(MY_NVCC_FLAGS -gencode arch=compute_50,code=sm_50)
list(APPEND MY_NVCC_FLAGS -gencode=arch=compute_52,code=sm_52)
list(APPEND MY_NVCC_FLAGS -gencode=arch=compute_52,code=compute_52)

@@ -126,7 +127,7 @@ ENDIF(FERMI_GPU)

IF(CMAKE_BUILD_TYPE MATCHES Debug)
MESSAGE("debug mode")
list(APPEND CUDA_NVCC_FLAGS ${MY_NVCC_FLAGS} --ptxas-options=-v)
list(APPEND CUDA_NVCC_FLAGS ${MY_NVCC_FLAGS} --ptxas-options=-v -G)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -g -std=c++11")
ELSE(CMAKE_BUILD_TYPE)
list(APPEND CUDA_NVCC_FLAGS ${MY_NVCC_FLAGS})
@@ -150,6 +151,7 @@ CONFIGURE_FILE( ${CMAKE_SOURCE_DIR}/inc/cufft_config.hpp.cmake ${CMAKE_SOURCE_DI

#Include dirs
include_directories(inc)
message(CUDA_INCLUDE_DIRS : ${CUDA_INCLUDE_DIRS})
SET(GPUNUFFT_INC_DIR ${CMAKE_SOURCE_DIR}/inc)
SET(GPUNUFFT_INCLUDE ${GPUNUFFT_INC_DIR}/cuda_utils.hpp
${GPUNUFFT_INC_DIR}/cuda_utils.cuh
@@ -161,10 +163,8 @@ SET(GPUNUFFT_INCLUDE ${GPUNUFFT_INC_DIR}/cuda_utils.hpp
${GPUNUFFT_INC_DIR}/precomp_utils.hpp
${GPUNUFFT_INC_DIR}/gpuNUFFT_operator.hpp
${GPUNUFFT_INC_DIR}/balanced_operator.hpp
${GPUNUFFT_INC_DIR}/texture_gpuNUFFT_operator.hpp
${GPUNUFFT_INC_DIR}/balanced_gpuNUFFT_operator.hpp
${GPUNUFFT_INC_DIR}/gpuNUFFT_operator_factory.hpp
${GPUNUFFT_INC_DIR}/balanced_texture_gpuNUFFT_operator.hpp)
${GPUNUFFT_INC_DIR}/gpuNUFFT_operator_factory.hpp)

SET(MATLAB_HELPER_INCLUDE ${GPUNUFFT_INC_DIR}/matlab_helper.h)
SET(CONFIG_INCLUDE ${GPUNUFFT_INC_DIR}/config.hpp ${GPUNUFFT_INC_DIR}/cufft_config.hpp)
1 change: 0 additions & 1 deletion CUDA/inc/balanced_gpuNUFFT_operator.hpp
@@ -29,7 +29,6 @@ class BalancedGpuNUFFTOperator : public GpuNUFFTOperator,

~BalancedGpuNUFFTOperator()
{
if (!matlabSharedMem)
freeLocalMemberArray(this->sectorProcessingOrder.data);
}

92 changes: 0 additions & 92 deletions CUDA/inc/balanced_texture_gpuNUFFT_operator.hpp

This file was deleted.

4 changes: 2 additions & 2 deletions CUDA/inc/config.hpp.cmake
@@ -29,8 +29,8 @@
typedef cufftComplex CufftType;
#endif

typedef unsigned int SizeType;
typedef unsigned int IndType;
typedef unsigned long int SizeType;
typedef unsigned long int IndType;
typedef uint2 IndType2;
typedef uint3 IndType3;

64 changes: 0 additions & 64 deletions CUDA/inc/cuda_utils.cuh
@@ -7,70 +7,6 @@ __constant__ gpuNUFFT::GpuNUFFTInfo GI;

__constant__ DType KERNEL[10000];

texture<float, 1, cudaReadModeElementType> texKERNEL;
texture<float, 2, cudaReadModeElementType> texKERNEL2D;
texture<float, 3, cudaReadModeElementType> texKERNEL3D;

texture<float2> texDATA;
texture<cufftComplex> texGDATA;

__inline__ __device__ float compute1DTextureLookup(float x, float y)
{
return tex1D(texKERNEL, x) * tex1D(texKERNEL, y);
}

__inline__ __device__ float compute1DTextureLookup(float x, float y, float z)
{
return tex1D(texKERNEL, x) * tex1D(texKERNEL, y) * tex1D(texKERNEL, z);
}

__inline__ __device__ float compute2DTextureLookup(float x, float y)
{
return (float)tex2D(texKERNEL2D, (float)x, (float)y);
}

__inline__ __device__ float compute2DTextureLookup(float x, float y, float z)
{
return (float)tex2D(texKERNEL2D, (float)x, (float)y) *
tex2D(texKERNEL2D, (float)z, 0);
}

__inline__ __device__ float compute3DTextureLookup(float x, float y)
{
return tex3D(texKERNEL3D, x, y, 0);
}

__inline__ __device__ float compute3DTextureLookup(float x, float y, float z)
{
return tex3D(texKERNEL3D, x, y, z);
}

__inline__ __device__ float computeTextureLookup(float x, float y)
{
// wired to 2d
return compute2DTextureLookup((float)x, (float)y);
// switch(GI.interpolationType)
//{
// case 1: return compute1DTextureLookup(x,y);
// case 2: return compute2DTextureLookup(x,y);
// case 3: return compute3DTextureLookup(x,y);
// default: return (float)0.0;
//}
}

__inline__ __device__ float computeTextureLookup(float x, float y, float z)
{
// wired to 2d
return compute2DTextureLookup(x, y, z);
// switch(GI.interpolationType)
//{
// case 1: return compute1DTextureLookup(x,y,z);
// case 2: return compute2DTextureLookup(x,y,z);
// case 3: return compute3DTextureLookup(x,y,z);
// default: return (float)0.0;
//}
}

#if __CUDA_ARCH__ < 200
#define THREAD_BLOCK_SIZE 256
#else
82 changes: 44 additions & 38 deletions CUDA/inc/cuda_utils.hpp
@@ -59,7 +59,19 @@ inline void copyToDevice(TypeName *host_ptr, TypeName *device_ptr,
HANDLE_ERROR(cudaMemcpy(device_ptr, host_ptr, num_elements * sizeof(TypeName),
cudaMemcpyHostToDevice));
}

/** \brief Asynchronous CUDA memcpy call to copy data from host to device
 *
 * @param host_ptr host data pointer
 * @param device_ptr device pointer
 * @param num_elements amount of elements of size TypeName
 * @param stream CUDA stream to enqueue the copy on (default stream 0)
 */
template <typename TypeName>
inline void copyToDeviceAsync(TypeName *host_ptr, TypeName *device_ptr,
IndType num_elements, cudaStream_t stream=0)
{
HANDLE_ERROR(cudaMemcpyAsync(device_ptr, host_ptr, num_elements * sizeof(TypeName),
cudaMemcpyHostToDevice, stream));
}
/** \brief CUDA memory allocation and memcpy call to copy data from host to
*device
*
@@ -98,13 +110,28 @@ inline void allocateAndSetMem(TypeName **device_ptr, IndType num_elements,
*/
template <typename TypeName>
inline void copyDeviceToDevice(TypeName *device_ptr_src,
TypeName *device_ptr_dest, IndType num_elements)
TypeName *device_ptr_dest, IndType num_elements
)
{
HANDLE_ERROR(cudaMemcpy(device_ptr_dest, device_ptr_src,
num_elements * sizeof(TypeName),
cudaMemcpyDeviceToDevice));
}

/** \brief Asynchronous CUDA memcpy call to copy data from device ptr to device ptr
 *
 * @param device_ptr_src source device pointer
 * @param device_ptr_dest destination device pointer
 * @param num_elements amount of elements of size TypeName
 * @param stream CUDA stream to enqueue the copy on (default stream 0)
 */
template <typename TypeName>
inline void copyDeviceToDeviceAsync(TypeName *device_ptr_src,
TypeName *device_ptr_dest, IndType num_elements, cudaStream_t stream=0)
{
HANDLE_ERROR(cudaMemcpyAsync(device_ptr_dest, device_ptr_src,
num_elements * sizeof(TypeName),
cudaMemcpyDeviceToDevice, stream));
}
/** \brief Copy CUDA memory from device to host
*
* @param device_ptr device pointer
@@ -118,7 +145,19 @@ inline void copyFromDevice(TypeName *device_ptr, TypeName *host_ptr,
HANDLE_ERROR(cudaMemcpy(host_ptr, device_ptr, num_elements * sizeof(TypeName),
cudaMemcpyDeviceToHost));
}

/** \brief Asynchronously copy CUDA memory from device to host
 *
 * @param device_ptr device pointer
 * @param host_ptr host pointer
 * @param num_elements amount of elements of size TypeName
 * @param stream CUDA stream to enqueue the copy on (default stream 0)
 */
template <typename TypeName>
inline void copyFromDeviceAsync(TypeName *device_ptr, TypeName *host_ptr,
IndType num_elements, cudaStream_t stream=0)
{
HANDLE_ERROR(cudaMemcpyAsync(host_ptr, device_ptr, num_elements * sizeof(TypeName),
cudaMemcpyDeviceToHost, stream));
}
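One caveat worth noting with these async helpers: `cudaMemcpyAsync` only overlaps with kernel execution when the host buffer is pinned (`cudaMallocHost`); with pageable memory the driver falls back to a staged, effectively synchronous copy. A sketch of the intended usage — stream handling and names like `n` and `DType2` are illustrative, and this assumes the helpers declared above plus a CUDA-capable device:

```cuda
// Sketch: overlap host<->device traffic with kernel work on one stream.
cudaStream_t stream;
HANDLE_ERROR(cudaStreamCreate(&stream));

DType2 *host_data;   // pinned allocation makes the copy truly asynchronous
HANDLE_ERROR(cudaMallocHost(&host_data, n * sizeof(DType2)));
DType2 *device_data;
HANDLE_ERROR(cudaMalloc(&device_data, n * sizeof(DType2)));

copyToDeviceAsync(host_data, device_data, n, stream);
// ... enqueue kernels on `stream`; stream order guarantees they see the data ...
copyFromDeviceAsync(device_data, host_data, n, stream);

// Nothing above blocks the host; synchronize before reading host_data.
HANDLE_ERROR(cudaStreamSynchronize(stream));
HANDLE_ERROR(cudaFree(device_data));
HANDLE_ERROR(cudaFreeHost(host_data));
HANDLE_ERROR(cudaStreamDestroy(stream));
```

Passing the default `stream=0` reproduces the old blocking behaviour with respect to other default-stream work, so existing call sites keep working unchanged.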
/** \brief Free variable list of device pointers. Use NULL as stopping element
*
* e.g.: freeTotalDeviceMemory(ptr1*, ptr2*,NULL);
@@ -181,7 +220,7 @@ inline void showMemoryInfo(bool force, FILE *stream)
size_t total_mem = 0;
cudaMemGetInfo(&free_mem, &total_mem);
if (DEBUG || force)
fprintf(stream, "memory usage, free: %lu total: %lu\n", (SizeType)(free_mem),
printf("memory usage, free: %lu total: %lu\n", (SizeType)(free_mem),
(SizeType)(total_mem));
}

@@ -212,39 +251,6 @@ inline void showMemoryInfo()
*
* @param symbol Const symbol name
*/
void initConstSymbol(const char *symbol, const void *src, IndType count);

/** \brief Initialize texture memory on device
*
* CUDA Kernel function prototype.
*
* @param symbol Texture symbol name
*/
void initTexture(const char *symbol, cudaArray **devicePtr,
gpuNUFFT::Array<DType> hostTexture);

/** \brief Bind to 1-d texture on device
*
* CUDA Kernel function prototype.
*
* @param symbol Texture symbol name
*/
void bindTo1DTexture(const char *symbol, void *devicePtr, IndType count);

/** \brief Unbind from device texture
*
* CUDA Kernel function prototype.
*
* @param symbol Texture symbol name
*/
void unbindTexture(const char *symbol);

/** \brief Free texture memory on device
*
* CUDA Kernel function prototype.
*
* @param symbol Texture symbol name
*/
void freeTexture(const char *symbol, cudaArray *devicePtr);
void initConstSymbol(const char *symbol, const void *src, IndType count, cudaStream_t stream=0);
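The `cudaStream_t` parameter added to `initConstSymbol` lets the constant-memory upload (e.g. the precomputed kernel table) ride the same stream as the data copies above. A hedged sketch of what a matching definition could look like — the symbol dispatch shown here is illustrative, not the PR's actual implementation:

```cuda
#include <cstring>

// Hypothetical definition matching the new prototype: upload `src` to the
// named __constant__ symbol on the given stream, ordered with other work
// already queued on that stream.
void initConstSymbol(const char *symbol, const void *src, IndType count,
                     cudaStream_t stream)
{
  if (std::strcmp(symbol, "KERNEL") == 0)
    HANDLE_ERROR(cudaMemcpyToSymbolAsync(KERNEL, src, count, 0,
                                         cudaMemcpyHostToDevice, stream));
}
```

As with the memcpy helpers, the default `stream=0` keeps the previous synchronous-looking behaviour for callers that never pass a stream.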

#endif