This repository reimplements the line/plane odometry (based on LOAM) of LIO-SAM with CUDA. A point cloud hash map on the GPU (inspired by iVox of Faster-LIO) replaces pcl's kdtree to accelerate the 5-neighbour KNN search.
Modifications are as follows:
- The CUDA code of the line/plane odometry is in src/cuda_plane_line_odometry.
- To use this CUDA odometry, the scan2MapOptimization() in mapOptimization.cpp is replaced with scan2MapOptimizationWithCUDA().
This repository reimplements the line/plane odometry in scan2MapOptimization() of mapOptimization.cpp with CUDA. The most significant cost of the original implementation is the 5-neighbour KNN search using pcl's kdtree, which usually takes about 5ms on my machine (Intel i7-6700K CPU, walking_dataset.bag, with OpenMP). This repository replaces pcl's kdtree with a point cloud hash map (inspired by iVox of Faster-LIO) implemented with CUDA. On my machine (Nvidia 980 Ti GPU, walking_dataset.bag), the average cost of the 5-neighbour KNN search drops to about 0.5~0.6ms, and the average cost of all operations in one frame drops to about 11ms. The other parts of the line/plane odometry (Jacobians, residuals, etc.) are also implemented with CUDA.
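To illustrate the idea behind a point cloud hash map, here is a minimal CPU-side sketch in C++. It is not the CUDA implementation in this repository: the names `Point`, `VoxelKey`, `VoxelKeyHash` and `VoxelHashMap`, the choice of hash, and the decision to search only the 27 voxels around the query are all assumptions made for illustration. Points are bucketed by integer voxel coordinates, so a KNN query only compares against points in nearby buckets instead of descending a tree.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point { float x, y, z; };

// Integer voxel coordinates of a point (hypothetical type for this sketch).
struct VoxelKey {
    int x, y, z;
    bool operator==(const VoxelKey& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};

// A common spatial hash: multiply each coordinate by a large prime and XOR.
struct VoxelKeyHash {
    std::size_t operator()(const VoxelKey& k) const {
        return (static_cast<std::size_t>(k.x) * 73856093u) ^
               (static_cast<std::size_t>(k.y) * 19349669u) ^
               (static_cast<std::size_t>(k.z) * 83492791u);
    }
};

class VoxelHashMap {
public:
    explicit VoxelHashMap(float voxel_size) : inv_size_(1.0f / voxel_size) {}

    // Append the point to the bucket of the voxel that contains it.
    void insert(const Point& p) { grid_[keyOf(p)].push_back(p); }

    // Approximate KNN: only the query's voxel and its 26 neighbours are
    // examined, so points farther away are never returned.
    std::vector<Point> knn(const Point& q, std::size_t k) const {
        std::vector<std::pair<float, Point>> candidates;
        const VoxelKey c = keyOf(q);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = grid_.find(VoxelKey{c.x + dx, c.y + dy, c.z + dz});
                    if (it == grid_.end()) continue;
                    for (const Point& p : it->second)
                        candidates.emplace_back(sqDist(p, q), p);
                }
        // Keep the k closest candidates, ordered by squared distance.
        const std::size_t n = std::min(k, candidates.size());
        std::partial_sort(candidates.begin(), candidates.begin() + n,
                          candidates.end(),
                          [](const std::pair<float, Point>& a,
                             const std::pair<float, Point>& b) {
                              return a.first < b.first;
                          });
        std::vector<Point> out;
        for (std::size_t i = 0; i < n; ++i) out.push_back(candidates[i].second);
        return out;
    }

private:
    VoxelKey keyOf(const Point& p) const {
        return VoxelKey{static_cast<int>(std::floor(p.x * inv_size_)),
                        static_cast<int>(std::floor(p.y * inv_size_)),
                        static_cast<int>(std::floor(p.z * inv_size_))};
    }
    static float sqDist(const Point& a, const Point& b) {
        const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    float inv_size_;
    std::unordered_map<VoxelKey, std::vector<Point>, VoxelKeyHash> grid_;
};
```

A caller would build the map once per frame with `insert()` and then call `knn(query, 5)` for each feature point. The cost of a query depends on how many points fall in the surrounding voxels rather than on the size of the whole map, which is what makes the structure a good fit for many independent queries running in parallel.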
The essential dependencies are the same as for LIO-SAM. In addition, the CUDA reimplementation of the line/plane odometry requires:
Before building this repo, some CMake variables in src/cuda_plane_line_odometry/CMakeLists.txt need to be modified to fit your environment:
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc) # change it to your path to nvcc
set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda) # change it to your CUDA toolkit root directory
set(CMAKE_CUDA_ARCHITECTURES 52) # for example, if your device's compute capability is 6.2, then set this CMAKE variable to 62
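The mapping from compute capability to the CMake value is just the major and minor numbers written together. A small C++ sketch, assuming a hypothetical helper named `cudaArchFromCapability` (it is not part of this repository); the major and minor values could come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties`:

```cpp
#include <string>

// Hypothetical helper: turns a compute capability (major.minor) into the
// string expected by CMAKE_CUDA_ARCHITECTURES, e.g. 5.2 becomes "52".
std::string cudaArchFromCapability(int major, int minor) {
    return std::to_string(major) + std::to_string(minor);
}
```

For a 980 Ti (compute capability 5.2) this yields `52`, the value used in the line above.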
The basic steps to compile and run this repo are the same as for LIO-SAM.
Sequence | CPU (Intel i7-6700K): build kdtree | CPU (Intel i7-6700K): one frame (build kdtree & all iterations) | GPU (Nvidia 980 Ti): build hashmap | GPU (Nvidia 980 Ti): one KNN | GPU (Nvidia 980 Ti): one frame (build hashmap & all iterations)
---|---|---|---|---|---
Walking | 16.06ms no RVIZ 29.00ms with RVIZ | 49.98ms no RVIZ 84.20ms with RVIZ | 4.52ms no RVIZ 6.93ms with RVIZ | 0.57ms no RVIZ 0.58ms with RVIZ | 11.06ms no RVIZ 15.68ms with RVIZ |
Park | 16.11ms no RVIZ 28.08ms with RVIZ | 59.02ms no RVIZ 101.38ms with RVIZ | 4.18ms no RVIZ 6.71ms with RVIZ | 0.62ms no RVIZ 0.62ms with RVIZ | 11.41ms no RVIZ 16.55ms with RVIZ |
Garden | 17.66ms no RVIZ 31.71ms with RVIZ | 53.40ms no RVIZ 84.24ms with RVIZ | 5.01ms no RVIZ 7.43ms with RVIZ | 0.60ms no RVIZ 0.61ms with RVIZ | 11.42ms no RVIZ 15.66ms with RVIZ |
Rooftop | 17.48ms no RVIZ 36.78ms with RVIZ | 67.81ms no RVIZ 120.75ms with RVIZ | 4.96ms no RVIZ 8.30ms with RVIZ | 0.81ms no RVIZ 0.82ms with RVIZ | 13.63ms no RVIZ 19.86ms with RVIZ |
Rotation | 11.01ms no RVIZ 10.80ms with RVIZ | 50.30ms no RVIZ 53.15ms with RVIZ | 4.01ms no RVIZ 4.40ms with RVIZ | 0.54ms no RVIZ 0.55ms with RVIZ | 9.77ms no RVIZ 10.27ms with RVIZ |
Campus (small) | 17.88ms no RVIZ 37.30ms with RVIZ | 58.68ms no RVIZ 115.68ms with RVIZ | 4.70ms no RVIZ 7.62ms with RVIZ | 0.60ms no RVIZ 0.62ms with RVIZ | 11.89ms no RVIZ 17.83ms with RVIZ |
Campus (large) | 16.20ms no RVIZ 28.39ms with RVIZ | 60.67ms no RVIZ 108.08ms with RVIZ | 4.76ms no RVIZ 7.50ms with RVIZ | 0.62ms no RVIZ 0.63ms with RVIZ | 12.48ms no RVIZ 17.47ms with RVIZ |
2011_09_30_drive_0028 | 14.33ms no RVIZ 22.25ms with RVIZ | 110.22ms no RVIZ 168.98ms with RVIZ | 5.20ms no RVIZ 7.44ms with RVIZ | 1.05ms no RVIZ 1.05ms with RVIZ | 19.64ms no RVIZ 24.50ms with RVIZ |
P.S. RVIZ appears to slow this reimplementation down considerably. For example, with RVIZ running, the average cost of one frame in the walking dataset rises to about 15.56ms.
This repository is a modified version of LIO-SAM, whose line/plane odometry is originally based upon LOAM.
The point cloud hash map on the GPU is inspired by the iVox data structure of Faster-LIO, and draws on kdtree_cuda_builder.h of FLANN.