The ray-rapids project is a suite of samples demonstrating how various RAPIDS GPU-accelerated data science libraries can be used in a multi-GPU fashion with the Ray compute engine.
The samples contained in this project are not intended to be used as-is, nor is this intended to be a library; therefore, no installers or packages are made available for this repository.
To get started, you need a Python environment with Ray and the RAPIDS libraries. RAPIDS is made available in multiple ways, including conda, pip, and Docker; please refer to the RAPIDS Installation Guide for details. Here we use conda for demonstration purposes.
We assume conda is already available on your system; if not, we suggest miniforge. You should then begin by creating a conda environment:
mamba create -n ray-rapids \
-c rapidsai-nightly -c conda-forge -c nvidia \
python=3.12 "cuda-version>=12.0,<=12.5" \
cudf=25.02 cuml=25.02 cugraph=25.02 \
ray-default
The command above assumes you want to run all examples, including the cuDF, cuML, and cuGraph ones, but you may skip installing the libraries you do not need. All RAPIDS libraries require CUDA; conda will attempt to install the most suitable version in the CUDA 12.0 to 12.5 range, and you must make sure your system already supports CUDA 12.x. Additionally, Ray is necessary; it is installed via the ray-default package.
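For example, if you only plan to run the cuDF sample, a trimmed-down command might look like the following (a sketch mirroring the full command above, dropping cuml and cugraph):

```shell
# Hypothetical trimmed install: only cuDF and Ray, same pins as above.
mamba create -n ray-rapids \
  -c rapidsai-nightly -c conda-forge -c nvidia \
  python=3.12 "cuda-version>=12.0,<=12.5" \
  cudf=25.02 \
  ray-default
```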
With the environment created, activate it:
conda activate ray-rapids
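Before running the samples, it may be worth checking that all the expected packages resolved correctly in the new environment. A minimal, stdlib-only sanity check (not part of the repository) could be:

```python
# Check that the packages the samples rely on are importable in the
# currently active environment, without actually importing them.
import importlib.util

for pkg in ("ray", "cudf", "cuml", "cugraph"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING'}")
```

If any package reports MISSING, revisit the install command above.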
There are three samples:
- ray-cudf: demonstrates cuDF for minhash computing;
- ray-kmeans: demonstrates cuML for k-means clustering;
- ray-wcc: demonstrates cuGraph for weakly connected components.
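All three samples share the same broad approach: the input data is split into partitions, and Ray schedules work on each partition across the available GPUs. As a rough, stdlib-only sketch of that partitioning step (the function name and shapes here are illustrative, not taken from the repository):

```python
# Illustrative sketch: split a dataset into one chunk per GPU so that
# each Ray task can process its own partition. Names are hypothetical.

def partition(rows, num_gpus):
    """Split `rows` into `num_gpus` roughly equal contiguous chunks."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, extra = divmod(len(rows), num_gpus)
    chunks, start = [], 0
    for i in range(num_gpus):
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks

# In the samples, each chunk would then be handed to a Ray task pinned
# to one GPU, e.g. a function decorated with @ray.remote(num_gpus=1).
print(partition(list(range(10)), 3))
```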
To run any of the samples, simply run them with Python:
cd rapids-experiments
python ray-cudf.py # or ray-kmeans.py, or ray-wcc.py.
Note that since there is no installer, all files rely on relative paths; therefore, you should change into the rapids-experiments directory first, as in the commands above.