Running SR on a distributed cluster #644
We have been using SymbolicRegression.jl for our research but have hit a point where the equation search takes too long on a single compute node (with ~16 cores or so). We're now looking into using our distributed computing resources (we have both an MPI cluster and a Slurm HPC cluster available) and were wondering whether you have used SR in such an environment before, or might know someone who has. We're hoping to avoid reinventing the wheel by writing the entire orchestration code ourselves. (Copied from a Slack DM.)
Hey Christian,
Yes, I use SR like this in my own work. Basically, if you just set `parallelism=:multiprocessing`, then you can either:
1. […]
2. pass `numprocs=num_nodes * num_cores, addprocs_function=addprocs_slurm`, and SR.jl will try its best to set it up for you.
I usually do the 2nd out of convenience.
Just be sure to launch SR only once, on a single node, from a single task on the slurm job. ClusterManagers.jl will run srun internally for you.
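For reference, the `numprocs` / `addprocs_function` route described above might look roughly like the sketch below. It is only an illustration under some assumptions: the allocation size (4 nodes with 16 cores each), the toy data, the operator set, `niterations=40`, and the helper name `run_search` are all made up for the example. It also assumes `addprocs_slurm` is imported from ClusterManagers.jl.

```julia
using SymbolicRegression: Options, equation_search
using ClusterManagers: addprocs_slurm

# Hypothetical allocation; adjust to match what the Slurm job actually requests.
const num_nodes = 4
const num_cores = 16  # cores per node

# Illustrative helper (not part of SR.jl): runs one search with Slurm-launched workers.
function run_search(X, y)
    # Toy operator set, chosen only for this example.
    options = Options(; binary_operators=[+, -, *], unary_operators=[cos])
    return equation_search(
        X, y;
        options=options,
        niterations=40,
        parallelism=:multiprocessing,
        numprocs=num_nodes * num_cores,     # total number of worker processes
        addprocs_function=addprocs_slurm,   # SR.jl calls this to create the workers
    )
end

# Only run the search when this file is executed as a script.
if abspath(PROGRAM_FILE) == @__FILE__
    X = randn(Float32, 5, 200)                    # 5 features, 200 rows (toy data)
    y = 2 .* cos.(X[4, :]) .+ X[1, :] .^ 2 .- 2   # toy target
    hall_of_fame = run_search(X, y)
end
```

The script is meant to be started once, as a single Julia process inside the Slurm job, rather than once per task; the workers are then added from within that process.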