Running SR on a distributed cluster #644

Answered by MilesCranmer
chrhck asked this question in Q&A
Hey Christian,

Yes, I use SR like this in my own work. If you set parallelism=:multiprocessing, you can either:

  1. Pass the process objects explicitly to the procs parameter (whether those procs are on the same node, on multiple nodes, etc.)
  2. Or set, for example, numprocs=num_nodes * num_cores and addprocs_function=addprocs_slurm, and SR.jl will try its best to set the workers up for you

I usually do the second, out of convenience.

Just be sure to launch SR only once, on a single node, from a single task in the Slurm job. ClusterManagers.jl will run srun internally for you.
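
For reference, here is a minimal sketch of what the second option might look like, assuming a recent SymbolicRegression.jl where the entry point is equation_search and that ClusterManagers.jl is installed. The toy data, the operator set, and the num_nodes and num_cores values are placeholders for illustration, not something taken from this thread; adjust them to match your own Slurm allocation.

```julia
using SymbolicRegression
using ClusterManagers: addprocs_slurm

# Toy dataset (placeholder): 5 features x 100 samples.
X = randn(Float32, 5, 100)
y = 2 .* cos.(X[4, :]) .+ X[1, :] .^ 2 .- 2

# Placeholder search space; use whatever operators your problem needs.
options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos])

# Hypothetical allocation size; these should match what the Slurm job requested.
num_nodes = 2
num_cores = 16

# Run this script once, from a single task on a single node.
# The worker processes are created through addprocs_slurm.
hall_of_fame = equation_search(
    X,
    y;
    options=options,
    niterations=40,
    parallelism=:multiprocessing,
    numprocs=num_nodes * num_cores,
    addprocs_function=addprocs_slurm,
)
```

For the first option, you would instead create the workers yourself (for example with addprocs from the Distributed standard library) and pass the resulting process IDs via procs=..., leaving out numprocs and addprocs_function.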

Answer selected by chrhck