[Question] Number of tasks with MPI applications #859

Open
FNTwin opened this issue Dec 3, 2024 · 1 comment


FNTwin commented Dec 3, 2024

Hi, it's me again (back from the holidays) with some more questions about QCFractal. I am transitioning from Psi4 to ORCA 6.0.1 and I wrote a QCEngine implementation (which I will open source in a PR soon) to use it in the infrastructure.
The serial application works flawlessly, but I am trying to use ORCA's parallel feature and I have encountered some difficulties in the implementation.

I made some modifications to qcfractal and the underlying parsl to use the HighThroughputExecutor with the SlurmProvider, allowing jobs with a higher number of tasks. As an example, a config of the type:

workers_per_node: 2  
max_nodes: 1
cores_per_worker:  4      
memory_per_worker: 16       

will spawn, instead of the usual:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

export PARSL_CORES=8

the modified submission:

#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

export PARSL_CORES=4

with ORCA automatically configured to use 4 CPUs per calculation.
This works correctly for single nodes, but there are some conflicts with the ComputeManager over the number of active tasks and open slots.
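
For context, the QCFractal YAML above roughly maps onto a Parsl configuration like the following. This is only a sketch of my patched setup, assuming a recent Parsl release (max_workers_per_node was max_workers in older versions), and the scheduler_options string is just where I inject the modified directives:

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="orca_htex",
            max_workers_per_node=2,  # workers_per_node: 2 (max_workers in older Parsl)
            cores_per_worker=4,      # cores_per_worker: 4 -> PARSL_CORES=4
            mem_per_worker=16,       # memory_per_worker: 16 (GB)
            provider=SlurmProvider(
                nodes_per_block=1,   # max_nodes: 1
                cores_per_node=8,
                mem_per_node=32,
                walltime="24:00:00",
                # Extra directives injected by my patch; they may need to
                # override what the provider itself emits.
                scheduler_options="#SBATCH --ntasks=8\n#SBATCH --cpus-per-task=1",
                # SimpleLauncher so ORCA's own mpirun handles rank placement.
                launcher=SimpleLauncher(),
            ),
        )
    ]
)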

It seems that, because of the high number of available open slots, parsl/ComputeManager over-allocates extra tasks onto the CPUs of the nodes when it spawns another node, causing 4 calculations/tasks per node instead of 2. I am diving deeper into parsl and how the manager uses it, but so far it has been tricky to solve this issue. Do you have any recommendation?

I also tried fixing the maximum number of open slots to avoid the over-allocation, but in that case there is never any scaling to request new nodes and I am usually stuck at 1 node.
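
For reference, these are the Parsl provider knobs I have been playing with on the scaling side (just a sketch with placeholder values, not my exact settings):

from parsl.providers import SlurmProvider

# Block scaling: Parsl submits new "blocks" (Slurm jobs, i.e. nodes here)
# based on the number of outstanding tasks and the parallelism factor,
# bounded by min_blocks/max_blocks.
provider = SlurmProvider(
    nodes_per_block=1,  # one node per Slurm job
    init_blocks=0,      # blocks submitted when the manager starts
    min_blocks=0,       # lower bound kept alive by the scaling strategy
    max_blocks=4,       # upper bound on concurrent Slurm jobs
    parallelism=1,      # 1 = try to run every queued task at once
)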


bennybp commented Dec 6, 2024

I'm not entirely sure, but the current manager is not particularly MPI friendly, as you are finding out :). We do assume that the number of slots is num_nodes * ntasks_per_node, but it sounds like you want to do (non-hybrid?) MPI.
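
Roughly speaking, with the numbers from your example config (illustration only, not actual manager code):

num_nodes = 1
ntasks_per_node = 8

# What the current manager assumes it can claim:
slots_assumed = num_nodes * ntasks_per_node  # 8 tasks

# What the non-hybrid MPI layout actually supports:
workers_per_node = 2                         # concurrent ORCA calculations per node
slots_wanted = num_nodes * workers_per_node  # 2 concurrent calculations

# The gap between the two is presumably where the extra tasks come from.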

I will have to think if there's an easy way to shove this into the manager. And I need to consult the Parsl docs as well. I would certainly be interested if this could be done in a backwards-compatible way.

Let me know if you think of anything
