[Question] Number of tasks with MPI applications #859

Open
FNTwin opened this issue Dec 3, 2024 · 1 comment


FNTwin commented Dec 3, 2024

Hi, it's me again (back from the holidays) with some more questions about QCFractal. I am transitioning from Psi4 to ORCA 6.0.1 and I wrote a QCEngine implementation (which I will open source in a PR soon) to use it in the infrastructure.
The serial application works flawlessly, but I am trying to use ORCA's parallel feature and I have encountered some difficulties in the implementation.

I made some modifications to qcfractal and the underlying parsl to use the HighThroughputExecutor with the SlurmProvider, allowing jobs with a higher number of tasks. As an example, a config of the type:

workers_per_node: 2  
max_nodes: 1
cores_per_worker:  4      
memory_per_worker: 16       

will spawn, instead of the usual:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

export PARSL_CORES=8

the modified submission:

#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

export PARSL_CORES=4

with ORCA automatically configured to use 4 CPUs per calculation.
This works correctly for single nodes, but there are some conflicts with the ComputeManager over the number of active tasks and open slots.
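
For context, the QCFractal YAML above roughly maps onto a Parsl configuration like the following. This is only a sketch of my patched setup, assuming a recent Parsl release (max_workers_per_node was max_workers in older versions), and the scheduler_options string is just where I inject the modified directives:

from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="orca_htex",
            max_workers_per_node=2,  # workers_per_node: 2 (max_workers in older Parsl)
            cores_per_worker=4,      # cores_per_worker: 4 -> PARSL_CORES=4
            mem_per_worker=16,       # memory_per_worker: 16 (GB)
            provider=SlurmProvider(
                nodes_per_block=1,   # max_nodes: 1
                cores_per_node=8,
                mem_per_node=32,
                walltime="24:00:00",
                # Extra directives injected by my patch; they may need to
                # override what the provider itself emits.
                scheduler_options="#SBATCH --ntasks=8\n#SBATCH --cpus-per-task=1",
                # SimpleLauncher so ORCA's own mpirun handles rank placement.
                launcher=SimpleLauncher(),
            ),
        )
    ]
)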

It seems that, because of the high number of available open slots, parsl/ComputeManager over-allocates extra tasks onto the CPUs of the nodes when it spawns another node, causing 4 calculations/tasks per node instead of 2. I am diving deeper into parsl and how the manager uses it, but so far it has been tricky to solve this issue. Do you have any recommendation?

I also tried fixing the maximum number of open slots to avoid the over-allocation, but in that case there is never any scaling to request new nodes and I am usually stuck at 1 node.
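
For reference, these are the Parsl provider knobs I have been playing with on the scaling side (just a sketch with placeholder values, not my exact settings):

from parsl.providers import SlurmProvider

# Block scaling: Parsl submits new "blocks" (Slurm jobs, i.e. nodes here)
# based on the number of outstanding tasks and the parallelism factor,
# bounded by min_blocks/max_blocks.
provider = SlurmProvider(
    nodes_per_block=1,  # one node per Slurm job
    init_blocks=0,      # blocks submitted when the manager starts
    min_blocks=0,       # lower bound kept alive by the scaling strategy
    max_blocks=4,       # upper bound on concurrent Slurm jobs
    parallelism=1,      # 1 = try to run every queued task at once
)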


bennybp commented Dec 6, 2024

I'm not entirely sure, but the current manager is not particularly MPI friendly, as you are finding out :). We do assume that the number of slots is num_nodes * ntasks_per_node, but it sounds like you want to do (non-hybrid?) MPI.
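
Roughly speaking, with the numbers from your example config (illustration only, not actual manager code):

num_nodes = 1
ntasks_per_node = 8

# What the current manager assumes it can claim:
slots_assumed = num_nodes * ntasks_per_node  # 8 tasks

# What the non-hybrid MPI layout actually supports:
workers_per_node = 2                         # concurrent ORCA calculations per node
slots_wanted = num_nodes * workers_per_node  # 2 concurrent calculations

# The gap between the two is presumably where the extra tasks come from.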

I will have to think if there's an easy way to shove this into the manager. And I need to consult the Parsl docs as well. I would certainly be interested if this could be done in a backwards-compatible way.

Let me know if you think of anything
