The se_a descriptor cannot utilize all cores when trained on the cpu #4474

Open
gengxingze opened this issue Dec 17, 2024 · 5 comments

@gengxingze

gengxingze commented Dec 17, 2024

Summary

When training the dp_sea model on a 64-core CPU-only node, CPU utilization is only about 1800%. How can I configure training to increase CPU utilization further?

DeePMD-kit Version

2.2.10

Backend and its version

v2

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

python=3.10.13

Details

  PID USER    PR  NI   VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
34016 gengxz  20   0  16.1g 636216 102264 S  1779  0.2  1268:54 python

Image

@njzjz
Member

njzjz commented Dec 17, 2024

> the cpu utilization is only about 180%.

Isn't it 1800%?

@gengxingze
Author

> > the cpu utilization is only about 180%.
>
> Isn't it 1800%?

Sorry, I have corrected it. Still, 1800% means only about 18 cores are working.
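
The arithmetic behind that estimate can be sketched in shell. top reports %CPU summed over all threads of a process, so dividing by 100 gives the approximate number of fully busy cores. The variable names below are illustrative, and the values are the ones quoted in this issue.

```shell
# Values taken from the top output in this issue; variable names are illustrative.
cpu_percent=1779   # %CPU column for the python process
total_cores=64     # cores on the node

# 100% corresponds to one fully busy core; round to the nearest whole core.
busy_cores=$(( (cpu_percent + 50) / 100 ))
echo "approx. ${busy_cores} of ${total_cores} cores busy"
```

Note that this is an average over top's sampling interval: many cores that are each partly busy produce the same total as a few cores that are fully busy.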

@njzjz
Member

njzjz commented Dec 20, 2024

In the top program, you can press 1 to see how heavily each CPU core is utilized.

How do you set the threads?
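
One way to answer that question is to print the thread-related environment variables from inside the job. The sketch below assumes the variable names that, to my understanding, DeePMD-kit v2 with the TensorFlow backend reads (`OMP_NUM_THREADS`, `TF_INTRA_OP_PARALLELISM_THREADS`, `TF_INTER_OP_PARALLELISM_THREADS`); please check the documentation for your version. The helper name `show_thread_env` is hypothetical.

```shell
# Hypothetical helper: print each thread-related variable, or "<unset>" if it is missing.
show_thread_env() {
    for var in OMP_NUM_THREADS TF_INTRA_OP_PARALLELISM_THREADS TF_INTER_OP_PARALLELISM_THREADS; do
        # printenv exits non-zero when the variable is not in the environment
        value=$(printenv "$var") || value="<unset>"
        echo "${var}=${value}"
    done
}

show_thread_env
# Number of CPUs available to this process (nproc is part of GNU coreutils)
nproc
```

For a per-thread view of a running process, `top -H -p <PID>` lists its threads individually.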

@gengxingze
Author

> In the top program, you can press 1 to see how heavily each CPU core is utilized.
>
> How do you set the threads?

#!/bin/bash
#SBATCH --job-name=deepmd
#SBATCH --partition=cpu
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64

# Record the start time in seconds since the epoch
begin=$(date +%s)
echo "======== Job starts at $(date +'%Y-%m-%d %T') ======== "

module load mkl mpi compiler
conda activate deepmd

dp train input.json
echo "======== Job ends at $(date +'%Y-%m-%d %T') ======== "

# Report the elapsed wall time
final=$(date +%s)
total=$((final - begin))
day=$((total / 86400))
hour=$((total % 86400 / 3600))
minute=$((total % 3600 / 60))
second=$((total % 60))
echo "totaltime: $day days, ${hour}:${minute}:${second}" | tee -a out

Image
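
For comparison, a common way to make the thread settings explicit is to request one task with 64 CPUs and export the thread counts before calling `dp train`. This is a sketch under assumptions, not the maintainers' recommendation: the environment variable names are the ones I understand DeePMD-kit v2 (TensorFlow backend) to read, and the 64-core fallback and the intra/inter split are illustrative starting points. The best values usually have to be found by timing a few short runs.

```shell
#!/bin/bash
#SBATCH --job-name=deepmd
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# the fallback of 64 is an assumption for running outside Slurm.
ncores=${SLURM_CPUS_PER_TASK:-64}

# Assumed variable names (DeePMD-kit v2, TensorFlow backend); values are starting points to tune.
export OMP_NUM_THREADS="$ncores"
export TF_INTRA_OP_PARALLELISM_THREADS="$ncores"
export TF_INTER_OP_PARALLELISM_THREADS=1

echo "threads: omp=${OMP_NUM_THREADS} intra=${TF_INTRA_OP_PARALLELISM_THREADS} inter=${TF_INTER_OP_PARALLELISM_THREADS}"

# Only start training when the dp command is available in this environment.
if command -v dp >/dev/null 2>&1; then
    dp train input.json
else
    echo "dp not found; activate the deepmd environment first" >&2
fi
```

Setting every count to the full core number can oversubscribe the node, so comparing the time per training batch for a few combinations is more reliable than reading the %CPU figure alone.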

@njzjz
Member

njzjz commented Dec 21, 2024

From the top output, I can see that all cores are being used.
