dynamically load op library in C++ interface #1384
Conversation
The C++ interface will dynamically load OP libraries, just like the Python interface, so it no longer needs linking.
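A minimal sketch of the dynamic-loading approach described above (this is not the actual DeePMD-kit code; the function name and error handling are illustrative):

```cpp
// Sketch: load a TensorFlow op library at runtime with dlopen, the same
// way the Python interface does via tf.load_op_library. Illustrative,
// not the real DeePMD-kit API.
#include <dlfcn.h>

#include <stdexcept>
#include <string>

void load_op_library(const std::string &lib_name) {
  // RTLD_NOW resolves all symbols immediately, so a missing dependency
  // fails fast; RTLD_GLOBAL makes the op/kernel registration symbols
  // visible to TensorFlow.
  void *handle = dlopen(lib_name.c_str(), RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    throw std::runtime_error(std::string("dlopen failed: ") + dlerror());
  }
  // Intentionally keep the handle open: the registered ops must remain
  // available for the lifetime of the process.
}
```

Because the name passed to dlopen contains no slash, the dynamic loader searches LD_LIBRARY_PATH and the RPATH/RUNPATH of the calling object, which is why the RPATH questions in this thread matter.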
Codecov Report
@@           Coverage Diff            @@
##            devel    #1384    +/-  ##
========================================
- Coverage   75.53%   64.28%  -11.26%
========================================
  Files          91        5      -86
  Lines        7505       14    -7491
========================================
- Hits         5669        9    -5660
+ Misses       1836        5    -1831
========================================
Continue to review full report at Codecov.
Force-pushed from e6e296a to 12bcccb.
It is not necessary anymore.
@denghuilu please check if it works.
An error occurred during the MD process. Not sure what's going on; I'll check it this afternoon.
@njzjz after adding $deepmd_root/lib to LD_LIBRARY_PATH, the MD process goes well.
@denghuilu Can you check if RPATH is set?
I think rpath should have already been set here: deepmd-kit/source/lmp/env.sh.in Line 11 in b88c1da
I have no idea what's going on. The devel branch works fine.
compiler error
Force-pushed from aea3d5b to a3f8d95.
I'll take another look at it.
@denghuilu I rechecked 0f61527 by downloading and compiling a new LAMMPS. However, I found no problem running it without setting it.
@denghuilu Can you test the following command?
(base) [jz748@localhost lmp]$ readelf -d /home/jz748/codes/deepmd-kit/dp/lib/libdeepmd_cc.so | head -20
As you see,
Here's the output:
Checking LAMMPS?
@denghuilu set the following environment variable before running LAMMPS:
export LD_DEBUG=libs
It will give the following information:
We can see the search path it tries.
$deepmd_root/lib is not within the search path.
fabdac9 should fix it.
The same error...
This reverts commit fabdac9.
Ok, I'll take a look...
@denghuilu Finally I reproduced it by adding it. Under this situation, the linker will add
See https://stackoverflow.com/a/43703445/9567349 and https://stackoverflow.com/a/52020177/9567349. Adding
Ok, I give up finding other solutions... Adding
Nothing changed after setting the
@njzjz Here's my environment:
This flag is added by OpenMPI, see open-mpi/ompi#1089
If mpicxx adds the flag, I don't think we can override it, though.
This reverts commit ecccb57.
In de00e04, I call dlopen in our own library instead of using TF's function. @denghuilu I think it will also work with
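A sketch of why calling dlopen from our own library helps (names below are illustrative, not the real API; see the two Stack Overflow answers linked above): glibc's dynamic loader resolves a bare library name against the DT_RUNPATH of the shared object that *calls* dlopen. TensorFlow's libraries have no $deepmd_root/lib in their RUNPATH, but libdeepmd_cc.so can be built with one we control, such as $ORIGIN.

```cpp
// Sketch (illustrative names): the dlopen call site determines whose
// RUNPATH the dynamic loader searches. By placing the call inside our
// own libdeepmd_cc.so, a RUNPATH baked into that library at build time
// is enough to locate the op library, without the user having to set
// LD_LIBRARY_PATH.
#include <dlfcn.h>

#include <cstdio>

// Imagine this function compiled into libdeepmd_cc.so: the loader
// resolves a bare name against *this* object's DT_RUNPATH, not TF's.
extern "C" void *deepmd_load_op_library(const char *name) {
  void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
  }
  return handle;
}
```

The RUNPATH actually recorded in the built library can be checked with readelf -d, as earlier in this thread.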
It's my mistake. After recompiling DeePMD-kit, everything works fine.
root lmp $ git log | head
commit de00e04206b93bf87e9c4b64a097266455ccb015
Author: Jinzhe Zeng <[email protected]>
Date: Wed Jan 19 04:52:07 2022 -0500
dlopen from dp lib but not TF
commit 3362f99b014259d25b981ad6fa04fe26e5ed3873
Author: Jinzhe Zeng <[email protected]>
Date: Wed Jan 19 04:14:55 2022 -0500
root lmp $ echo $LD_LIBRARY_PATH
/root/denghui/openmpi-4.0.6/lib:/usr/local/cuda-11.0/lib64:/root/denghui/openmpi-4.0.6/lib:/usr/local/cuda-11.0/lib64:/root/denghui/openmpi-4.0.6/lib:/usr/local/cuda-11.0/lib64:
root lmp $ mpirun --allow-run-as-root -n 1 /root/denghui/lammps/src/lmp_mpi < in.lammps
LAMMPS (29 Sep 2021)
Reading data file ...
triclinic box = (0.0000000 0.0000000 0.0000000) to (12.444700 12.444700 12.444700) with tilt (0.0000000 0.0000000 0.0000000)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.001 seconds
Summary of lammps deepmd module ...
>>> Info of deepmd-kit:
installed to: /root/denghui/deepmd_root
source: v2.0.2-66-gde00e04-dirty
source branch: dynamically-load-op-library
source commit: de00e04
source commit at: 2022-01-19 04:52:07 -0500
surpport model ver.:1.1
build float prec: double
build with tf inc: /root/denghui/tensorflow_root/include;/root/denghui/tensorflow_root/include
build with tf lib: /root/denghui/tensorflow_root/lib/libtensorflow_cc.so;/root/denghui/tensorflow_root/lib/libtensorflow_framework.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
>>> Info of lammps module:
use deepmd-kit at: /root/denghui/deepmd_root
source: v2.0.2-66-gde00e04-dirty
source branch: dynamically-load-op-library
source commit: de00e04
source commit at: 2022-01-19 04:52:07 -0500
build float prec: double
build with tf inc: /root/denghui/tensorflow_root/include;/root/denghui/tensorflow_root/include
build with tf lib: /root/denghui/tensorflow_root/lib/libtensorflow_cc.so;/root/denghui/tensorflow_root/lib/libtensorflow_framework.so
2022-01-21 09:20:11.789581: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-21 09:20:11.789987: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:11.801557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:11.802645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:12.474975: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:12.476106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:12.477162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-21 09:20:12.478219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31006 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:00:08.0, compute capability: 7.0
>>> Info of model(s):
using 1 model(s): frozen_model.pb
rcut in model: 6
ntypes in model: 2
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
- USER-DEEPMD package:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Neighbor list info ...
update every 10 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 8
ghost atom cutoff = 8
binsize = 4, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
Per MPI rank memory allocation (min/avg/max) = 3.908 | 3.908 | 3.908 Mbytes
Step PotEng KinEng TotEng Temp Press Volume
0 -29944.158 8.1472669 -29936.011 330 37078.187 1927.3176
100 -29943.989 7.9877789 -29936.001 323.54004 27603.467 1927.3176
200 -29943.349 7.3418604 -29936.007 297.37751 32879.887 1927.3176
300 -29944.262 8.2516105 -29936.011 334.22637 27118.163 1927.3176
400 -29944.503 8.4884408 -29936.014 343.81903 26527.481 1927.3176
500 -29944.535 8.514281 -29936.021 344.86568 40825.342 1927.3176
600 -29944.479 8.4484458 -29936.031 342.19906 26730.448 1927.3176
700 -29944.57 8.5090059 -29936.061 344.65201 27365.977 1927.3176
800 -29943.903 7.8286542 -29936.074 317.09479 34878.898 1927.3176
900 -29944.711 8.6057383 -29936.106 348.5701 34243.605 1927.3176
1000 -29944.493 8.3574289 -29936.136 338.51248 34715.817 1927.3176
Loop time of 4.74255 on 1 procs for 1000 steps with 192 atoms
Performance: 9.109 ns/day, 2.635 hours/ns, 210.857 timesteps/s
97.5% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 4.576 | 4.576 | 4.576 | 0.0 | 96.49
Neigh | 0.13852 | 0.13852 | 0.13852 | 0.0 | 2.92
Comm | 0.015699 | 0.015699 | 0.015699 | 0.0 | 0.33
Output | 0.0030058 | 0.0030058 | 0.0030058 | 0.0 | 0.06
Modify | 0.0065539 | 0.0065539 | 0.0065539 | 0.0 | 0.14
Other | | 0.002799 | | | 0.06
Nlocal: 192.000 ave 192 max 192 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 2066.00 ave 2066 max 2066 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0.00000 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs: 40898.0 ave 40898 max 40898 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 40898
Ave neighs/atom = 213.01042
Neighbor list builds = 100
Dangerous builds not checked
Total wall time: 0:00:07
The library type was changed from SHARED to MODULE in deepmodeling#1384. Fixes errors in conda-forge/deepmd-kit-feedstock#31
In this PR, the C++ interface will dynamically load OP libraries, just like the Python interface, so it no longer needs linking. Thus, I also removed the CMAKE_LINK_WHAT_YOU_USE flag. Note that this requires RPATH to be set (which we have already done). Refer to: https://discuss.tensorflow.org/t/how-to-load-custom-op-from-c/5748