[WIP] pd: add CPP inference with LAMMPS #4467
base: devel
Conversation
📝 Walkthrough
The changes in this pull request introduce support for the Paddle framework within the DeePMD project. A new CMake option, ENABLE_PADDLE (default OFF), gates the Paddle backend.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
source/api_cc/include/DeepPotPD.h (3)
238-239: Typo in documentation: Correct "arraay" to "array"
There is a typographical error in the documentation comment. The word "arraay" should be corrected to "array" for clarity.
Apply this diff to fix the typo:
- * @brief Get the buffer arraay of this model.
+ * @brief Get the buffer array of this model.

251-251: Typo in documentation: Missing period at the end of the sentence
The documentation comment is missing a period at the end of the sentence. For consistency and readability, please add a period.
Apply this diff:
- * @param[out] buffer_scalar Buffer scalar.
+ * @param[out] buffer_scalar Buffer scalar.
+ */

342-342: Variable naming: Rename inited to initialized
The private member variable inited is a non-standard abbreviation. Renaming it to initialized enhances readability and adheres to common naming conventions.
Apply this diff to rename the variable:
-int inited;
+bool initialized;
Also, update all occurrences of inited throughout the class to initialized.

source/api_cc/src/DeepPotPD.cc (2)
323-324: Incorrect exception message: Replace "fparam" with "aparam"
In the exception messages, "fparam is not supported as input yet." should be "aparam is not supported as input yet." when handling aparam. This correction avoids confusion and provides accurate information.
Apply these diffs:
-throw deepmd::deepmd_exception("fparam is not supported as input yet.");
+throw deepmd::deepmd_exception("aparam is not supported as input yet.");
Repeat this correction in both occurrences where aparam is involved.
Also applies to: 330-331

47-52: Hardcoded GPU settings: Make GPU memory and ID configurable
The GPU memory size and device ID are hardcoded, which may not be suitable for all environments. Consider making these parameters configurable to enhance flexibility and usability.
Suggest modifying the code to accept GPU settings from configuration or input parameters.
-int gpu_num = 1;
-if (gpu_num > 0) {
-  gpu_id = gpu_rank % gpu_num;
-} else {
-  gpu_id = 0;
-}
+gpu_id = gpu_rank;
Similarly, allow the GPU memory size to be specified:
-config->EnableUseGpu(
-    4096, 0);  // annotate it if use cpu, default use gpu with 4G mem
+config->EnableUseGpu(memory_size_in_mb, gpu_id);
Ensure that memory_size_in_mb and gpu_id are properly defined and passed to the init method.
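For illustration only, a minimal sketch of one way to take these values from the environment instead of hardcoding them. The environment variable name DP_INFER_GPU_MEM_MB and the helper enable_gpu_from_env are hypothetical; paddle_infer::Config::EnableUseGpu is the call already used in this hunk.

#include <cstdlib>
#include <string>

#include "paddle/include/paddle_inference_api.h"  // assumed include path

// Sketch: read the GPU memory pool size from an (illustrative) environment
// variable and derive the device id from the caller-provided rank.
void enable_gpu_from_env(paddle_infer::Config& config, int gpu_rank) {
  int memory_size_in_mb = 4096;  // keep the current default
  if (const char* mem = std::getenv("DP_INFER_GPU_MEM_MB")) {
    memory_size_in_mb = std::stoi(mem);
  }
  const int gpu_id = gpu_rank;  // caller maps e.g. MPI rank to a device
  config.EnableUseGpu(memory_size_in_mb, gpu_id);
}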
source/api_cc/include/version.h.in (1)

13-13: Update comments to reflect new variable
A new variable global_pd_lib has been added. Consider updating any documentation or comments to include this variable, ensuring that all global library variables are documented consistently.
Add a comment describing global_pd_lib, similar to the existing variables.
+// Path to the Paddle libraries
 const std::string global_pd_lib="@PADDLE_LIBRARIES@";
source/api_cc/CMakeLists.txt (1)

30-37: Consider making the ROCm library path configurable.
While the Paddle integration looks good, the hardcoded path to libgalaxyhip.so might cause issues across different systems. Consider making this path configurable through a CMake variable.
 if(ENABLE_PADDLE AND NOT BUILD_PY_IF)
   target_link_libraries(${libname} PUBLIC "${PADDLE_LIBRARIES}")
   target_compile_definitions(${libname} PUBLIC BUILD_PADDLE)
   if(DP_VARIANT STREQUAL "rocm")
+    set(GALAXYHIP_LIBRARY "${hip_LIB_INSTALL_DIR}/libgalaxyhip.so"
+        CACHE PATH "Path to libgalaxyhip.so")
     target_link_libraries(${libname}
-                          PUBLIC "${hip_LIB_INSTALL_DIR}/libgalaxyhip.so")
+                          PUBLIC "${GALAXYHIP_LIBRARY}")
   endif()
 endif()

source/api_cc/src/DeepPot.cc (1)

59-63: Improve error message for better user guidance.
While the initialization logic is correct, consider making the error message more informative by including build instructions.
- throw deepmd::deepmd_exception("PaddlePaddle backend is not supported yet");
+ throw deepmd::deepmd_exception("PaddlePaddle backend is not built. Please rebuild with -DENABLE_PADDLE=ON");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
source/CMakeLists.txt (3 hunks)
source/api_cc/CMakeLists.txt (1 hunks)
source/api_cc/include/DeepPotPD.h (1 hunks)
source/api_cc/include/version.h.in (1 hunks)
source/api_cc/src/DeepPot.cc (2 hunks)
source/api_cc/src/DeepPotPD.cc (1 hunks)
source/api_cc/src/common.cc (2 hunks)
source/config/CMakeLists.txt (1 hunks)
source/config/run_config.ini (1 hunks)
🔇 Additional comments (8)
source/api_cc/src/DeepPotPD.cc (1)
185-186: Ownership mismatch: Potential issue with unique_ptr assignment
Assigning the result of GetInputHandle to a std::unique_ptr may cause ownership issues if GetInputHandle does not return a unique_ptr. Verify the return type of GetInputHandle and ensure proper ownership semantics.
Please check if GetInputHandle returns a std::unique_ptr. If it returns a raw pointer or shared pointer, adjust the declaration of firstneigh_tensor accordingly.
-std::unique_ptr<paddle_infer::Tensor> firstneigh_tensor;
+auto firstneigh_tensor = predictor->GetInputHandle("nlist");
Or, if a shared_ptr is appropriate:
-std::unique_ptr<paddle_infer::Tensor> firstneigh_tensor;
+std::shared_ptr<paddle_infer::Tensor> firstneigh_tensor;
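To make the suggestion concrete, a small sketch (not part of the diff) of the ownership-neutral pattern: letting auto pick up whatever handle type GetInputHandle actually returns, using only the Reshape and CopyFromCpu calls already present elsewhere in this PR. The function name and tensor shape are illustrative; the include path is an assumption.

#include <vector>

#include "paddle/include/paddle_inference_api.h"  // assumed include path

// Sketch: fill the "nlist" input without committing to a smart-pointer type.
void set_nlist_input(paddle_infer::Predictor& predictor,
                     const std::vector<int>& nlist_flat,
                     int nloc,
                     int max_nnei) {
  auto firstneigh_tensor = predictor.GetInputHandle("nlist");
  firstneigh_tensor->Reshape({1, nloc, max_nnei});
  firstneigh_tensor->CopyFromCpu(nlist_flat.data());
}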
source/config/CMakeLists.txt (1)
17-21: Consistency in handling boolean options
The addition of the ENABLE_PADDLE option follows the existing pattern, but ensure that all boolean variables are consistently handled throughout the configuration. Verify that ENABLE_PADDLE is properly set and used in other CMake scripts.
source/config/run_config.ini (1)
17-18: LGTM! Configuration parameters for Paddle integration.
The new configuration parameters follow the established naming convention and structure.
source/CMakeLists.txt (2)
12-12: LGTM! Option declaration follows CMake conventions.
The ENABLE_PADDLE option is properly declared with a safe default value of OFF.
335-337: LGTM! Backend messaging is properly updated.
The backend messaging section is correctly updated to include Paddle in the list of enabled backends.
Also applies to: 341-341
source/api_cc/src/DeepPot.cc (1)
18-20: LGTM! Header inclusion follows existing backend patterns.
The Paddle backend header inclusion is properly guarded with BUILD_PADDLE and follows the same pattern as other backend includes.
source/api_cc/src/common.cc (2)
1395-1397: LGTM! Summary printing follows existing patterns.
The addition of Paddle library path to the summary output is consistent with how other backends are handled.
1415-1417: Consider using a more specific file extension for Paddle models.
Using .json as the model file extension could lead to conflicts since it's a very common format used for various purposes. Consider using a more specific extension like .pdmodel or .pd.json.
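As a side illustration, the kind of suffix check this detection boils down to; the has_suffix helper below is hypothetical, not code from the PR.

#include <string>

// Hypothetical helper: true if `model` ends with `suffix`.
static bool has_suffix(const std::string& model, const std::string& suffix) {
  return model.size() >= suffix.size() &&
         model.compare(model.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// A more specific extension stays easy to detect, e.g.
// has_suffix("water.pd.json", ".pd.json") == true, while plain ".json"
// files produced by other tools would no longer be claimed by Paddle.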
/**
 * @brief DP constructor without initialization.
 **/
DeepPotPD();
🛠️ Refactor suggestion
Inefficient passing of small types by const reference
Passing small built-in types like int and bool by const reference is less efficient than passing them by value. It is recommended to pass these types by value to improve performance and code clarity.
Apply the following diffs to pass int and bool parameters by value:
-DeepPotPD();
+DeepPotPD();
-DeepPotPD(const std::string& model,
- const int& gpu_rank = 0,
- const std::string& file_content = "");
+DeepPotPD(const std::string& model,
+ int gpu_rank = 0,
+ const std::string& file_content = "");
-void init(const std::string& model,
- const int& gpu_rank = 0,
- const std::string& file_content = "");
+void init(const std::string& model,
+ int gpu_rank = 0,
+ const std::string& file_content = "");
-template <typename VALUETYPE, typename ENERGYVTYPE>
-void compute(ENERGYVTYPE& ener,
+template <typename VALUETYPE, typename ENERGYVTYPE>
+void compute(ENERGYVTYPE& ener,
...
Repeat this change for all instances where const int& or const bool& is used for parameter passing.
Also applies to: 26-28, 36-38, 145-145, 191-191, 198-198
force.resize(static_cast<size_t>(nframes) * fwd_map.size() * 3);
select_map<VALUETYPE>(force, dforce, bkw_map, 3, nframes, fwd_map.size(),
Potential exception safety issue: Throwing string literals
Throwing string literals is not recommended: they bypass handlers written for std::exception. Instead, throw exceptions derived from std::exception or use existing exception classes.
Apply this diff to throw a proper exception:
-throw "atomic virial is not supported as output yet.";
+throw deepmd::deepmd_exception("Atomic virial is not supported as output yet.");
This change ensures proper exception handling and consistency with other exception uses in the code.
Committable suggestion skipped: line range outside the PR's diff.
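For completeness, a standalone sketch of why a thrown string literal is easy to miss: a handler written for std::exception (which deepmd::deepmd_exception is assumed to derive from, as elsewhere in the codebase) never sees it.

#include <exception>
#include <iostream>

int main() {
  try {
    throw "atomic virial is not supported as output yet.";
  } catch (const std::exception& e) {
    std::cerr << "caught exception: " << e.what() << '\n';  // not reached
  } catch (const char* msg) {
    std::cerr << "caught literal: " << msg << '\n';  // only this matches
  }
  return 0;
}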
if(ENABLE_PADDLE)
  if(NOT DEFINED PADDLE_INFERENCE_DIR)
    # message( FATAL_ERROR "Make sure PADDLE_INFERENCE_DIR is set when
    # ENABLE_PADDLE=ON")
    message(STATUS "PADDLE_INFERENCE_DIR is not defined. Downloading...")
    set(DOWNLOAD_URL
        "https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/ce51e82e84fc97e0a55a162037f1554746159cad/paddle_inference.tgz"
    )
    set(TGZ_FILE "${CMAKE_BINARY_DIR}/paddle_inference.tgz")
    set(EXTRACTED_DIR "${CMAKE_BINARY_DIR}/paddle_inference_install_dir")
    file(DOWNLOAD ${DOWNLOAD_URL} ${TGZ_FILE})
    message(STATUS "Downloading finished, extracting...")
    execute_process(COMMAND ${CMAKE_COMMAND} -E tar -xzvf ${TGZ_FILE}
                    OUTPUT_QUIET)
    file(REMOVE ${TGZ_FILE})
    set(PADDLE_INFERENCE_DIR
        ${EXTRACTED_DIR}
        CACHE PATH
              "Path to 'paddle_inference_install_dir' or 'paddle_inference'")
  else()
    message(
      STATUS "PADDLE_INFERENCE_DIR is already defined: ${PADDLE_INFERENCE_DIR}")
  endif()

  message(STATUS "Final PADDLE_INFERENCE_DIR is set to ${PADDLE_INFERENCE_DIR}")

  set(PADDLE_INFERENCE_DIR
      ${PADDLE_INFERENCE_DIR}
      CACHE PATH "Path to 'paddle_inference_install_dir' or 'paddle_inference'")

  # used in api_cc
  set(PADDLE_LIBRARIES
      "${PADDLE_INFERENCE_DIR}/paddle/lib/libpaddle_inference.so"
      CACHE PATH "Path to libpaddle_inference.so")

  include_directories("${PADDLE_INFERENCE_DIR}/")
  set(PADDLE_LIB_THIRD_PARTY_PATH
      "${PADDLE_INFERENCE_DIR}/third_party/install/")

  include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
  include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
  include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
  include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")

  link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
  link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
  link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
  link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
  link_directories("${PADDLE_INFERENCE_DIR}/paddle/lib")
  # if (USE_ROCM_TOOLKIT) add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1) endif()
endif(ENABLE_PADDLE)
Improve robustness of Paddle library download and configuration.
Several improvements could make the Paddle integration more robust:
- The download URL is hardcoded and might become outdated
- No verification of downloaded file integrity
- No fallback mechanism if download fails
- Third-party library paths are hardcoded
Consider implementing these improvements:
if(ENABLE_PADDLE)
if(NOT DEFINED PADDLE_INFERENCE_DIR)
- message(STATUS "PADDLE_INFERENCE_DIR is not defined. Downloading...")
- set(DOWNLOAD_URL
- "https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/ce51e82e84fc97e0a55a162037f1554746159cad/paddle_inference.tgz"
- )
+ set(PADDLE_DOWNLOAD_URL "https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/ce51e82e84fc97e0a55a162037f1554746159cad/paddle_inference.tgz"
+ CACHE STRING "URL to download Paddle inference library")
+ set(PADDLE_DOWNLOAD_HASH "expected_hash_here"
+ CACHE STRING "Expected hash of downloaded Paddle inference library")
+
+ message(STATUS "PADDLE_INFERENCE_DIR is not defined. Downloading from ${PADDLE_DOWNLOAD_URL}...")
set(TGZ_FILE "${CMAKE_BINARY_DIR}/paddle_inference.tgz")
set(EXTRACTED_DIR "${CMAKE_BINARY_DIR}/paddle_inference_install_dir")
- file(DOWNLOAD ${DOWNLOAD_URL} ${TGZ_FILE})
+
+ file(DOWNLOAD ${PADDLE_DOWNLOAD_URL} ${TGZ_FILE}
+ EXPECTED_HASH SHA256=${PADDLE_DOWNLOAD_HASH}
+ STATUS DOWNLOAD_STATUS
+ SHOW_PROGRESS)
+
+ list(GET DOWNLOAD_STATUS 0 STATUS_CODE)
+ if(NOT STATUS_CODE EQUAL 0)
+ message(FATAL_ERROR "Failed to download Paddle inference library. Please set PADDLE_INFERENCE_DIR manually.")
+ endif()
Also, consider making third-party library paths configurable:
- set(PADDLE_LIB_THIRD_PARTY_PATH
- "${PADDLE_INFERENCE_DIR}/third_party/install/")
+ set(PADDLE_THIRD_PARTY_PATH "${PADDLE_INFERENCE_DIR}/third_party/install/"
+ CACHE PATH "Path to Paddle third-party libraries")
Committable suggestion skipped: line range outside the PR's diff.
source/api_cc/src/common.cc
Outdated
@@ -1409,6 +1412,9 @@ deepmd::DPBackend deepmd::get_backend(const std::string& model) {
   } else if (model.length() >= 11 &&
              model.substr(model.length() - 11) == ".savedmodel") {
     return deepmd::DPBackend::JAX;
+  } else if ((model.length() >= 5 &&
+              model.substr(model.length() - 5) == ".json")) {
+    backend = deepmd::DPBackend::Paddle;
Fix undeclared variable usage.
The backend variable is used without being declared first. This will cause a compilation error.
- backend = deepmd::DPBackend::Paddle;
+ return deepmd::DPBackend::Paddle;
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
source/api_cc/include/DeepPotPD.h (2)
259-262: Inconsistent data type: declare aparam_nall as bool
The member variable aparam_nall is declared as an int but is used as a boolean in the is_aparam_nall() method. To improve code clarity and correctness, consider changing its type to bool.
Apply the following change:
- int aparam_nall;
+ bool aparam_nall;
Ensure that all assignments and usages of aparam_nall are updated accordingly.
Also applies to: 347-347

17-18: Delete copy constructor and assignment operator to prevent unintended copying
The DeepPotPD class manages resources using std::shared_ptr. To prevent unintended copying, which could lead to resource management issues, consider deleting the copy constructor and copy assignment operator.
Add the following declarations to the class:
DeepPotPD(const DeepPotPD&) = delete;
DeepPotPD& operator=(const DeepPotPD&) = delete;

source/lmp/plugin/CMakeLists.txt (1)

11-11: Add space in the message for clarity
There is no space between "TO" and ${LAMMPS_SOURCE_ROOT} in the message, causing the output to be concatenated without a space. Please add a space for improved readability.
Apply the following change:
-message(STATUS "STARTING DOWNLOAD LAMMPS TO" ${LAMMPS_SOURCE_ROOT})
+message(STATUS "STARTING DOWNLOAD LAMMPS TO " ${LAMMPS_SOURCE_ROOT})
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
source/api_cc/include/DeepPotPD.h (1 hunks)
source/api_cc/src/common.cc (2 hunks)
source/lmp/plugin/CMakeLists.txt (1 hunks)
🔇 Additional comments (3)
source/api_cc/include/DeepPotPD.h (1)
27-27: Avoid passing small built-in types by const reference
Passing small built-in types like int by const reference (const int&) is less efficient than passing them by value. It is recommended to pass them by value to improve performance and code clarity.
Apply the following changes to update the parameter declarations:
At line 27:
- const int& gpu_rank = 0,
+ int gpu_rank = 0,
At line 38:
- const int& gpu_rank = 0,
+ int gpu_rank = 0,
At line 114:
- const int& ago,
+ int ago,
At line 145:
- const int& nframes,
+ int nframes,
At line 183:
- const int& nframes,
+ int nframes,
Also applies to: 38-38, 114-114, 145-145, 183-183
source/api_cc/src/common.cc (2)
1395-1397: Paddle library integration is correctly implemented
The addition of the Paddle library path in the print_summary function is properly included and follows the existing pattern for other backends.

1415-1417: Backend detection for Paddle models is appropriate
The modification to the get_backend function to detect .json files for the Paddle backend is correctly implemented and ensures proper backend selection.
throw deepmd::deepmd_exception("fparam is not supported as input yet."); | ||
// fparam_tensor = predictor->GetInputHandle("fparam"); | ||
// fparam_tensor->Reshape({1, static_cast<int>(fparam.size())}); |
Check notice
Code scanning / CodeQL
Commented-out code Note
throw deepmd::deepmd_exception("aparam is not supported as input yet."); | ||
// aparam_tensor = predictor->GetInputHandle("aparam"); | ||
// aparam_tensor->Reshape({1, lmp_list.inum, | ||
// static_cast<int>(aparam_.size()) / lmp_list.inum}); |
Check notice
Code scanning / CodeQL
Commented-out code Note
throw "atomic virial is not supported as output yet."; | ||
// auto atom_virial_ = predictor->GetOutputHandle("extended_virial"); | ||
// auto atom_energy_ = predictor->GetOutputHandle("atom_energy"); | ||
// datom_energy.resize(nall_real, | ||
// 0.0); // resize to nall to be consistenet with TF. | ||
// atom_energy_->CopyToCpu(datom_energy.data()); | ||
// atom_virial_->CopyToCpu(datom_virial.data()); | ||
// atom_energy.resize(static_cast<size_t>(nframes) * fwd_map.size()); | ||
// atom_virial.resize(static_cast<size_t>(nframes) * fwd_map.size() * 9); | ||
// select_map<VALUETYPE>(atom_energy, datom_energy, bkw_map, 1, nframes, | ||
// fwd_map.size(), nall_real); | ||
// select_map<VALUETYPE>(atom_virial, datom_virial, bkw_map, 9, nframes, |
Check notice
Code scanning / CodeQL
Commented-out code Note
throw deepmd::deepmd_exception("fparam is not supported as input yet."); | ||
// fparam_tensor = predictor->GetInputHandle("box"); | ||
// fparam_tensor->Reshape({1, static_cast<int>(fparam.size())}); |
Check notice
Code scanning / CodeQL
Commented-out code Note
throw deepmd::deepmd_exception("fparam is not supported as input yet."); | ||
// aparam_tensor = predictor->GetInputHandle("box"); | ||
// aparam_tensor->Reshape({1, natoms, static_cast<int>(aparam.size()) / |
Check notice
Code scanning / CodeQL
Commented-out code Note
"atomic virial is not supported as output yet."); | ||
// auto atom_energy_ = predictor->GetOutputHandle(output_names[4]); | ||
// auto atom_virial_ = predictor->GetOutputHandle(output_names[5]); | ||
// atom_energy_->CopyToCpu(atom_energy.data()); |
Check notice
Code scanning / CodeQL
Commented-out code Note
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4467      +/-   ##
==========================================
- Coverage   83.75%   83.75%   -0.01%
==========================================
  Files         667      667
  Lines       61515    61517       +2
  Branches     3486     3487       +1
==========================================
  Hits        51522    51522
- Misses       8867     8869       +2
  Partials     1126     1126
☔ View full report in Codecov by Sentry.
Support LAMMPS inference with the Paddle backend on CUDA and DCU devices.
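For context, a minimal usage sketch of the C++ path this PR targets, assuming the existing deepmd::DeepPot interface; the model file name water.json is illustrative, and the .json suffix routing to the Paddle backend follows the get_backend change above.

#include <vector>

#include "deepmd/DeepPot.h"

int main() {
  deepmd::DeepPot dp("water.json");  // Paddle model selected by file suffix

  std::vector<double> coord = {0., 0., 0., 0., 0., 1.5};  // two atoms
  std::vector<int> atype = {0, 1};
  std::vector<double> box = {10., 0., 0., 0., 10., 0., 0., 0., 10.};

  double energy = 0.;
  std::vector<double> force, virial;
  dp.compute(energy, force, virial, coord, atype, box);
  return 0;
}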
Summary by CodeRabbit
New Features
DeepPotPD for Paddle-based potential energy calculations.
Bug Fixes
Documentation
Chores