Deepmd-kit 2.2.11 can not perform finetune #4608

jinfeng-data · 2025-02-22T04:33:03Z

Bug summary

I tried to fine-tune a DeePMD model using the GPU version of DeePMD-kit 2.2.11, but the run fails with the following error:

2025-02-21 10:36:18.477343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
2025-02-21 10:36:18.548414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
DEEPMD INFO Changing energy bias in pretrained model for types ['O', 'H']... (this step may take long time)
Traceback (most recent call last):
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/bin/dp", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
    deepmd_main(args)
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 74, in main
    train_dp(**dict_args)
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
    _do_work(jdata, run_opt, is_compress)
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
    model.build(train_data, stop_batch, origin_type_map=origin_type_map)
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 289, in build
    self._init_from_pretrained_model(
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1131, in _init_from_pretrained_model
    self._change_energy_bias(
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1139, in _change_energy_bias
    self.model.change_energy_bias(
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/model/ener.py", line 509, in change_energy_bias
    self.fitting.change_energy_bias(
  File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/fit/ener.py", line 810, in change_energy_bias
    idx_type_map = sorter[
                   ^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
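
The last frame indexes an array named `sorter`, and the message says that array has size 0. The following is a minimal NumPy sketch of how such a lookup can fail in this way. It assumes, as a guess from the traceback and not from the DeePMD-kit source, that `sorter` is the result of `np.argsort` over a type map that turned out to be empty. The function and argument names (`lookup_type_indices`, `model_type_map`, `input_type_map`) are hypothetical.

```python
import numpy as np


def lookup_type_indices(model_type_map, input_type_map):
    """Map each requested type to its index in model_type_map.

    Hypothetical stand-in for the failing lookup; this is not DeePMD-kit code.
    """
    model_types = np.array(model_type_map, dtype=str)
    wanted = np.array(input_type_map, dtype=str)
    sorter = np.argsort(model_types)  # empty when model_type_map is empty
    positions = np.searchsorted(model_types, wanted, sorter=sorter)
    return sorter[positions]  # raises IndexError when sorter is empty


def try_lookup(model_type_map, input_type_map):
    """Return the index list on success, or the error text on failure."""
    try:
        return lookup_type_indices(model_type_map, input_type_map).tolist()
    except IndexError as exc:
        return f"IndexError: {exc}"


if __name__ == "__main__":
    print(try_lookup(["O", "H"], ["O", "H"]))  # both types are found
    print(try_lookup([], ["O", "H"]))          # empty type map: lookup fails
```

If this guess is right, the thing to check is why the type map read during fine-tuning is empty, rather than the training data itself.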

DeePMD-kit Version

2.2.11

Backend and its version

TensorFlow

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

input file:
{
    "_comment": " model parameters",
    "model": {
        "type_map": ["O", "H"],
        "type_embedding": {"trainable": true},
        "descriptor": {
            "type": "se_atten_v2",
            "sel": 120,
            "rcut_smth": 4.00,
            "rcut": 6.00,
            "neuron": [25, 50, 100],
            "resnet_dt": false,
            "axis_neuron": 16,
            "seed": 1,
            "_comment": " that's all"
        },
        "fitting_net": {
            "neuron": [240, 240, 240],
            "resnet_dt": true,
            "seed": 2,
            "_comment": " that's all"
        },
        "_comment": " that's all"
    },

    "learning_rate": {
        "type": "exp",
        "decay_steps": 2000,
        "start_lr": 0.001,
        "stop_lr": 3.51e-8,
        "_comment": "that's all"
    },

    "loss": {
        "type": "ener",
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0.9,
        "limit_pref_v": 1.0,
        "_comment": " that's all"
    },

    "training": {
        "training_data": {
            "systems": ["./train_set/"],
            "set_prefix": "set",
            "batch_size": 1,
            "_comment": "that's all"
        },
        "validation_data": {
            "systems": ["./test_set/"],
            "batch_size": 1,
            "numb_btch": 3,
            "_comment": "that's all"
        },
        "numb_steps": 200000,
        "seed": 3,
        "disp_file": "lcurve.out",
        "disp_freq": 100,
        "save_freq": 1000,
        "_comment": "that's all"
    },

    "_comment": "that's all"
}

Running commands: dp train dp2.0_finetune_input.json --finetune graph.0.pb
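
Before launching this command, it may help to compare the `type_map` in the input file with the one stored in the pretrained model. The sketch below does only that comparison. `missing_types` is a hypothetical helper, not part of DeePMD-kit, and obtaining the pretrained type map is left open. One option in DeePMD-kit 2.x should be `deepmd.infer.DeepPot("graph.0.pb").get_type_map()`, assuming the model loads in your environment.

```python
import json


def missing_types(jdata, pretrained_type_map):
    """List the types requested in the input that the pretrained model lacks."""
    wanted = jdata.get("model", {}).get("type_map", [])
    known = set(pretrained_type_map)
    return [t for t in wanted if t not in known]


if __name__ == "__main__":
    # Trimmed copy of the "model" section from the input file in this report.
    jdata = json.loads('{"model": {"type_map": ["O", "H"]}}')
    # Placeholder value: replace with the type map read from the pretrained model.
    pretrained_type_map = []
    print(missing_types(jdata, pretrained_type_map))
```

An empty result means every type in the input is known to the pretrained model. With the empty placeholder used here, both `O` and `H` are reported as missing.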

Steps to Reproduce

graph.0.pb.txt

Further Information, Files, and Links

No response

jinfeng-data · 2025-02-22T04:33:54Z

dataset.rar.txt

Above is the dataset.

jinfeng-data added the bug label Feb 22, 2025
