Deepmd-kit 2.2.11 can not perform finetune #4608

Open
jinfeng-data opened this issue Feb 22, 2025 · 1 comment

Bug summary

I tried to fine-tune a DeePMD model with the GPU build of DeePMD-kit 2.2.11, but the run fails with the following error:

2025-02-21 10:36:18.477343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
2025-02-21 10:36:18.548414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
DEEPMD INFO Changing energy bias in pretrained model for types ['O', 'H']... (this step may take long time)
Traceback (most recent call last):
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 74, in main
train_dp(**dict_args)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 289, in build
self._init_from_pretrained_model(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1131, in _init_from_pretrained_model
self._change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1139, in _change_energy_bias
self.model.change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/model/ener.py", line 509, in change_energy_bias
self.fitting.change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/fit/ener.py", line 810, in change_energy_bias
idx_type_map = sorter[
^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
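
The last frame fails while indexing an array named `sorter`. The sketch below shows one way this exact message can arise. It assumes, without confirmation from the DeePMD-kit source, that `sorter` comes from `np.argsort` over a type map and is then indexed with the result of `np.searchsorted`. The helper `map_type_indices` and its argument names are hypothetical and only illustrate the pattern; they are not the library's API.

```python
import numpy as np


def map_type_indices(full_type_map, origin_type_map):
    """Hypothetical helper: position of each origin type inside full_type_map."""
    full = np.asarray(full_type_map, dtype=str)
    origin = np.asarray(origin_type_map, dtype=str)
    # Indices that would sort the full type map.
    sorter = np.argsort(full)
    # Look up each origin type in the sorted view, then map back to
    # positions in the unsorted full type map.
    return sorter[np.searchsorted(full, origin, sorter=sorter)]


# Non-empty type map: both types are located.
print(map_type_indices(["O", "H"], ["O", "H"]))

# Empty type map: sorter has size 0, so indexing it raises IndexError.
try:
    map_type_indices([], ["O", "H"])
except IndexError as err:
    print("IndexError:", err)
```

If the real code follows this pattern, an empty type map read from the pretrained model would be one possible cause, so checking which type map `graph.0.pb` actually stores may be a useful first step.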

DeePMD-kit Version

2.2.11

Backend and its version

TensorFlow

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

input file:
{
"_comment": " model parameters",
"model": {
    "type_map": ["O", "H"],
    "type_embedding": {"trainable": true},
    "descriptor" :{
        "type": "se_atten_v2",
        "sel": 120,
        "rcut_smth": 4.00,
        "rcut": 6.00,
        "neuron": [25, 50, 100],
        "resnet_dt": false,
        "axis_neuron": 16,
        "seed": 1,
        "_comment": " that's all"
    },
    "fitting_net" : {
        "neuron": [240, 240, 240],
        "resnet_dt": true,
        "seed": 2,
        "_comment": " that's all"
    },
    "_comment": " that's all"
},

"learning_rate" :{
    "type":         "exp",
    "decay_steps":  2000,
    "start_lr":     0.001,
    "stop_lr":      3.51e-8,
    "_comment":     "that's all"
},

"loss" :{
    "type":         "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0.9,
    "limit_pref_v": 1.0,
    "_comment":     " that's all"
},

"training" : {
    "training_data": {
        "systems":          ["./train_set/"],
        "set_prefix":   "set",
        "batch_size":       1,
        "_comment":         "that's all"
    },
    "validation_data":{
        "systems":          ["./test_set/"],
        "batch_size":       1,
        "numb_btch":        3,
        "_comment":         "that's all"
    },
    "numb_steps":   200000,
    "seed":         3,
    "disp_file":    "lcurve.out",
    "disp_freq":    100,
    "save_freq":    1000,
    "_comment":     "that's all"
},

"_comment":         "that's all"

}

Running commands: dp train dp2.0_finetune_input.json --finetune graph.0.pb

Steps to Reproduce

graph.0.pb.txt

Further Information, Files, and Links

No response

@jinfeng-data

dataset.rar.txt

Above is the dataset.
