Bug summary
I am trying to fine-tune a DeePMD model with the GPU version of DeePMD-kit 2.2.11, but the run fails with the following error message:
2025-02-21 10:36:18.477343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
2025-02-21 10:36:18.548414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38380 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:86:00.0, compute capability: 8.0
DEEPMD INFO Changing energy bias in pretrained model for types ['O', 'H']... (this step may take long time)
Traceback (most recent call last):
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 74, in main
train_dp(**dict_args)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 289, in build
self._init_from_pretrained_model(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1131, in _init_from_pretrained_model
self._change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/train/trainer.py", line 1139, in _change_energy_bias
self.model.change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/model/ener.py", line 509, in change_energy_bias
self.fitting.change_energy_bias(
File "/public/home/xiaohe/jinfeng/soft/deepmd-kit-2.2.11-gpu/lib/python3.11/site-packages/deepmd/fit/ener.py", line 810, in change_energy_bias
idx_type_map = sorter[
^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
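For anyone reading the traceback: the message itself can be reproduced with NumPy alone. The sketch below is not the DeePMD-kit source. It assumes the failing line looks up each requested type name in the pretrained model's type map with an argsort/searchsorted pattern, and `map_types` is a hypothetical helper name used only for illustration. Under that assumption, an empty type map on the pretrained side produces exactly this `IndexError`.

```python
import numpy as np


def map_types(origin_type_map, type_map):
    """Hypothetical helper: position of each name in type_map within origin_type_map."""
    # Assumed pattern: sort the original names, then look up the requested ones.
    origin = np.array(origin_type_map, dtype=str)
    sorter = np.argsort(origin)
    positions = np.searchsorted(origin, type_map, sorter=sorter)
    # With an empty origin, sorter has size 0 while positions is [0, 0],
    # so this indexing step raises IndexError.
    return sorter[positions]


if __name__ == "__main__":
    # Both models know the same types, stored in a different order.
    print(map_types(["H", "O"], ["O", "H"]))

    # Pretrained side has no type names at all.
    try:
        map_types([], ["O", "H"])
    except IndexError as err:
        print("IndexError:", err)
```

If this reading is right, one thing worth checking is whether graph.0.pb actually stores a type map that contains 'O' and 'H'.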
DeePMD-kit Version
2.2.11
Backend and its version
TensorFlow
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc.
input file:
{
"_comment": " model parameters",
"model": {
"type_map": ["O", "H"],
"type_embedding": {"trainable": true},
"descriptor" :{
"type": "se_atten_v2",
"sel": 120,
"rcut_smth": 4.00,
"rcut": 6.00,
"neuron": [25, 50, 100],
"resnet_dt": false,
"axis_neuron": 16,
"seed": 1,
"_comment": " that's all"
},
"fitting_net" : {
"neuron": [240, 240, 240],
"resnet_dt": true,
"seed": 2,
"_comment": " that's all"
},
"_comment": " that's all"
}
}
Running commands: dp train dp2.0_finetune_input.json --finetune graph.0.pb
Steps to Reproduce
graph.0.pb.txt
Further Information, Files, and Links
No response