I am using DiT and trying to fine-tune it for layout analysis (object detection) on a dataset other than PubLayNet; the end goal is to fine-tune it beyond its current classification capabilities.
The problem arises when using:
- the official example scripts
- my own modified scripts
If I try to resume training, whether I use the config in object_detection or the generated one in the output directory, I get warnings like the following:
WARNING [08/26 17:26:55 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
backbone.fpn_lateral2.{bias, weight}
backbone.fpn_lateral3.{bias, weight}
backbone.fpn_lateral4.{bias, weight}
backbone.fpn_lateral5.{bias, weight}
backbone.fpn_output2.{bias, weight}
backbone.fpn_output3.{bias, weight}
backbone.fpn_output4.{bias, weight}
backbone.fpn_output5.{bias, weight}
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_head.fc1.{bias, weight}
roi_heads.box_head.fc2.{bias, weight}
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
roi_heads.mask_head.deconv.{bias, weight}
roi_heads.mask_head.mask_fcn1.{bias, weight}
roi_heads.mask_head.mask_fcn2.{bias, weight}
roi_heads.mask_head.mask_fcn3.{bias, weight}
roi_heads.mask_head.mask_fcn4.{bias, weight}
roi_heads.mask_head.predictor.{bias, weight}
WARNING [08/26 17:26:55 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
backbone.bottom_up.backbone.backbone.fpn_lateral2.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_output2.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_lateral3.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_output3.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_lateral4.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_output4.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_lateral5.{bias, weight}
backbone.bottom_up.backbone.backbone.fpn_output5.{bias, weight}
backbone.bottom_up.backbone.proposal_generator.rpn_head.conv.{bias, weight}
backbone.bottom_up.backbone.proposal_generator.rpn_head.objectness_logits.{bias, weight}
backbone.bottom_up.backbone.proposal_generator.rpn_head.anchor_deltas.{bias, weight}
backbone.bottom_up.backbone.roi_heads.box_head.fc1.{bias, weight}
backbone.bottom_up.backbone.roi_heads.box_head.fc2.{bias, weight}
backbone.bottom_up.backbone.roi_heads.box_predictor.cls_score.{bias, weight}
backbone.bottom_up.backbone.roi_heads.box_predictor.bbox_pred.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.mask_fcn1.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.mask_fcn2.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.mask_fcn3.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.mask_fcn4.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.deconv.{bias, weight}
backbone.bottom_up.backbone.roi_heads.mask_head.predictor.{bias, weight}
When training then starts, the loss is very high (seemingly random, between 2 and 5) instead of close to 0.5, which is where I had left it, so those parameters appear to be reinitialized rather than restored.
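For context, the resume path I'm describing corresponds to the standard Detectron2 checkpointer flow. This is a minimal sketch, not the exact DiT training script: the config path and output directory are placeholders, and the real DiT config also needs the repo's extra config keys registered before merging, if I recall correctly.

```python
# Minimal sketch of the resume flow I mean (standard Detectron2; the DiT train
# script wraps the same checkpointer). "my_config.yaml" and "output" are
# placeholders.
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer

cfg = get_cfg()
cfg.merge_from_file("my_config.yaml")  # object_detection config or the generated one
cfg.OUTPUT_DIR = "output"

model = build_model(cfg)
# resume=True makes the checkpointer read OUTPUT_DIR/last_checkpoint and load the
# latest model_*.pth; the fvcore warnings above are emitted during this load.
DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
    cfg.MODEL.WEIGHTS, resume=True
)
```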
I need to be able to resume because the policies of the university cluster I am using don't allow long training sessions.
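In case it helps with diagnosis: judging from the warnings, the detector-head weights are present in the checkpoint, just under an extra backbone.bottom_up.backbone. prefix, so the newly built model cannot match them and leaves those layers randomly initialized, which would explain the loss. Below is a rough, unverified sketch of the key remapping I would expect to work around it; the prefix is taken from the logs above and the file names are placeholders.

```python
# Unverified workaround sketch: strip the extra "backbone.bottom_up.backbone."
# prefix reported in the warnings so that, e.g.,
#   backbone.bottom_up.backbone.roi_heads.box_head.fc1.weight
# becomes the roi_heads.box_head.fc1.weight the model expects (the doubled
# "backbone." on the FPN keys also resolves correctly). Paths are placeholders.
import torch

ckpt = torch.load("output/model_final.pth", map_location="cpu")
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

prefix = "backbone.bottom_up.backbone."
remapped = {
    (k[len(prefix):] if k.startswith(prefix) else k): v
    for k, v in state.items()
}

if isinstance(ckpt, dict) and "model" in ckpt:
    ckpt["model"] = remapped  # keep iteration/optimizer entries as-is
else:
    ckpt = remapped
torch.save(ckpt, "output/model_final_remapped.pth")
```

Pointing MODEL.WEIGHTS at the remapped file should then let the keys line up, assuming the prefix shown in the logs is the only mismatch.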
Platform: Ubuntu 20.04.6, CUDA 11.4
Python version: 3.9.17
PyTorch version (GPU?): 1.9.1+cu111