Cross-modal Retrieval Objective (CMR) #9

Open · mojivalipour opened this issue Nov 9, 2021 · 6 comments
@mojivalipour

Can you point me to the place in your code where CMR is implemented? According to the paper, you used CMR + VMLM + SMRM for pre-training, but CMR is not among your supported tasks. Am I missing something?

@intersun (Owner) commented Nov 9, 2021

It is used by default, so you don't have to specify it. The loss is implemented at https://github.com/intersun/LightningDOT/blob/5f2880f69ba87b8701ab89348d70ebb11432578c/dvl/utils.py#L114.
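
(For anyone tracing this: below is a minimal sketch of what an in-batch cross-modal retrieval objective of this kind typically looks like. The function and variable names, img_emb and txt_emb, are illustrative assumptions; the repo's authoritative implementation is _calc_loss at the link above and may differ, e.g. in negative sampling or score scaling.)

import torch
import torch.nn.functional as F

def cross_modal_retrieval_sketch(img_emb, txt_emb):
    # img_emb, txt_emb: (B, d) image and text embeddings for one batch
    # (assumed names). Each caption's matching image sits on the diagonal
    # of the score matrix; all other in-batch images serve as negatives.
    sim = txt_emb @ img_emb.t()                    # (B, B) dot-product scores
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)           # softmax over in-batch candidates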

@mojivalipour (Author)

Now I'm quite confused about how this repository's files are structured. I thought pretrain.py was the file used to pre-train your LightningDOT model, but the _calc_loss function does not appear to be called anywhere in it, while train_itm.py uses the function several times. Could you please provide specific instructions on how to reproduce the LightningDOT paper results?

@mojivalipour (Author)

Is it the case that pretrain.py is only provided to pre-train the UNITER model? If not, what is train_itm.py used for?

@intersun (Owner) commented Nov 9, 2021

I totally agree it is confusing; we didn't have time to clean up the code. As you may have noticed, we left in a lot of other things that are not mentioned in the paper, such as hard negatives, knowledge distillation, etc.

To answer your first question (pre-training only; I assume you already figured out how the loss is used for fine-tuning): if you trace the definition of the pre-trained model,

model = BiEncoderForPretraining(args.model_config, args, args.project_dim, IMG_DIM, IMG_LABEL_DIM, ...)

you will notice that the relevant forward function is defined at

def forward_itm(self, batch, targets, ot_inputs, ...):

To answer your second question: pretrain.py is solely for pre-training and train_itm.py is solely for fine-tuning. I currently don't have time to merge them, and that is definitely confusing...

@mojivalipour (Author)

I see, thank you. So essentially your ITM implementation is different from the original ITM in UNITER, and yours is based on the CMR objective in the paper. In fact, itm_loss1 is the image-retrieval loss and itm_loss2 is the text-retrieval loss from the article. Just one more question: what is ot_loss?
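
(To make the two directions concrete, here is a hedged sketch of how such a symmetric objective would combine them. Only the names itm_loss1 and itm_loss2 come from the repo; img_emb and txt_emb are assumed batch embeddings.)

import torch
import torch.nn.functional as F

def symmetric_cmr_sketch(img_emb, txt_emb):
    # img_emb, txt_emb: (B, d) batch embeddings (assumed names).
    sim = txt_emb @ img_emb.t()                     # (B, B) text-to-image scores
    targets = torch.arange(sim.size(0), device=sim.device)
    itm_loss1 = F.cross_entropy(sim, targets)       # image retrieval: rank images per text
    itm_loss2 = F.cross_entropy(sim.t(), targets)   # text retrieval: rank texts per image
    return itm_loss1 + itm_loss2                    # symmetric CMR-style objective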

@intersun (Owner) commented Nov 9, 2021

Correct... since I implemented fine-tuning first and later found out that changing it into pre-training was not trivial, I ended up implementing pre-training and fine-tuning separately...

OT loss refers to the optimal transport loss proposed in http://proceedings.mlr.press/v119/chen20e.html. I never tried it, though, so I'm not sure how well it works.
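
(For context: the cited paper aligns two modalities with an optimal transport distance between image-region and text-token features. Below is a minimal, hedged sketch using plain entropic-regularized Sinkhorn iterations; it is not the paper's exact solver nor this repo's code, and all names are illustrative.)

import torch

def ot_loss_sketch(img_feats, txt_feats, eps=0.1, n_iters=50):
    # img_feats: (m, d) image-region features; txt_feats: (n, d) token features.
    img = torch.nn.functional.normalize(img_feats, dim=-1)
    txt = torch.nn.functional.normalize(txt_feats, dim=-1)
    cost = 1.0 - img @ txt.t()                            # (m, n) cosine-distance cost

    m, n = cost.shape
    mu = torch.full((m,), 1.0 / m, device=cost.device)    # uniform mass on regions
    nu = torch.full((n,), 1.0 / n, device=cost.device)    # uniform mass on tokens

    K = torch.exp(-cost / eps)                            # Gibbs kernel
    u = torch.ones(m, device=cost.device)
    for _ in range(n_iters):                              # Sinkhorn fixed-point updates
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)            # approximate transport plan
    return (plan * cost).sum()                            # OT distance used as a loss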
