Cross-modal Retrieval Objective (CMR) #9

Open · mojivalipour opened this issue Nov 9, 2021 · 6 comments
@mojivalipour

Can you point me to the place in your code where CMR is implemented? According to the paper, you used CMR + VMLM + SMRM for pre-training, but CMR is not among your supported tasks. Am I missing something?

@intersun (Owner) commented Nov 9, 2021

It is used by default, so you don't have to specify it. The loss is implemented at https://github.com/intersun/LightningDOT/blob/5f2880f69ba87b8701ab89348d70ebb11432578c/dvl/utils.py#L114.
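
(For anyone tracing this: below is a minimal sketch of what an in-batch cross-modal retrieval objective of this kind typically looks like. The function and variable names, img_emb and txt_emb, are illustrative assumptions; the repo's authoritative implementation is _calc_loss at the link above and may differ, e.g. in negative sampling or score scaling.)

import torch
import torch.nn.functional as F

def cross_modal_retrieval_sketch(img_emb, txt_emb):
    # img_emb, txt_emb: (B, d) image and text embeddings for one batch
    # (assumed names). Each caption's matching image sits on the diagonal
    # of the score matrix; all other in-batch images serve as negatives.
    sim = txt_emb @ img_emb.t()                    # (B, B) dot-product scores
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)           # softmax over in-batch candidates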

@mojivalipour (Author)

Now I'm quite confused about how this repository's files are structured. I thought pretrain.py was the file used to pre-train your LightningDOT model, but the _calc_loss function does not appear to be called anywhere in it, while train_itm.py uses the function several times. Could you please provide specific instructions on how to reproduce the LightningDOT paper results?

@mojivalipour (Author)

Is it the case that pretrain.py is only provided to pre-train the UNITER model? If not, what is train_itm.py used for?

@intersun (Owner) commented Nov 9, 2021

I totally agree it is confusing; we didn't have time to clean up the code. As you may have noticed, we left in a lot of other things that are not mentioned in the paper, such as hard negatives, knowledge distillation, etc.

To answer your first question (pre-training only; I assume you already figured out how the loss is used for fine-tuning): if you trace the definition of the pre-trained model,

model = BiEncoderForPretraining(args.model_config, args, args.project_dim, IMG_DIM, IMG_LABEL_DIM, ...)

you will notice that the relevant forward function is defined at

def forward_itm(self, batch, targets, ot_inputs, ...):

To answer your second question: pretrain.py is solely for pre-training and train_itm.py is solely for fine-tuning. I currently don't have time to merge them, and that is definitely confusing...

@mojivalipour (Author)

I see, thank you. So essentially your ITM implementation is different from the original ITM in UNITER, and yours is based on the CMR objective in the paper. In fact, itm_loss1 is the image-retrieval loss and itm_loss2 is the text-retrieval loss from the article. Just one more question: what is ot_loss?
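
(To make the two directions concrete, here is a hedged sketch of how such a symmetric objective would combine them. Only the names itm_loss1 and itm_loss2 come from the repo; img_emb and txt_emb are assumed batch embeddings.)

import torch
import torch.nn.functional as F

def symmetric_cmr_sketch(img_emb, txt_emb):
    # img_emb, txt_emb: (B, d) batch embeddings (assumed names).
    sim = txt_emb @ img_emb.t()                     # (B, B) text-to-image scores
    targets = torch.arange(sim.size(0), device=sim.device)
    itm_loss1 = F.cross_entropy(sim, targets)       # image retrieval: rank images per text
    itm_loss2 = F.cross_entropy(sim.t(), targets)   # text retrieval: rank texts per image
    return itm_loss1 + itm_loss2                    # symmetric CMR-style objective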

@intersun (Owner) commented Nov 9, 2021

Correct... since I implemented fine-tuning first and later found out that changing it into pre-training was not trivial, I ended up implementing pre-training and fine-tuning separately...

OT loss refers to the optimal transport loss proposed in http://proceedings.mlr.press/v119/chen20e.html. I never tried it, though, so I'm not sure how well it works.
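
(For context: the cited paper aligns two modalities with an optimal transport distance between image-region and text-token features. Below is a minimal, hedged sketch using plain entropic-regularized Sinkhorn iterations; it is not the paper's exact solver nor this repo's code, and all names are illustrative.)

import torch

def ot_loss_sketch(img_feats, txt_feats, eps=0.1, n_iters=50):
    # img_feats: (m, d) image-region features; txt_feats: (n, d) token features.
    img = torch.nn.functional.normalize(img_feats, dim=-1)
    txt = torch.nn.functional.normalize(txt_feats, dim=-1)
    cost = 1.0 - img @ txt.t()                            # (m, n) cosine-distance cost

    m, n = cost.shape
    mu = torch.full((m,), 1.0 / m, device=cost.device)    # uniform mass on regions
    nu = torch.full((n,), 1.0 / n, device=cost.device)    # uniform mass on tokens

    K = torch.exp(-cost / eps)                            # Gibbs kernel
    u = torch.ones(m, device=cost.device)
    for _ in range(n_iters):                              # Sinkhorn fixed-point updates
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)            # approximate transport plan
    return (plan * cost).sum()                            # OT distance used as a loss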
