
backpropagation on chunks? #14

Open
vr25 opened this issue Oct 2, 2020 · 2 comments

Comments


vr25 commented Oct 2, 2020

Hi,

When the document chunks are fed to the data-parallel model, how is the loss backpropagated? Is it backpropagated for every chunk separately?

Also, do you unfreeze BERT and fine-tune it for the classification task?
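
To be clear about what I mean by freezing vs. unfreezing, in plain PyTorch terms (a generic sketch, not this repo's code; the encoder here is just a stand-in):

```python
import torch.nn as nn

bert_encoder = nn.Linear(768, 768)      # dummy stand-in for the BERT encoder

# Frozen: only the classification head on top would be trained.
for p in bert_encoder.parameters():
    p.requires_grad = False

# Unfrozen: fine-tuning also updates the BERT weights for this task.
for p in bert_encoder.parameters():
    p.requires_grad = True
```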

Thank you!

AndriyMulyar (Owner) commented Oct 5, 2020 via email

vr25 (Author) commented Oct 5, 2020

Could you explain a bit more how the loss is calculated for every chunk separately? As I understand it, the entire document has a single target label, so the loss would be calculated against that document-level target, right? Please let me know if I am missing something.
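
To make my assumption concrete, here is a minimal toy sketch (not this repo's code, just how I imagine a chunk-pooling classifier works): each chunk is encoded, the chunk vectors are pooled into one document vector, and a single loss is computed against the document label, so one `backward()` call sends gradients through every chunk.

```python
import torch
import torch.nn as nn

class ChunkPoolingClassifier(nn.Module):
    """Toy hierarchical classifier: encode each chunk, pool the chunk
    vectors into one document vector, predict a single document label."""

    def __init__(self, chunk_encoder: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.chunk_encoder = chunk_encoder              # stands in for BERT here
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, chunks: torch.Tensor) -> torch.Tensor:
        # chunks: (num_chunks, hidden_size) pre-embedded chunks of ONE document
        chunk_vecs = self.chunk_encoder(chunks)         # (num_chunks, hidden_size)
        doc_vec = chunk_vecs.mean(dim=0, keepdim=True)  # pool over all chunks
        return self.classifier(doc_vec)                 # (1, num_labels)

hidden, num_labels, num_chunks = 32, 4, 125
model = ChunkPoolingClassifier(nn.Linear(hidden, hidden), hidden, num_labels)
loss_fn = nn.CrossEntropyLoss()

chunks = torch.randn(num_chunks, hidden)                # fake chunk features
doc_label = torch.tensor([2])                           # one label for the whole document

logits = model(chunks)                                  # (1, num_labels)
loss = loss_fn(logits, doc_label)                       # single document-level loss
loss.backward()                                         # one backward pass covers every chunk
```

Is this roughly what happens here, or is a per-chunk loss computed as well?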

Also, what is the maximum number of chunks per document across the entire dataset?

The default config has bert_batch_size=7, but some of my documents have up to 125 chunks. In such cases, if I set bert_batch_size to 125, I run into a CUDA OOM error.

Any suggestions for this?
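
For context, the workaround I have been considering is to push a document's chunks through the encoder a few at a time and pool afterwards. A rough sketch with a dummy encoder (shapes and names are my own, not this repo's):

```python
import torch
import torch.nn as nn

def encode_in_sub_batches(encoder: nn.Module, chunks: torch.Tensor,
                          sub_batch_size: int = 7) -> torch.Tensor:
    """Run one document's chunks through the encoder a few at a time
    instead of all 125 in a single forward pass, then concatenate."""
    pieces = []
    for start in range(0, chunks.size(0), sub_batch_size):
        pieces.append(encoder(chunks[start:start + sub_batch_size]))
    return torch.cat(pieces, dim=0)                     # (num_chunks, hidden)

hidden, num_chunks = 32, 125
encoder = nn.Linear(hidden, hidden)                     # dummy stand-in for BERT
chunks = torch.randn(num_chunks, hidden)

chunk_vecs = encode_in_sub_batches(encoder, chunks, sub_batch_size=7)
doc_vec = chunk_vecs.mean(dim=0)                        # pool all 125 chunk vectors
```

One thing I am unsure about: each sub-batch forward pass is smaller, but if gradients are kept, the saved activations for all 125 chunks still accumulate, so I may also need to freeze BERT (torch.no_grad() around the encoder) or use gradient checkpointing.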

Thanks!
