I'm looking at reproducing aspects of this research as part of an effort to provide a reference end-to-end image-text document model pretraining setup with open datasets.
It's great to have the code and models here, but most of it is focused on fine-tuning. For pretraining, a few details are still fuzzy for me:
- The 'simplified HTML': does the process of converting in-the-wild HTML from websites/Common Crawl into the pretraining form exactly follow the 'Minimal HTML' described in https://arxiv.org/abs/2107.06955, or are there any key differences? (A rough sketch of my current understanding is after this list.)
- For the pretraining warmup, a book corpus is mentioned, and the target text appears to match the rendered input 100%, without any HTML. Is there a max char/token length for this? The number of patches is fairly small compared to the main pretraining. (The second sketch below shows what I'm assuming.)
- For the main pretraining, are the masked spans of text/elements chosen at word or character boundaries, and are there any rules about the spans (e.g. span length ranges in word or character counts)? (Third sketch below.)
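To make the first question concrete, here is a minimal sketch of the simplification I currently have in mind, based on my reading of the HTLM 'Minimal HTML' rules; the dropped-tag list and the text-length threshold are my own assumptions, not values from this repo:

```python
from bs4 import BeautifulSoup

# Assumed, not taken from this repo: tags to drop and a minimum-text threshold.
DROP_TAGS = ["script", "style", "noscript", "iframe", "form", "header", "footer", "nav"]
MIN_TEXT_CHARS = 10

def simplify_html(raw_html: str) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")

    # Drop elements that rarely carry visible document text.
    for tag in soup.find_all(DROP_TAGS):
        tag.decompose()

    # Strip all attributes from the remaining elements.
    for tag in soup.find_all(True):
        tag.attrs = {}

    # Prune subtrees with almost no text, children before parents so a
    # decomposed subtree is never revisited.
    for tag in reversed(soup.find_all(True)):
        if len(tag.get_text(strip=True)) < MIN_TEXT_CHARS:
            tag.decompose()

    return str(soup)
```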
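For the warmup question, this is the shape of example I'm assuming: plain book text rendered to an image, with the identical text as the target and no HTML. The character cap and image size below are placeholders; whether such a cap exists, and what it is, is exactly what I'm asking:

```python
from PIL import Image, ImageDraw, ImageFont

MAX_CHARS = 512          # placeholder cap; the real limit (if any) is my question
IMAGE_SIZE = (512, 256)  # placeholder; warmup reportedly uses far fewer patches than main pretraining

def make_warmup_example(book_text: str):
    """Render plain text to an image; the target is the same text, no HTML."""
    text = book_text[:MAX_CHARS]
    image = Image.new("RGB", IMAGE_SIZE, "white")
    draw = ImageDraw.Draw(image)
    # Naive rendering for illustration only; a real pipeline would wrap lines
    # or use a proper (browser-based) renderer.
    draw.text((8, 8), text, fill="black", font=ImageFont.load_default())
    return image, text
```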
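And for the masking question, the word-boundary variant I would implement by default looks roughly like this; the sentinel token, span-length range, and masking ratio are all placeholder assumptions:

```python
import random

MASK_TOKEN = "<mask>"   # placeholder sentinel
SPAN_WORDS = (1, 5)     # assumed span-length range, in words
MASK_RATIO = 0.3        # assumed fraction of words to mask

def mask_word_spans(text: str, rng: random.Random) -> str:
    """Mask random word-level spans, collapsing each span to a single sentinel."""
    words = text.split()
    masked = list(words)
    budget = int(len(words) * MASK_RATIO)
    while budget > 0:
        span_len = rng.randint(*SPAN_WORDS)
        start = rng.randrange(len(words))
        end = min(start + span_len, len(words))
        for i in range(start, end):
            masked[i] = MASK_TOKEN
        budget -= end - start
    # Merge adjacent sentinels so each span appears once.
    out, prev_was_mask = [], False
    for w in masked:
        is_mask = w == MASK_TOKEN
        if not (is_mask and prev_was_mask):
            out.append(w)
        prev_was_mask = is_mask
    return " ".join(out)
```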