
Releasing tokenizer checkpoints without VF loss and with VF loss (MAE) #8

Open
xingjianleng opened this issue Jan 8, 2025 · 5 comments


@xingjianleng

Hi authors,

Thank you for the interesting work.

Section 5.1 of the paper mentions three variants of tokenizer checkpoints, but at the moment I can only find the VF loss (DINOv2) checkpoint in this repository. I'm wondering whether the authors plan to release the other two soon.

Thanks!

@JingfengYao
Member

Thanks for your interest in our work.

In fact, in Section 5, we trained 9 variants of VAEs with different latent dimensions and VF losses (see Table 2). These checkpoints have not been released because they were primarily exploratory and were trained for only a limited number of epochs. The VA-VAE we released was ultimately trained for a longer period to ensure its final performance, and it is the one used in Table 3. We may consider releasing these experimental checkpoints in the future.

@xingjianleng
Author

Thank you for your response.

I have further questions about the hyperparameters used to train the released VAE. As you mentioned, the different VAE variants were trained for the ablation studies.

So, is the following sentence from the paper referring only to the hyperparameters for the ablation studies? "To accelerate convergence, we adjust the learning rate and global batch size to 1e-4 and 256, respectively. In contrast to previous settings, each tokenizer is trained on ImageNet 256 × 256 for 50 epochs."

If so, would it be possible to disclose the hyperparameters used to train the released VAE, e.g., training epochs, learning rate, learning rate scaling, batch size, etc.?

@JingfengYao
Member

Thank you for your reminder.

Indeed, we mentioned this in Section 5.4: we employed a progressive training strategy. Specifically, we used a fixed batch size of 256 and a learning rate of 1e-4. In the early stage of training, we used a larger w_hyper = 0.5 and did not apply the margin to the losses. Subsequently, at the 100th and 115th epochs, we reduced w_hyper to 0.1 and activated the margin strategy, respectively.
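
In pseudocode, the schedule looks roughly like this (a minimal sketch; `w_hyper` and `margin_enabled` are illustrative names here, not the actual config keys in this repo):

```python
# Rough sketch of the progressive schedule described above.
# Variable names are illustrative, not the repository's actual config keys.
BATCH_SIZE = 256      # fixed throughout training
LEARNING_RATE = 1e-4  # fixed throughout training

def vf_loss_schedule(epoch: int):
    """Return (w_hyper, margin_enabled) for a given training epoch."""
    if epoch < 100:
        return 0.5, False   # early stage: larger w_hyper, no margin
    elif epoch < 115:
        return 0.1, False   # from epoch 100: reduce w_hyper to 0.1
    else:
        return 0.1, True    # from epoch 115: activate the margin strategy
```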

We will include a more detailed description of this part in the next version of the paper.

@txytju

txytju commented Jan 12, 2025

> We may consider releasing these experimental checkpoints in the future.

Looking forward to it!

@JingfengYao
Member

Hi, thanks for your attention.

We have released more VA-VAE experimental variants here. Hope you like them. 😊
