Why does the MaskedTransformer achieve better results than the VQVAE? #81

Open
FufenNan opened this issue Oct 14, 2024 · 1 comment

Comments

@FufenNan

Thanks for sharing your excellent work.
I'm trying to replicate the results but ran into something confusing. According to your code, the MaskedTransformer predicts the base tokens produced by the VQVAE encoder and has no relation to the residual layers. My VQVAE without the residual layers only achieves an FID of 0.2, while my MaskedTransformer achieves 0.09. This confuses me, since the MaskedTransformer learns from the VQVAE's tokens yet shows better performance.
Could you explain why this happens?
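
For concreteness, here is a minimal, self-contained sketch of the two evaluation paths being compared, assuming a MoMask-style pipeline. `ToyVQVAE`, the layer sizes, and the random stand-in sampler are hypothetical placeholders for illustration, not this repository's actual API:

```python
import torch
import torch.nn as nn

class ToyVQVAE(nn.Module):
    """Stand-in for the motion VQ-VAE, base quantization layer only (no residual)."""
    def __init__(self, dim=8, codebook_size=16):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        self.enc = nn.Linear(dim, dim)
        self.dec = nn.Linear(dim, dim)

    def encode(self, x):
        # Map motion features to the nearest base codebook entry per frame.
        z = self.enc(x)                               # (T, dim)
        dists = torch.cdist(z, self.codebook.weight)  # (T, K)
        return dists.argmin(dim=-1)                   # base-token ids, (T,)

    def decode(self, ids):
        # Look up code vectors and project back to motion features.
        return self.dec(self.codebook(ids))           # (T, dim)

vqvae = ToyVQVAE()
motion = torch.randn(32, 8)  # one toy clip: 32 frames, 8-dim features

# Path (1), the "VQVAE FID": reconstruct real motion through the base codebook.
recon = vqvae.decode(vqvae.encode(motion))

# Path (2), the "MaskedTransformer FID": the transformer is trained to predict
# the same base-token ids via masked-token modeling, and its sampled ids are
# decoded by the frozen VQVAE decoder. Random ids stand in for real samples.
sampled_ids = torch.randint(0, 16, (32,))
generated = vqvae.decode(sampled_ids)

# FID is then computed between the feature distribution of reconstructions
# (path 1) vs. real data, and of generations (path 2) vs. real data; the
# 0.2 and 0.09 numbers above come from these two different comparisons.
```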

@Murrol
Collaborator

Murrol commented Oct 29, 2024

This is interesting. It might be because the encoder of the VQVAE has not converged properly.
