AutoencoderKL ends up using the L1 reconstruction error instead of the L2 reconstruction error during training (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L48). This does not match the classical VAE formulation: the data likelihood conditioned on the latents, $p(x \mid z)$, is assumed Gaussian, so the negative log of the Gaussian PDF gives the L2 reconstruction error, up to a scaling factor and an additive constant.
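To make the L2 claim concrete, here is a minimal sketch (plain Python, no torch; `gaussian_nll` is my own helper, not from the repo) showing that the Gaussian negative log-likelihood equals the squared error up to an affine transform:

```python
import math

def gaussian_nll(x, mu, sigma=1.0):
    # Negative log of the Gaussian PDF N(x; mu, sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

x, mu = 0.7, 0.2
l2 = (x - mu) ** 2  # squared (L2) reconstruction error

# With sigma = 1, NLL = const + 0.5 * L2, so minimizing the NLL
# over mu is exactly minimizing the L2 error:
const = 0.5 * math.log(2 * math.pi)
assert abs(gaussian_nll(x, mu) - (const + 0.5 * l2)) < 1e-12
```

An L1 reconstruction term would instead correspond to a Laplace likelihood, not a Gaussian one.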
The generator loss is `-torch.mean(logits_fake)` (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/vqperceptual.py#L123, https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L71). Correct me if I'm wrong, but I think this corresponds to the generator loss under the WGAN framework, while the discriminator loss only supports the non-saturating vanilla loss and the hinge loss (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L27, https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/vqperceptual.py#L73, https://github.com/CompVis/taming-transformers/blob/master/taming/modules/losses/vqperceptual.py#L20), so the generator and the discriminator are trained under mismatched objectives.
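To spell out the mismatch, here is a minimal sketch (plain Python over lists of logits, not the repo's torch code) of the three losses involved. The generator objective is the WGAN one, but the two available discriminator objectives are the hinge loss and the non-saturating BCE loss; a WGAN discriminator loss (`mean(fake) - mean(real)`, plus a Lipschitz constraint) is not among them:

```python
import math

def g_loss_wgan(logits_fake):
    # Generator loss used in the repo: -mean(D(G(z))), the WGAN generator objective
    return -sum(logits_fake) / len(logits_fake)

def d_loss_hinge(logits_real, logits_fake):
    # Hinge discriminator loss: 0.5 * (mean(relu(1 - D(x))) + mean(relu(1 + D(G(z)))))
    loss_real = sum(max(0.0, 1.0 - l) for l in logits_real) / len(logits_real)
    loss_fake = sum(max(0.0, 1.0 + l) for l in logits_fake) / len(logits_fake)
    return 0.5 * (loss_real + loss_fake)

def d_loss_vanilla(logits_real, logits_fake):
    # Non-saturating BCE discriminator loss via a numerically stable softplus:
    # 0.5 * (mean(softplus(-D(x))) + mean(softplus(D(G(z)))))
    sp = lambda t: max(t, 0.0) + math.log1p(math.exp(-abs(t)))
    loss_real = sum(sp(-l) for l in logits_real) / len(logits_real)
    loss_fake = sum(sp(l) for l in logits_fake) / len(logits_fake)
    return 0.5 * (loss_real + loss_fake)
```

Pairing a WGAN generator with a hinge or BCE discriminator is not unheard of in practice, but it is not the setup any of these frameworks analyzes theoretically.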