Neural Discrete Representation Learning #23

flrngel commented Sep 18, 2018

https://arxiv.org/abs/1711.00937

Abstract

  • the paper proposes a model (VQ-VAE) that learns "discrete representations"
  • differs from VAEs:
    • the encoder network outputs discrete codes (rather than continuous ones)
    • the prior is learnt rather than static
    • circumvents the issue of posterior collapse
      • latents being ignored by the decoder (commonly observed in other VAEs)

1. Introduction

  • learning useful generic representations in an unsupervised fashion is still an open challenge
  • the model conserves the important features of the data in latent space while optimising for maximum likelihood
  • the paper concentrates on learning such representations
  • images can often be described concisely by language, which motivates discrete latents
  • most prior VAEs with discrete latent representations rely on particular parameterizations of the posterior distribution, whereas this paper relies on vector quantization (VQ)
  • posterior collapse means the latents are ignored when the decoder is powerful enough
  • important features can span many dimensions in data space

Model features

  • simple and unsupervised
  • uses discrete latents, does not suffer from posterior collapse, and has no variance issues
  • performs as well as its continuous counterparts
  • generates coherent, high-quality samples on a wide variety of applications

2. Related work

3. VQ-VAE


Order

  1. the encoder parameterises the posterior distribution q(z|x) of the discrete latent random variables z given the data x
  2. posteriors and priors in VAEs are usually assumed normally distributed with diagonal covariance, which allows the Gaussian re-parameterization trick to be used [32, 23] (a minimal sketch follows this list); this framework has been extended with:
  • autoregressive prior and posterior models [14]
  • normalizing flows [10]
  • inverse autoregressive posteriors [22]
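A minimal sketch of the Gaussian re-parameterization trick referenced in point 2 (my own TensorFlow code, not from the paper):

```python
import tensorflow as tf

def reparameterize(mu, log_var):
    # sample z = mu + sigma * eps with eps ~ N(0, I); the noise eps carries the
    # randomness, so gradients can flow through mu and log_var
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps
```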

3.1. Discrete Latent variables

  • K is the size of the discrete latent space (i.e. the number of codebook vectors)
  • D is the dimensionality of each latent embedding vector e_i (a nearest-neighbour lookup sketch follows)
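A sketch of what the discretisation could look like: each encoder output vector is mapped to its nearest neighbour in the K x D codebook (hypothetical TensorFlow code, names are my own):

```python
import tensorflow as tf

def quantize(z_e, codebook):
    # z_e: encoder output of shape (..., D); codebook: embedding table of shape (K, D)
    flat = tf.reshape(z_e, [-1, codebook.shape[1]])                    # (N, D)
    # squared L2 distance from every encoder vector to every embedding e_i -> (N, K)
    d = (tf.reduce_sum(flat ** 2, axis=1, keepdims=True)
         - 2.0 * tf.matmul(flat, codebook, transpose_b=True)
         + tf.reduce_sum(codebook ** 2, axis=1))
    idx = tf.argmin(d, axis=1)                                         # index k of the nearest e_k
    z_q = tf.reshape(tf.gather(codebook, idx), tf.shape(z_e))          # quantized output z_q(x)
    return z_q, idx
```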

3.2. Learning

  • Loss (three terms; see the reconstructed equation after this list)
    • reconstruction loss
    • codebook loss (implemented with stop gradients)
    • commitment loss
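As I recall, the full objective from the paper is (sg[·] is the stop-gradient operator, β weights the commitment term):

    L = \log p(x \mid z_q(x)) + \| \mathrm{sg}[z_e(x)] - e \|_2^2 + \beta \, \| z_e(x) - \mathrm{sg}[e] \|_2^2

The decoder optimises only the reconstruction term, the embeddings only the middle (codebook) term, and the encoder the first and last terms.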

4. Experiments


5. Conclusion

  • capable of modeling very long-term dependencies through the compressed discrete latent space
  • VQ-VAEs capture the important features of the data

My Comments

  • the word "discrete" comes from the embeddings (e_i) used for quantization in VQ-VAE
  • training could be hard because of the hyperparameters in the loss function (e.g. the commitment weight β)
  • tf.stop_gradient is the key ingredient; a sketch is below
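A minimal sketch of how tf.stop_gradient could give both the straight-through gradient and the two auxiliary loss terms (my own code; β = 0.25 is the value suggested in the paper):

```python
import tensorflow as tf

def vq_terms(z_e, z_q, beta=0.25):
    # straight-through estimator: the forward pass uses the quantized z_q, while the
    # backward pass copies gradients from the decoder input straight to z_e
    z_q_st = z_e + tf.stop_gradient(z_q - z_e)
    # codebook (VQ) loss: moves the embeddings e_i towards the encoder outputs
    codebook_loss = tf.reduce_mean(tf.square(tf.stop_gradient(z_e) - z_q))
    # commitment loss: keeps the encoder outputs close to the chosen embedding
    commitment_loss = beta * tf.reduce_mean(tf.square(z_e - tf.stop_gradient(z_q)))
    return z_q_st, codebook_loss + commitment_loss
```

The reconstruction loss from the decoder (fed with z_q_st) is added on top of these two terms.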