DRAW: A recurrent neural network for image generation

Main Idea

Combine a recurrent neural network (RNN) and an attention mechanism with a variational autoencoder (VAE) to generate images step by step on a canvas.

Architecture

[Figure: DRAW architecture]

Algorithm

Whole Algorithm:

For each time step:

  1. Let x be the input image. The read operation (with or without attention) selects which part of the image is fed into the Encoder RNN.

  2. The Encoder RNN emits hidden state h_enc

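  3. Use h_enc to compute the parameters of the distribution Q(z | h_enc) = N(z | mu, sigma), then sample z from it (each W below denotes its own learned linear map):

    $$\mu_t = W(h_t^{enc}), \qquad \sigma_t = \exp\left(W(h_t^{enc})\right)$$

    $$z_t \sim Q(Z_t \mid h_t^{enc}) = \mathcal{N}(Z_t \mid \mu_t, \sigma_t)$$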

  4. Feed the sampled z into the Decoder RNN to get h_dec.

  5. Pass h_dec through the write operation (with or without attention) and add the output to the canvas: c_t = c_{t-1} + write(h_dec).

  6. After the final time step T, use the canvas c_T to parameterize the distribution of pixels D(x|c_T). Here this distribution is Bernoulli, so sigmoid(c_T) is its mean.
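The steps above can be sketched as a short loop. This is a minimal illustration, not the paper's implementation: it uses read and write without attention, a plain tanh RNN cell in place of the LSTM used in the paper, and untrained random weights. The names `draw_forward`, `rnn_step`, `linear`, and `rand_matrix` are hypothetical helpers introduced only for this example. As in the paper, the encoder also receives the error image x - sigmoid(c_{t-1}) and the previous decoder state.

```python
import math
import random

random.seed(0)

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def linear(W, v):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def rnn_step(W, h_prev, inp):
    """Plain tanh RNN cell (assumption: stands in for the paper's LSTM)."""
    return [math.tanh(a) for a in linear(W, h_prev + inp)]

def draw_forward(x, T=4, n_hid=8, n_z=3):
    n_pix = len(x)
    # Untrained random weights, for illustration only.
    W_enc = rand_matrix(n_hid, n_hid + 2 * n_pix + n_hid)
    W_dec = rand_matrix(n_hid, n_hid + n_z)
    W_mu = rand_matrix(n_z, n_hid)
    W_log_sigma = rand_matrix(n_z, n_hid)
    W_write = rand_matrix(n_pix, n_hid)

    h_enc = [0.0] * n_hid
    h_dec = [0.0] * n_hid
    canvas = [0.0] * n_pix
    mus, sigmas = [], []
    for _ in range(T):
        # Step 1: read without attention (whole image plus error image).
        x_hat = [a - b for a, b in zip(x, sigmoid(canvas))]
        r = x + x_hat
        # Step 2: encoder hidden state.
        h_enc = rnn_step(W_enc, h_enc, r + h_dec)
        # Step 3: parameters of Q and a reparameterized sample z = mu + sigma * eps.
        mu = linear(W_mu, h_enc)
        sigma = [math.exp(a) for a in linear(W_log_sigma, h_enc)]
        z = [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        # Step 4: decoder hidden state.
        h_dec = rnn_step(W_dec, h_dec, z)
        # Step 5: write without attention, accumulated on the canvas.
        canvas = [c + w for c, w in zip(canvas, linear(W_write, h_dec))]
        mus.append(mu)
        sigmas.append(sigma)
    # Step 6: Bernoulli means of the pixels from the final canvas.
    return sigmoid(canvas), mus, sigmas

probs, mus, sigmas = draw_forward([0.0, 1.0, 1.0, 0.0])
print([round(p, 3) for p in probs])
```

The returned `mus` and `sigmas` are kept per time step because the regularization term of the loss needs them.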

Read and Write Attention Mechanism:

Read and Write without attention

In this case, the input is just the whole image, and the output of the Decoder RNN modifies the whole canvas at each time step.

Selective Attention Model

In this case, the authors place an N×N grid of Gaussian filters over the image and obtain the parameters of these filters from the hidden state of the Decoder RNN. The filters then determine which part of the image is fed into the Encoder RNN and which part of the canvas is modified.

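  1. Generate the parameters of the Gaussian filters from h_dec: the grid center (g_X, g_Y), the stride δ, the variance σ², and the intensity γ. For an A×B image:

    $$(\tilde{g}_X, \tilde{g}_Y, \log\sigma^2, \log\tilde{\delta}, \log\gamma) = W(h^{dec})$$

    $$g_X = \frac{A+1}{2}(\tilde{g}_X + 1), \qquad g_Y = \frac{B+1}{2}(\tilde{g}_Y + 1), \qquad \delta = \frac{\max(A,B)-1}{N-1}\,\tilde{\delta}$$

    $$\mu_X^i = g_X + (i - N/2 - 0.5)\,\delta, \qquad \mu_Y^j = g_Y + (j - N/2 - 0.5)\,\delta$$

  2. The Gaussian filters can then be written as follows, where Z_X and Z_Y normalize each filter to sum to 1:

    $$F_X[i,a] = \frac{1}{Z_X}\exp\left(-\frac{(a-\mu_X^i)^2}{2\sigma^2}\right), \qquad F_Y[j,b] = \frac{1}{Z_Y}\exp\left(-\frac{(b-\mu_Y^j)^2}{2\sigma^2}\right)$$

  3. The read and write equations are then as follows, where \hat{x}_t = x - sigmoid(c_{t-1}) is the error image and the hatted quantities are a separate set of attention parameters computed for the write step:

    $$read(x, \hat{x}_t, h_{t-1}^{dec}) = \gamma\,[F_Y\, x\, F_X^T,\; F_Y\, \hat{x}_t\, F_X^T]$$

    $$w_t = W(h_t^{dec}), \qquad write(h_t^{dec}) = \frac{1}{\hat{\gamma}}\,\hat{F}_Y^T\, w_t\, \hat{F}_X$$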
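The selective attention steps can be illustrated with a small pure-Python sketch. It follows the parameterization in the DRAW paper (grid centers spaced by a stride δ around (g_X, g_Y), isotropic variance σ², intensity γ), but the function names `attn_params`, `filterbank`, `read_attn`, and `write_attn` are hypothetical, pixel coordinates are assumed to run from 1 to A (or B), and the five raw values that would come from a linear layer on h_dec are passed in by hand.

```python
import math

def attn_params(raw, A, B, N):
    """Map five raw values (assumed output of a linear layer on h_dec) to
    grid center, stride, variance and intensity. Requires N > 1."""
    g_x_t, g_y_t, log_sigma2, log_delta_t, log_gamma = raw
    g_x = (A + 1) / 2.0 * (g_x_t + 1.0)
    g_y = (B + 1) / 2.0 * (g_y_t + 1.0)
    delta = (max(A, B) - 1) / (N - 1) * math.exp(log_delta_t)
    return g_x, g_y, delta, math.exp(log_sigma2), math.exp(log_gamma)

def filterbank(g_x, g_y, delta, sigma2, N, A, B):
    """Return F_X (N x A) and F_Y (N x B), each row normalized to sum to 1."""
    def bank(center, size):
        rows = []
        for i in range(1, N + 1):
            mu = center + (i - N / 2 - 0.5) * delta
            row = [math.exp(-((a - mu) ** 2) / (2.0 * sigma2))
                   for a in range(1, size + 1)]
            norm = max(sum(row), 1e-8)  # guard against an all-zero filter
            rows.append([v / norm for v in row])
        return rows
    return bank(g_x, A), bank(g_y, B)

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def read_attn(image, F_x, F_y, gamma):
    """gamma * F_Y image F_X^T: extract an N x N patch from a B x A image.
    The paper applies this to both x and the error image and concatenates them."""
    patch = matmul(matmul(F_y, image), transpose(F_x))
    return [[gamma * v for v in row] for row in patch]

def write_attn(w, F_x, F_y, gamma):
    """(1 / gamma) * F_Y^T w F_X: spread an N x N patch back onto a B x A canvas."""
    out = matmul(matmul(transpose(F_y), w), F_x)
    return [[v / gamma for v in row] for row in out]

# Usage: a 6 x 8 image (B = 6 rows, A = 8 columns) read through a 3 x 3 grid.
A, B, N = 8, 6, 3
g_x, g_y, delta, sigma2, gamma = attn_params((0.0, 0.0, 0.0, 0.0, 0.0), A, B, N)
F_x, F_y = filterbank(g_x, g_y, delta, sigma2, N, A, B)
image = [[float((r + c) % 2) for c in range(A)] for r in range(B)]
patch = read_attn(image, F_x, F_y, gamma)
canvas_update = write_attn(patch, F_x, F_y, gamma)
```

Because both operations are built from matrix products with smooth Gaussian weights, they are differentiable with respect to the attention parameters, which is what allows the attention to be trained by backpropagation.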

Loss function

The loss function has the same form as that of a variational autoencoder. It is the sum of two parts: the reconstruction error and the regularization error.

Regularization Error

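Assuming the prior z ~ N(0, 1), the regularization error is the KL divergence between Q and the prior, as in a VAE, summed over the T time steps:

$$L^z = \frac{1}{2}\left(\sum_{t=1}^{T} \mu_t^2 + \sigma_t^2 - \log\sigma_t^2\right) - \frac{T}{2}$$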
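As a rough sketch, the regularization error can be computed from the per-step means and standard deviations of Q. The function name `latent_loss` and the list-of-lists input layout are assumptions made for this example. Each latent dimension contributes 0.5 * (mu² + sigma² - log sigma² - 1), which is the closed-form KL divergence between N(mu, sigma²) and N(0, 1).

```python
import math

def latent_loss(mus, sigmas):
    """Sum of KL(N(mu, sigma^2) || N(0, 1)) over time steps and latent dims.

    mus, sigmas: one list of per-dimension values for each time step.
    """
    total = 0.0
    for mu_t, sigma_t in zip(mus, sigmas):
        for m, s in zip(mu_t, sigma_t):
            total += 0.5 * (m * m + s * s - math.log(s * s) - 1.0)
    return total

# Two time steps, two latent dimensions each.
print(latent_loss([[0.0, 0.5], [1.0, -1.0]], [[1.0, 0.8], [1.2, 0.5]]))
```

The term vanishes exactly when every mu is 0 and every sigma is 1, that is, when Q matches the prior.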

Reconstruction Error

The reconstruction error is the negative log probability of x under the Bernoulli distribution whose mean is given by sigmoid(c_T):

$$L^x = -\log D(x \mid c_T)$$
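A minimal sketch of this term, assuming binary pixels and a flattened canvas; the name `reconstruction_loss` is hypothetical. It uses the identity -[x log sigmoid(c) + (1 - x) log(1 - sigmoid(c))] = softplus(c) - x * c, which avoids taking the log of a value that may round to 0.

```python
import math

def reconstruction_loss(x, c_T):
    """Negative Bernoulli log-likelihood of pixels x given final canvas c_T."""
    total = 0.0
    for x_i, c_i in zip(x, c_T):
        # Numerically stable softplus(c) = log(1 + exp(c)).
        softplus = max(c_i, 0.0) + math.log1p(math.exp(-abs(c_i)))
        total += softplus - x_i * c_i
    return total

# A canvas of zeros gives sigmoid(c) = 0.5 for every pixel.
print(reconstruction_loss([1.0, 0.0, 1.0], [0.0, 0.0, 0.0]))
```

With a canvas of zeros each pixel contributes log 2, and the loss shrinks as the canvas values move toward large positive numbers for pixels that are 1 and large negative numbers for pixels that are 0.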

Reference

Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation[J]. arXiv preprint arXiv:1502.04623, 2015.