Variational Autoencoder applied to genome-scale DNA methylation data

This repository is adapted from the Tybalt repository developed by Way et al. It is effectively a fork of that repository but, because of a technical error, it was uploaded to GitHub independently. Some of the description below is adapted from the original Way et al. repository.

Alexander Titus 2018 -- extended from Gregory Way 2017

This script trains and outputs results for a variational autoencoder (VAE) applied to genome-scale breast cancer methylation data measured using the Illumina 450K bead array. A VAE is an unsupervised learning approach that learns a lower-dimensional representation of its input by training a generative model to reconstruct that input. Whether this lower-dimensional representation (or manifold) can improve understanding of cancer biology will require comparison to other approaches. Nonetheless, the opportunity to interrogate the latent space of the VAE generative model motivates applying these methods to high-dimensional data such as DNA methylation data.

The model trained here used methylation data measured with the Illumina 450K array, subset to the 100,000 CpGs with the most variable methylation, ranked by median absolute deviation. The encoder compresses the input to two vectors of length 100, one for the mean and one for the variance, and the VAE samples a noise vector to reparameterize the encoded space. The resulting 100 encoded features can then be used to reconstruct the original input data. The encoder uses ReLU activations and the decoder uses a sigmoid activation, which keeps the reconstructed values between 0 and 1. All weights are initialized with Glorot uniform initialization.
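The feature selection and sampling steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's code: the function names `median_absolute_deviation`, `top_variable_columns`, and `reparameterize`, the toy matrix, and the choice to parameterize the variance as a log-variance are all hypothetical and chosen for the example.

```python
import math
import random
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


def top_variable_columns(matrix, k):
    """Return the indices of the k columns (CpGs) with the largest MAD.

    `matrix` is a list of rows (samples); each row holds one value per CpG.
    """
    n_cols = len(matrix[0])
    mads = [
        median_absolute_deviation([row[j] for row in matrix])
        for j in range(n_cols)
    ]
    ranked = sorted(range(n_cols), key=lambda j: mads[j], reverse=True)
    # Report the selected columns in their original order.
    return sorted(ranked[:k])


def reparameterize(z_mean, z_log_var, rng):
    """Sample z = mean + sigma * epsilon, with epsilon drawn from N(0, 1).

    Assumes the encoder outputs a log-variance, so sigma = exp(0.5 * log_var).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(z_mean, z_log_var)
    ]


# Toy input: 4 samples x 3 CpGs (made-up values, not real methylation data).
toy = [
    [0.10, 0.50, 0.90],
    [0.12, 0.10, 0.88],
    [0.11, 0.90, 0.91],
    [0.10, 0.30, 0.10],
]
selected = top_variable_columns(toy, k=2)

# Draw one latent sample from a 2-dimensional encoding.
rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
```

In the model described here, `k` would be 100,000 and the latent vectors would have length 100. Moving the randomness into the noise term is what lets gradients flow through the mean and variance during training.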

To encourage manifold learning, the model also uses a warm start, as discussed in Sonderby et al. 2016, with an added parameter beta that controls how much the KL divergence loss contributes to the total VAE loss (reconstruction + (beta * KL)). In this setting, the model begins training deterministically as a plain autoencoder (beta = 0), and beta is ramped up linearly after each epoch until beta = 1. The rate of this ramp is set by kappa. After a parameter sweep, Way et al. observed that kappa has little influence on training; therefore, we set kappa = 1, which is a full VAE.
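The warm-start loss can be illustrated with a short, dependency-free sketch. It rests on assumptions that the text above does not spell out: kappa is treated as the amount added to beta after each epoch, the KL term is the closed form for a diagonal Gaussian against a standard normal prior, and the names `beta_schedule`, `kl_divergence`, and `vae_loss` are hypothetical.

```python
import math


def beta_schedule(n_epochs, kappa):
    """Beta used in each epoch: starts at 0, grows by kappa per epoch, capped at 1.

    Treating kappa as a per-epoch increment is an assumption of this sketch.
    """
    betas = []
    beta = 0.0
    for _ in range(n_epochs):
        betas.append(beta)
        beta = min(1.0, beta + kappa)
    return betas


def kl_divergence(z_mean, z_log_var):
    """KL divergence from N(mean, exp(log_var)) to N(0, 1), summed over dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv)
        for m, lv in zip(z_mean, z_log_var)
    )


def vae_loss(reconstruction_loss, z_mean, z_log_var, beta):
    """Total loss: reconstruction + (beta * KL)."""
    return reconstruction_loss + beta * kl_divergence(z_mean, z_log_var)


# A slow ramp versus kappa = 1, where beta reaches 1 after the first epoch.
slow = beta_schedule(6, 0.25)
fast = beta_schedule(3, 1.0)

# With beta = 0 the KL term is ignored and the model is a plain autoencoder.
plain = vae_loss(2.0, [1.0], [0.0], beta=0.0)
full = vae_loss(2.0, [1.0], [0.0], beta=1.0)
```

Under these assumptions, a small kappa gives the model several epochs of autoencoder-like training before the KL penalty applies in full, while kappa = 1 applies the full penalty almost immediately.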

