Gene Expression Model

Background

The notebooks in this repository provide a systematic procedure for applying machine learning algorithms to predict gene expression in human and mouse cells. Multivariate regression models are trained on a set of electrophysiological features and evaluated on how accurately they reproduce each cell's measured gene expression values. One model is created per gene. Because only a small number of samples is available relative to the number of features, the models use regularization penalties to limit overfitting.
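The one-model-per-gene setup with an L1 penalty can be sketched with scikit-learn's Lasso estimator. This is a minimal sketch on synthetic data, not the notebooks' code: the number of cells and features, the gene names, and alpha=0.1 are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical synthetic data: 40 cells x 10 electrophysiological features
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))

# Hypothetical expression values for three genes, each driven by a single feature
true_coef = np.zeros((10, 3))
true_coef[0, 0] = 2.0
true_coef[3, 1] = -1.5
true_coef[7, 2] = 1.0
Y = X @ true_coef + 0.1 * rng.normal(size=(40, 3))
gene_names = ["gene_a", "gene_b", "gene_c"]  # hypothetical gene names

# Fit one independent Lasso model per gene; alpha sets the strength of the L1 penalty
models = {}
for j, gene in enumerate(gene_names):
    model = Lasso(alpha=0.1)
    model.fit(X, Y[:, j])
    models[gene] = model

# The L1 penalty drives uninformative coefficients to exactly zero
for gene, model in models.items():
    print(gene, "non-zero coefficients:", np.count_nonzero(model.coef_))
```

Larger alpha values zero out more coefficients; in practice alpha would typically be chosen per gene by cross-validation, for example with scikit-learn's LassoCV.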

Data

The training dataset, data/online_table3.csv, contains electrophysiological features collected by the Allen Institute for Brain Science together with gene expression values; this table was used for both training and evaluation. The resulting models were then applied to data/aibs_aggregated_ephys_v6.csv, which contains electrophysiological features for 1058 cells. Because some of these cells have missing values, only 888 cells were used.
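Restricting the cells to those with a complete set of features can be sketched with pandas' dropna. The small DataFrame and its column names are hypothetical stand-ins for data/aibs_aggregated_ephys_v6.csv, and the sketch assumes incomplete cells were dropped row-wise rather than imputed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/aibs_aggregated_ephys_v6.csv; the real file
# would be loaded with pd.read_csv("data/aibs_aggregated_ephys_v6.csv")
ephys = pd.DataFrame({
    "cell_id": [101, 102, 103, 104, 105],  # hypothetical column names
    "resting_potential": [-70.1, -65.3, np.nan, -68.0, -72.4],
    "input_resistance": [150.0, np.nan, 210.0, 180.0, 165.0],
    "upstroke_downstroke_ratio": [3.1, 2.8, 3.5, 3.0, 2.9],
})

# Keep only cells whose features are all present (drop any row containing NaN)
complete = ephys.dropna().reset_index(drop=True)
print(f"{len(complete)} of {len(ephys)} cells retained")
```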

The models are stored as Python dictionaries in lasso_models.sav. Each dictionary contains the gene name, the fitted model, and the model's standard deviation, r^2, and root-mean-square error (RMSE) values. The file contains 12205 models.
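A per-gene record like those described above might be built and serialized as follows. This sketch assumes the .sav file is a pickle of a list of dictionaries; the key names (gene, model, std, r2, rmse) are hypothetical, the standard deviation is taken here to be that of the gene's expression values, and the example writes to a temporary file rather than the repository's lasso_models.sav.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical training data for a single gene
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = 1.5 * X[:, 0] + 0.1 * rng.normal(size=30)

model = Lasso(alpha=0.05).fit(X, y)
residuals = y - model.predict(X)

# One record per gene; these key names are hypothetical
record = {
    "gene": "gene_a",
    "model": model,
    "std": float(np.std(y)),          # assumed: spread of the gene's expression values
    "r2": float(model.score(X, y)),   # coefficient of determination on the training data
    "rmse": float(np.sqrt(np.mean(residuals ** 2))),
}

# Assumed format: a pickled list of records, written to a temporary location
path = os.path.join(tempfile.mkdtemp(), "example_models.sav")
with open(path, "wb") as f:
    pickle.dump([record], f)
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Storing the fitted estimator alongside its scores keeps each gene's metrics next to the model that produced them, so poorly fitting genes can be filtered out before prediction.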

Build and train models

Relating Ephys Features to Gene Expression.ipynb uses scikit-learn to compare the performance of ordinary linear regression and Lasso regression. The results are displayed as a series of plots and histograms and are used to evaluate the models both numerically and graphically.
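The comparison between linear regression and Lasso can be sketched with cross-validated r^2 and RMSE. The synthetic data deliberately has few cells relative to the number of features, an assumed setting chosen to mirror the small-sample problem described above; alpha=0.3 and 5 folds are illustrative choices, not the notebook's.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data for one gene: few cells relative to the number of features,
# with only two features actually related to expression
rng = np.random.default_rng(2)
n_cells, n_features = 60, 40
X = rng.normal(size=(n_cells, n_features))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n_cells)

results = {}
for name, model in [("LinearRegression", LinearRegression()), ("Lasso", Lasso(alpha=0.3))]:
    # 5-fold cross-validated r^2 and RMSE (scikit-learn reports RMSE negated)
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    results[name] = (r2.mean(), rmse.mean())
    print(f"{name}: mean r^2 = {r2.mean():.2f}, mean RMSE = {rmse.mean():.2f}")
```

With nearly as many features as training cells, unpenalized linear regression tends to overfit and score poorly on held-out folds, while the L1 penalty keeps the Lasso model close to the two informative features.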

Make predictions

Predicting Gene Expression with Lasso Models.ipynb applies the previously trained models to the electrophysiological features and writes the predicted gene expression values to a new .csv file.
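Applying the stored models to new cells and writing the predictions can be sketched as below. The feature names, the toy models standing in for the contents of lasso_models.sav, and the output filename predicted_gene_expression.csv are all hypothetical; the key assumption is that the new cells' features have the same columns, in the same order, as the training features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
feature_names = ["f1", "f2", "f3"]  # hypothetical feature names
X_train = pd.DataFrame(rng.normal(size=(25, 3)), columns=feature_names)

# Hypothetical per-gene models standing in for the records in lasso_models.sav
records = []
for gene, w in [("gene_a", [1.0, 0.0, 0.0]), ("gene_b", [0.0, -2.0, 0.5])]:
    y = X_train.to_numpy() @ np.array(w)
    records.append({"gene": gene, "model": Lasso(alpha=0.01).fit(X_train, y)})

# New cells with the same feature columns as the training data
new_cells = pd.DataFrame(rng.normal(size=(4, 3)), columns=feature_names,
                         index=["cell_1", "cell_2", "cell_3", "cell_4"])

# One column of predicted expression per gene, one row per cell
predictions = pd.DataFrame(
    {r["gene"]: r["model"].predict(new_cells) for r in records},
    index=new_cells.index,
)
predictions.to_csv("predicted_gene_expression.csv")  # hypothetical output filename
```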

Validate results

Validating Predictions.ipynb compares the predictions with the gene expression counts matrix GSE71585_RefSeq_counts.csv, available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71585. t-SNE and UMAP are both used for dimensionality reduction so that the two datasets can be compared visually.
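A t-SNE comparison of the two datasets can be sketched with scikit-learn's TSNE. Both matrices below are synthetic stand-ins arranged as cells by genes (the GEO counts matrix may need transposing and restricting to the genes shared with the predictions), and the log transform and per-dataset z-scoring are assumed preprocessing steps, not necessarily what the notebook does. UMAP, from the separate umap-learn package, could be substituted via umap.UMAP(n_components=2).fit_transform(...).

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

rng = np.random.default_rng(4)
genes = [f"gene_{i}" for i in range(20)]  # hypothetical shared gene names

# Hypothetical stand-ins: predicted expression from the Lasso models, and
# measured read counts such as those in GSE71585_RefSeq_counts.csv (cells x genes)
predicted = pd.DataFrame(rng.normal(5, 1, size=(30, 20)), columns=genes)
measured = pd.DataFrame(rng.poisson(100, size=(40, 20)), columns=genes)

def zscore(df):
    # Standardize each gene within one dataset so the two are on a comparable scale
    return (df - df.mean()) / df.std()

# Log-transform the raw counts, standardize both datasets, then stack them
combined = pd.concat([zscore(predicted), zscore(np.log1p(measured))], ignore_index=True)
source = np.array(["predicted"] * len(predicted) + ["measured"] * len(measured))

# 2-D embedding; perplexity must be smaller than the number of cells
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(combined.to_numpy())
```

Plotting embedding[source == "predicted"] and embedding[source == "measured"] in two colours then shows qualitatively whether predicted and measured cells occupy similar regions.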

youngseok-seo/GeneExpression
