Hellmann-Lab/Evidence-for-compensatory-evolution-within-pleiotropic-regulatory-elements

Evidence for compensatory evolution within pleiotropic regulatory elements

This repository contains the code to reproduce the analysis for the manuscript

Evidence for compensatory evolution within pleiotropic regulatory elements

by Zane Kliesmete, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, Ines Hellmann

The data necessary to reproduce this analysis can be found on ArrayExpress:

Accession      Dataset
E-MTAB-13494   RNA-seq data from human and cynomolgus macaque
E-MTAB-13373   ATAC-seq data from human and cynomolgus macaque

1. Re-analyzing the DNase-seq and RNA-seq data from Roadmap Epigenomics Project

As part of the study, we use published data to quantify the Pleiotropic Degree (PD) for nearly 0.5 million CREs accessible in at least one of the following nine human fetal tissues: adrenal gland, brain, heart, kidney, large intestine, lung, muscle, stomach and thymus. We furthermore associate these CREs with the genes expressed in the respective tissue and model the importance of different CRE properties for gene expression levels using a mixed-effects linear model. The analysis scripts for this part, which underlie Figure 1 and Supplemental Figures S1 and S2, are the following:

DHS peak calling
DHS peak filtering
DHS peak analyses
Expression data preparation
CRE to gene association
Mixed-effects model fitting and permutation
Generate Figure 1
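
As a minimal sketch of the PD idea described above, the snippet below counts, for each CRE, the number of tissues in which it is accessible. This assumes that PD equals the number of the nine tissues with an accessible peak, which this README does not state explicitly. The function name, the CRE identifiers and the toy input are hypothetical and are not taken from the repository scripts.

```python
from collections import Counter

# The nine human fetal tissues listed in this README.
TISSUES = [
    "adrenal gland", "brain", "heart", "kidney", "large intestine",
    "lung", "muscle", "stomach", "thymus",
]


def pleiotropic_degree(accessible_in):
    """Return the number of distinct tissues in which a CRE is accessible.

    Assumption: PD is a simple tissue count between 1 and 9.
    """
    tissues = set(accessible_in)
    unknown = tissues - set(TISSUES)
    if unknown:
        raise ValueError(f"unknown tissue(s): {sorted(unknown)}")
    return len(tissues)


# Hypothetical toy input: CRE identifier -> tissues with an accessible peak.
cre_accessibility = {
    "CRE_1": ["brain"],
    "CRE_2": ["brain", "heart", "kidney"],
    "CRE_3": list(TISSUES),
}

pd_by_cre = {cre: pleiotropic_degree(t) for cre, t in cre_accessibility.items()}

# Number of CREs observed at each PD value.
pd_distribution = Counter(pd_by_cre.values())
```

Here `pd_by_cre` maps each CRE to its PD and `pd_distribution` tallies how many CREs fall into each PD class; the actual pipeline works on called and filtered DHS peaks rather than a hand-written dictionary.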

2. Cross-species accessibility and expression analyses

In this study, we generate data on human and cynomolgus macaque gene expression and accessibility. To obtain comparable annotations between species, we use liftOff to generate a GTF file for the macFas6 genome. We then process the RNA-seq data using our tool zUMIs and the ATAC-seq data using Genrich, and investigate differential expression and accessibility associated with CREs of different PDs (Figure 3, Supplemental Figure S3).

Run liftOff
Process liftOff output
Analyse cross-species gene expression
Identify orthologous peaks
Re-analyse CRE PD and activity conservation across mammals from Roller et al. 2021
Do the integrated analyses

3. Sequence conservation

We use INSIGHT to quantify the selection acting on the CREs between the human MRCA and the outgroup MRCA, and within humans, for each PD class and its subsets. We also use phyloP and phastCons to investigate CRE conservation across 10 primate species (Figures 4, 5 and Supplemental Figure S4).

Run sequence conservation methods
Summarize sequence conservation

4. Transcription factor binding site analyses

We quantified the TFBS repertoire and its conservation between human and cynomolgus macaque across >90% of all CREs in this study. First, sequences were extracted and provided to Cluster-Buster along with the position weight matrices of expressed TFs to identify their binding positions. For each CRE, its repertoire similarity between species was measured. Furthermore, all binding sites in orthologous sequences were aligned between species and their positional conservation was quantified. The most important scripts to generate Figures 2, 6 and Supplemental Figures S5, S7 are listed below; more intermediate processing scripts can be found here.

Extract orthologous sequences for TFBS analyses
Quantify TFBS repertoire across PDs
Combine different conservation measures
Re-analyse TFBS conservation across mammals from Ballester et al. 2014
Generate main figure
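
To illustrate what a repertoire similarity between species could look like, the sketch below compares the sets of TFs with predicted binding sites in the human and macaque orthologs of one CRE using the Jaccard index. The Jaccard index is an assumption chosen for illustration; this README does not state which similarity measure the scripts use. The function name and the TF lists are hypothetical.

```python
def repertoire_similarity(human_tfs, macaque_tfs):
    """Jaccard index of two TF repertoires (illustrative assumption).

    Returns the number of shared TFs divided by the number of TFs found in
    either species, or None when neither ortholog has a predicted site.
    """
    human, macaque = set(human_tfs), set(macaque_tfs)
    union = human | macaque
    if not union:
        return None  # similarity is undefined for two empty repertoires
    return len(human & macaque) / len(union)


# Hypothetical TFs with predicted binding sites in one orthologous CRE pair.
human_cre_tfs = ["CTCF", "SP1", "YY1"]
macaque_cre_tfs = ["CTCF", "SP1", "KLF4"]

similarity = repertoire_similarity(human_cre_tfs, macaque_cre_tfs)
```

In this toy case two of the four distinct TFs are shared, so `similarity` is 0.5. A set-based measure like this ignores binding site positions, which is why positional conservation is quantified separately from aligned binding sites.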

5. Example promoter analysis

Finally, we visualized the pleiotropic ATAXIN-3 gene promoter as a representative example of low conservation of sequence and TFBS position, but high functional conservation in terms of TFBS repertoire, CRE accessibility and downstream gene expression (Figure 7).

Generate main figure

Throughout the workflow, we use the job scheduling system slurm (v0.4.3).
