DNA methylation in the honey bee and some other Hymenopterans

littleblackfish/apis-methylation


This repository contains code used to re-analyse whole-genome bisulfite sequencing (WGBS) data from the honey bee (Apis mellifera) and some other insects. It exists primarily for reproducibility; it is not polished for public use, but some of its components may be reusable. The structure was initially developed for a systematic review of DNA methylation patterns in the honey bee, then reused to extend the analysis to five more Hymenopterans, namely:

  • Bombus terrestris (buff-tailed bumblebee)
  • Nasonia vitripennis (jewel wasp)
  • Ooceraea biroi (clonal raider ant)
  • Harpegnathos saltator (Indian jumping ant)
  • Camponotus floridanus (Florida carpenter ant)

Data for each species is curated in JSON files.

Primary analysis of WGBS data was done with nf-core/methylseq.

A number of scripts exist to aid in heavy lifting. Mainly:

  • prep_run.py parses a meta.json file and generates the directory structure needed to run nf-core/methylseq on all samples. Note that this assumes all FASTQ files are accessible through the filesystem.
  • methylseq.pbs contains the parameters used for nf-core/methylseq execution within this structure.
  • submit.sh submits jobs for all samples within such a directory.
  • merge_bedgraph.py parses nf-core/methylseq output for all samples into two master matrices that together contain the entirety of the experimental data. Both have cytosines in rows and samples in columns; one holds total calls, the other methylated calls.
  • significant.py filters those matrices for minimum coverage and tests for significant methylation against a null of incomplete bisulfite conversion using a binomial model.
  • build_index.py builds an index containing strand, context and annotation data for every cytosine in the genome, to match the master matrices.
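
The merging and testing steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the code in merge_bedgraph.py or significant.py: the function names, the in-memory input layout, and the thresholds (`min_cov`, `error_rate`, `alpha`) are all assumptions made for illustration, and no multiple-testing correction is applied. The idea is that, if a cytosine were unmethylated, methylated calls could only arise from incomplete bisulfite conversion, so the number of methylated calls would follow a binomial distribution with the non-conversion rate as its success probability.

```python
from math import comb


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def merge_calls(per_sample):
    """Merge per-sample calls into two matrices (rows: cytosines, columns: samples).

    per_sample maps sample name -> {(chrom, pos): (methylated, total)}.
    This layout is hypothetical; sites absent from a sample get zero counts.
    """
    samples = sorted(per_sample)
    sites = sorted({site for calls in per_sample.values() for site in calls})
    meth = [[per_sample[s].get(site, (0, 0))[0] for s in samples] for site in sites]
    total = [[per_sample[s].get(site, (0, 0))[1] for s in samples] for site in sites]
    return sites, samples, total, meth


def call_significant(total, meth, min_cov=10, error_rate=0.01, alpha=0.05):
    """Test each cell against a null of incomplete bisulfite conversion.

    Returns a matrix of True (significantly methylated), False (not
    significant) or None (coverage below min_cov, so not tested).
    All three thresholds are illustrative defaults, not the repository's.
    """
    result = []
    for total_row, meth_row in zip(total, meth):
        row = []
        for n, k in zip(total_row, meth_row):
            if n < min_cov:
                row.append(None)
            else:
                # One-sided p-value: chance of >= k methylated calls from
                # conversion failure alone.
                row.append(binom_sf(k, n, error_rate) < alpha)
        result.append(row)
    return result


# Toy input: two samples, two cytosines (made-up counts).
example = {
    "s1": {("chr1", 100): (8, 20), ("chr1", 200): (0, 20)},
    "s2": {("chr1", 100): (1, 5)},
}
sites, samples, total, meth = merge_calls(example)
calls = call_significant(total, meth)
print(samples)
for site, row in zip(sites, calls):
    print(site, row)
```

Keeping total and methylated counts in two separate matrices, rather than storing a methylation ratio, preserves the coverage information that both the filter and the binomial test depend on.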

Secondary analysis is contained in notebooks and is based entirely on the output of the last two scripts.

A complete, although not minimal, conda environment that supports the entire analysis is provided.