-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Console API help * README.md * README.md * README.md
- Loading branch information
Showing
3 changed files
with
299 additions
and
75 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,91 +1,306 @@ | ||
# BismarkPlot | ||
A small library to plot Bismark ``methylation_extractor`` reports. | ||
Comprehensive tool for visualizing genome-wide cytosine data. | ||
|
||
See the docs: https://shitohana.github.io/BismarkPlot | ||
|
||
Right now only ``coverage2cytosine`` input is supported, but support for ``bismark2bedGraph`` will be added soon. | ||
Right now only ``coverage2cytosine`` input is supported, but support for other input types will be added soon. | ||
|
||
## Installation | ||
# Installation | ||
|
||
```commandline | ||
pip install bismarkplot | ||
``` | ||
|
||
## Usage | ||
# Console usage | ||
You can use ```bismarkplot``` either as python library or directly from console after installing it. | ||
To see options: | ||
|
||
Console options: | ||
- *bismarkplot-metagene* - methylation density visualizing tool. | ||
- *bismarkplot-chrs* - chromosome methylation levels visualizing tool. | ||
|
||
### bismarkplot-metagene | ||
|
||
```commandline | ||
usage: BismarkPlot. [-h] [-o OUT] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH] [-w GWINDOWS] [--line] [--heatmap] [--box] [--violin] | ||
[-S SMOOTH] [-L LABELS [LABELS ...]] [-H H] [-V V] [--dpi DPI] [-F {png,pdf,svg}] | ||
filename [filename ...] | ||
Metagene visualizing tool. | ||
positional arguments: | ||
filename path to bismark methylation_extractor files | ||
options: | ||
-h, --help show this help message and exit | ||
-o OUT, --out OUT output base name (default: current/path) | ||
-g GENOME, --genome GENOME | ||
path to GFF genome file (default: None) | ||
-r {gene,exon,tss,tes}, --region {gene,exon,tss,tes} | ||
path to GFF genome file (default: gene) | ||
-b BATCH, --batch BATCH | ||
number of rows to be read from bismark file by batch (default: 1000000) | ||
-c CORES, --cores CORES | ||
number of cores to use (default: None) | ||
-f FLENGTH, --flength FLENGTH | ||
length in bp of flank regions (default: 2000) | ||
-u UWINDOWS, --uwindows UWINDOWS | ||
number of windows for upstream (default: 50) | ||
-d DWINDOWS, --dwindows DWINDOWS | ||
number of windows for downstream (default: 50) | ||
-m MLENGTH, --mlength MLENGTH | ||
minimal length in bp of gene (default: 4000) | ||
-w GWINDOWS, --gwindows GWINDOWS | ||
number of windows for genes (default: 100) | ||
--line line-plot enabled (default: False) | ||
--heatmap heat-map enabled (default: False) | ||
--box box-plot enabled (default: False) | ||
--violin violin-plot enabled (default: False) | ||
-S SMOOTH, --smooth SMOOTH | ||
windows for smoothing (default: 10) | ||
-L LABELS [LABELS ...], --labels LABELS [LABELS ...] | ||
labels for plots (default: None) | ||
-H H vertical resolution for heat-map (default: 100) | ||
-V V vertical resolution for heat-map (default: 100) | ||
--dpi DPI dpi of output plot (default: 200) | ||
-F {png,pdf,svg}, --format {png,pdf,svg} | ||
format of output plots (default: pdf) | ||
``` | ||
|
||
Example: | ||
|
||
```commandline | ||
bismarkplot --help | ||
bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -f pdf -S 50 report1.txt report2.txt report3.txt report4.txt | ||
``` | ||
|
||
## Examples | ||
### Example in command line | ||
[Result](#multiple-samples-same-specie) | ||
|
||
### bismarkplot-chrs | ||
|
||
```commandline | ||
bismarkplot --genome path/to/genome.gff --flength 2000 --mlength 4000 --fwindows 500 --gwindows 2000 --line-plot --heat-map --box-plot --bar-plot --labels exp1 exp2 --resolution 100 --format pdf path/to/genomeWide1.txt path/to/genomeWide2.txt | ||
usage: BismarkPlot [-h] [-o DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] path/to/txt [path/to/txt ...] | ||
Chromosome methylation levels visualization. | ||
positional arguments: | ||
path/to/txt path to bismark methylation_extractor file | ||
options: | ||
-h, --help show this help message and exit | ||
-o DIR, --out DIR output base name (default: current/path) | ||
-b N, --batch N number of rows to be read from bismark file by batch (default: 1000000) | ||
-c CORES, --cores CORES | ||
number of cores to use (default: None) | ||
-w N, --wlength N number of windows for genes (default: 100000) | ||
-m N, --mlength N minimum chromosome length (default: 1000000) | ||
-S FLOAT, --smooth FLOAT | ||
windows for smoothing (0 - no smoothing, 1 - straight line (default: 50) | ||
-F {png,pdf,svg}, --format {png,pdf,svg} | ||
format of output plots (default: pdf) | ||
``` | ||
This command will generate line plots and heat maps pdf files for all contexts and box and bar plots. | ||
|
||
### Example in python script | ||
If using bismarkplot as python library is more suitable for you, you can call ```bismarkplot``` methods right from python script. | ||
Example: | ||
|
||
```commandline | ||
bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -s 10 -f pdf path/to/CX_report.txt | ||
``` | ||
|
||
[Result](#chromosome-levels) | ||
|
||
# Python | ||
|
||
BismarkPlot provides a large variety of function for manipulating with cytosine methylation data. | ||
|
||
## Metagene | ||
|
||
Below we will show the basic BismarkPlot workflow. | ||
|
||
### Single sample | ||
|
||
First we need to initialize ``genome`` and ``BismarkFiles``. ``genome`` is .gff or .bed file with gene coordinates. ``BismarkFiles`` is a class, which calculates data for all plots, so their types need to be specified when it is initialized. | ||
```python | ||
import bismarkplot | ||
# Firstly, we need to read the regions annotation (e.g. reference genome .gff) | ||
genome = bismarkplot.Genome.from_gff("path/to/genome.gff") | ||
# Next we need to filter regions of interest from the genome | ||
genes = genome.gene_body(min_length=4000, flank_length=2000) | ||
|
||
file = 'path/to/genome.gff' | ||
|
||
genome = bismarkplot.read_genome( | ||
file, | ||
flank_length=2000, | ||
min_length=4000 | ||
# Now we need to calculate metagene data | ||
metagene = bismarkplot.Metagene.from_file( | ||
file = "path/to/CX_report.txt", | ||
genome=genes, # filtered regions | ||
upstream_windows = 500, | ||
gene_windows = 1000, | ||
downstream_windows = 500, | ||
batch_size= 10**7 # number of lines to be read simultaneously | ||
) | ||
|
||
files = ['path/to/genomeWide1.txt', 'path/to/genomeWide2.txt'] | ||
bismark = bismarkplot.BismarkFiles( | ||
files, genome, | ||
flank_windows=500, | ||
gene_windows=2000, | ||
line_plot=True, | ||
heat_map=True, | ||
box_plot=True, | ||
bar_plot=True | ||
) | ||
# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr) | ||
filtered = metagene.filter(context = "CG", strand = "+") | ||
# We are ready to plot | ||
lp = filtered.line_plot() # line plot data | ||
lp.draw().savefig("path/to/lp.pdf") # matplotlib.Figure | ||
|
||
hm = filtered.heat_map(ncol=200, nrow=200) | ||
hm.draw().savefig("path/to/hm.pdf") # matplotlib.Figure | ||
``` | ||
Output: | ||
|
||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546389-8b97edcb-6ab4-4f17-a970-389819fbeaec.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546419-079e004b-8f6e-4ce9-a3dc-fc4ad9592adc.png" width="300"> | ||
</p> | ||
|
||
Let's now draw plots themselves. | ||
### Smoothing the line plot | ||
|
||
Smoothing is very useful, when input signal is very weak (e.g. mammalian non-CpG contexts) | ||
|
||
For line plots use (or ``draw_line_plots_all`` for all contexts) | ||
```python | ||
bismark.draw_line_plots_filtered( | ||
context='CG', | ||
strand='+', | ||
smooth=.05, | ||
labels = ['exp1', 'exp2'], | ||
title = 'Plot for CG+' | ||
) | ||
# mouse CHG methylation example | ||
filtered = metagene.filter(context = "CHG", strand = "+") | ||
lp.draw(smooth = 0).savefig("path/to/lp.pdf") # no smooth | ||
lp.draw(smooth = 50).savefig("path/to/lp.pdf") # smoothed with window length = 50 | ||
``` | ||
|
||
<img alt="Plot for CG+" src="https://user-images.githubusercontent.com/43905117/236703691-023818e9-fb0d-47e6-a328-a712c9285928.png" width="" height="400"/> | ||
Output: | ||
|
||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274557328-5a087a43-5731-4cef-aa90-cf2ce046c747.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274557346-97e10689-609c-4032-a14d-5893b6797d59.png" width="300"> | ||
</p> | ||
|
||
### Multiple samples, same specie | ||
|
||
For heat maps use (or ``draw_heat_maps_all`` for all contexts) | ||
```python | ||
bismark.draw_heat_maps_filtered( | ||
context='CG', | ||
strand='+', | ||
resolution=100, | ||
labels = ['exp1', 'exp2'], | ||
title = 'Heatmap for CG+' | ||
) | ||
# We can initialize genome like in previous example | ||
|
||
filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"] | ||
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...) # rest of params from previous example | ||
|
||
# Our metagenes contains all methylation contexts and both strands, so we need to filter it (as in dplyr) | ||
filtered = metagenes.filter(context = "CG", strand = "+") | ||
|
||
# Now we can draw line-plot or heatmap like in previous example, or plot distribution statistics as shown below | ||
trimmed = filtered.trim_flank() # we want to analyze only gene bodies | ||
trimmed.box_plot(showfliers=False).savefig(...) | ||
trimmed.violin_plot().savefig(...) | ||
|
||
# If data is technical replicates we can merge them into single DataFrame and analyze as one | ||
merged = filtered.merge() | ||
``` | ||
|
||
Output: | ||
|
||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546531-8516858a-8203-4e98-98a9-7351efb79d29.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546553-f2617948-4d74-4f1e-9543-e4fff49deae7.png" width="300"> | ||
</p> | ||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546624-9a2da41b-5c3b-4f65-baee-29086a40e020.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274546690-83757110-83cc-4f5f-ad97-b233faa54b97.png" width="300"> | ||
</p> | ||
|
||
### Multiple samples, multiple species | ||
|
||
```python | ||
# For analyzing samples with different reference genomes, we need to initialize several genomes instances | ||
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"] | ||
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"] | ||
|
||
genomes = [ | ||
bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames | ||
] | ||
|
||
# Now we read reports | ||
metagenes = [] | ||
for report, genome in zip(reports_filenames, genomes): | ||
metagene = bismarkplot.Metagene(report, genome = genome, ...) | ||
metagenes.append(metagene) | ||
|
||
# Initialize MetageneFiles | ||
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"] | ||
metagenes = Bismarkplot.MetageneFiles(metagenes, labels) | ||
# Now we can plot them like in previous example | ||
``` | ||
|
||
Output: | ||
|
||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274552095-bdb87510-9093-4092-8b30-db71ec8ef12d.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274552066-a26350e8-8f66-4ffd-8a24-a0882051149a.png" width="300"> | ||
</p> | ||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274552038-641ac683-b43f-4a6a-8636-dd32f7226f28.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274552121-d28949f3-cb6c-48b2-8f6d-81043aed7c13.png" width="300"> | ||
</p> | ||
|
||
### Different regions | ||
|
||
Other genomic regions from .gff can be analyzed too with ```.exon``` or ```.near_tss/.near_tes``` option for ```bismarkplot.Genome``` | ||
|
||
```python | ||
exons = [ | ||
bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames | ||
] | ||
metagenes = [] | ||
for report, exon in zip(reports_filenames, exons): | ||
metagene = bismarkplot.Metagene(report, genome = exon, | ||
upstream_windows = 0, # !!! | ||
downstream_windows = 0, # !!! | ||
...) | ||
metagenes.append(metagene) | ||
# OR | ||
tss = [ | ||
bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames | ||
] | ||
metagenes = [] | ||
for report, t in zip(reports_filenames, tss): | ||
metagene = bismarkplot.Metagene(report, genome = t, | ||
upstream_windows = 1000,# same number of windows | ||
gene_windows = 1000, # same number of windows | ||
downstream_windows = 0, # !!! | ||
...) | ||
metagenes.append(metagene) | ||
``` | ||
|
||
<img alt="Heatmap for CG+" height="300" src="https://user-images.githubusercontent.com/43905117/236703690-b46c7579-3068-4e98-82f0-9a6435c7808b.png"/> | ||
Exon output: | ||
|
||
<p float="left" align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274564386-767d8bea-87c8-41c5-b43f-bbb93c987844.png" width="300"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274564376-1c662e9b-4194-443a-9f83-5e92bf2387cc.png" width="300"> | ||
</p> | ||
|
||
TSS output: | ||
<p align="middle"> | ||
<img src="https://user-images.githubusercontent.com/43905117/274552171-40be1461-9907-4d16-a6d3-a44ad53178ea.png" width="300"> | ||
</p> | ||
|
||
## Chromosome levels | ||
|
||
BismarkPlot allows user to visualize chromosome methylation levels across full genome | ||
|
||
For box plot or bar plot use | ||
```python | ||
bismark.draw_box_plot(strand_specific=True, labels=['exp1', 'exp2']) | ||
bismark.draw_bar_plot(labels=['exp1', 'exp2']) | ||
import bismarkplot | ||
chr = bismarkplot.ChrLevels.from_file( | ||
"path/to/CX_report.txt", | ||
window_length=10**5, # window length in bp | ||
batch_size=10**7, | ||
chr_min_length = 10**6, # minimum chr length in bp | ||
) | ||
fig, axes = plt.subplots() | ||
|
||
for context in ["CG", "CHG", "CHH"]: | ||
chr.filter(strand="+", context=context).draw( | ||
(fig, axes), # to plot contexts on same axes | ||
smooth=10, # window number for smoothing | ||
label=context # labels for lines | ||
) | ||
|
||
fig.savefig(f"chrom.pdf", dpi = 200) | ||
``` | ||
<img alt="box_05_07_23:54.png" height="400" src="https://user-images.githubusercontent.com/43905117/236703689-9eaaa28a-1a98-4300-a0d0-83039ed9a541.png"/> | ||
<img alt="bar_05_07_23:54.png" height="400" src="https://user-images.githubusercontent.com/43905117/236703687-f3fd1225-1ad1-45b0-9318-b2282a694e68.png"/> | ||
|
||
Output for Arabidopsis t.: | ||
|
||
<img src="https://user-images.githubusercontent.com/43905117/274563188-6efc5b71-9c83-4fe0-8b5a-767db6e1acb4.png"> | ||
|
||
Output for Brachypodium d.: | ||
|
||
<img src="https://user-images.githubusercontent.com/43905117/274563210-4f5dc20a-4ab3-4e52-8263-6ebe7b0623d5.png"> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.