README.md
shitohana committed Oct 12, 2023
1 parent 3995113 commit 6afbf7a
Showing 1 changed file with 159 additions and 6 deletions.
165 changes: 159 additions & 6 deletions README.md
@@ -66,6 +66,14 @@ options:
format of output plots (default: pdf)
```

Example:

```commandline
bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -f pdf -S 50 report1.txt report2.txt report3.txt report4.txt
```

[Result](#multiple-samples-same-specie)
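
When many reports have to be processed, the same command can be assembled from a script. The following is a minimal sketch, not part of BismarkPlot: `build_metagene_command` is a hypothetical helper, and it only reuses flags that appear in the example above.

```python
import shlex

def build_metagene_command(genome, reports, extra_args=""):
    """Assemble a bismarkplot-metagene call; extra_args is passed through untouched."""
    if not reports:
        raise ValueError("at least one report file is required")
    return ["bismarkplot-metagene", "-g", genome, *shlex.split(extra_args), *reports]

cmd = build_metagene_command(
    "path/to/genome.gff",
    ["report1.txt", "report2.txt"],
    extra_args="-r gene --line --heatmap --dpi 200",
)
print(shlex.join(cmd))

# To execute the command (requires bismarkplot-metagene on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when report paths contain spaces.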

### bismarkplot-chrs

```commandline
@@ -90,11 +98,19 @@ options:
format of output plots (default: pdf)
```

Example:

```commandline
bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -s 10 -f pdf path/to/CX_report.txt
```

[Result](#chromosome-levels)

# Python

BismarkPlot provides a large variety of functions for manipulating cytosine methylation data.

## Metagene

Below we will show the basic BismarkPlot workflow.

Expand Down Expand Up @@ -124,30 +140,167 @@ lp = filtered.line_plot() # line plot data
lp.draw().savefig("path/to/lp.pdf") # matplotlib.Figure

hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf") # matplotlib.Figure
```
Output:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274546389-8b97edcb-6ab4-4f17-a970-389819fbeaec.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274546419-079e004b-8f6e-4ce9-a3dc-fc4ad9592adc.png" width="300">
</p>
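
To make the idea behind these plots concrete, here is a rough sketch of how per-cytosine values could be averaged into a fixed number of windows along a region. It uses only NumPy and is not BismarkPlot's implementation; `window_means` and the toy data are hypothetical.

```python
import numpy as np

def window_means(positions, values, start, end, n_windows):
    """Average `values` in n_windows equal-width windows spanning [start, end)."""
    positions = np.asarray(positions)
    values = np.asarray(values, dtype=float)
    inside = (positions >= start) & (positions < end)
    # Map each position to a window index in 0..n_windows-1.
    idx = ((positions[inside] - start) * n_windows // (end - start)).astype(int)
    sums = np.bincount(idx, weights=values[inside], minlength=n_windows)
    counts = np.bincount(idx, minlength=n_windows)
    # Windows without any cytosine become NaN instead of 0.
    return np.divide(sums, counts, out=np.full(n_windows, np.nan), where=counts > 0)

profile = window_means(
    positions=[100, 120, 260, 390],
    values=[1.0, 0.0, 0.5, 1.0],
    start=100, end=400, n_windows=3,
)
print(profile)
```

Because every region is reduced to the same number of windows, regions of different lengths can be stacked into one matrix (a heat map) or averaged column-wise (a line plot).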

### Smoothing the line plot

Smoothing is very useful when the input signal is very weak (e.g. mammalian non-CpG contexts).

```python
# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp = filtered.line_plot() # line plot data for the CHG-filtered metagene
lp.draw(smooth = 0).savefig("path/to/lp_raw.pdf") # no smoothing
lp.draw(smooth = 50).savefig("path/to/lp_smooth.pdf") # smoothed with window length = 50
```

Output:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274557328-5a087a43-5731-4cef-aa90-cf2ce046c747.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274557346-97e10689-609c-4032-a14d-5893b6797d59.png" width="300">
</p>
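
The effect of the window length can be illustrated without BismarkPlot. In the sketch below, a plain centered moving average stands in for whatever filter BismarkPlot applies internally (an assumption); `moving_average` and the synthetic signal are hypothetical.

```python
import numpy as np

def moving_average(signal, window):
    """Smooth a 1-D signal with a centered moving average; window <= 1 returns a copy."""
    signal = np.asarray(signal, dtype=float)
    if window <= 1:
        return signal.copy()
    kernel = np.ones(window) / window
    # mode="same" keeps the output as long as the input (for window <= len(signal));
    # values near the edges are damped because the kernel runs past the data.
    return np.convolve(signal, kernel, mode="same")

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
# A weak signal (about 2 % methylation) buried in noise of similar amplitude.
noisy = 0.02 + 0.01 * np.sin(2 * np.pi * x) + rng.normal(0, 0.01, x.size)
smoothed = moving_average(noisy, window=50)
print(noisy.std(), smoothed[50:-50].std())
```

A wider window suppresses more noise but also flattens narrow features, so it is worth comparing a few window lengths before settling on one.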

### Multiple samples, same specie

```python
# The genome can be initialized as in the previous example

filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"]
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...) # rest of params from previous example

# The metagenes contain all methylation contexts and both strands, so we need to filter them (as in dplyr)
filtered = metagenes.filter(context = "CG", strand = "+")

# Now we can draw a line plot or heat map as in the previous example, or plot distribution statistics as shown below
trimmed = filtered.trim_flank() # we want to analyze only gene bodies
trimmed.box_plot(showfliers=True).savefig(...)
trimmed.box_plot(showfliers=False).savefig(...)
trimmed.violin_plot().savefig(...)

# If the samples are technical replicates, we can merge them into a single DataFrame and analyze them as one
merged = filtered.merge()
```

Output:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274546531-8516858a-8203-4e98-98a9-7351efb79d29.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274546553-f2617948-4d74-4f1e-9543-e4fff49deae7.png" width="300">
</p>
<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274546624-9a2da41b-5c3b-4f65-baee-29086a40e020.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274546690-83757110-83cc-4f5f-ad97-b233faa54b97.png" width="300">
</p>
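
As a rough illustration of what trimming the flanks and summarising gene bodies involves, the sketch below works on a plain NumPy matrix of regions by windows. It is not BismarkPlot's code: this `trim_flank` is a hypothetical stand-in, and the assumption is that the distribution plots summarise one mean value per region.

```python
import numpy as np

def trim_flank(matrix, upstream_windows, downstream_windows):
    """Drop flanking columns from a (regions x windows) matrix, keeping the body."""
    stop = matrix.shape[1] - downstream_windows
    return matrix[:, upstream_windows:stop]

# Toy data: 3 regions, each with 2 upstream + 4 body + 2 downstream windows.
matrix = np.array([
    [0.1, 0.1, 0.8, 0.9, 0.7, 0.8, 0.2, 0.1],
    [0.0, 0.2, 0.6, 0.5, 0.6, 0.7, 0.1, 0.0],
    [0.2, 0.1, 0.9, 1.0, 0.9, 0.8, 0.3, 0.2],
])
body = trim_flank(matrix, upstream_windows=2, downstream_windows=2)

# One value per region: the mean methylation over the body windows.
per_region = np.nanmean(body, axis=1)
q1, median, q3 = np.percentile(per_region, [25, 50, 75])
print(per_region, median)
```

Computing the slice end explicitly, instead of using a negative index, keeps the function correct when `downstream_windows` is 0.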

### Multiple samples, multiple species

```python
# To analyze samples with different reference genomes, we need to initialize several Genome instances
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]

genomes = [
    bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]

# Now we read the reports, pairing each one with its own genome
metagenes = []
for report, genome in zip(reports_filenames, genomes):
    metagene = bismarkplot.Metagene(report, genome = genome, ...)
    metagenes.append(metagene)

# Initialize MetageneFiles
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"]
metagenes = bismarkplot.MetageneFiles(metagenes, labels)
# Now we can plot them as in the previous example
```

Output:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274552095-bdb87510-9093-4092-8b30-db71ec8ef12d.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274552066-a26350e8-8f66-4ffd-8a24-a0882051149a.png" width="300">
</p>
<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274552038-641ac683-b43f-4a6a-8636-dd32f7226f28.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274552121-d28949f3-cb6c-48b2-8f6d-81043aed7c13.png" width="300">
</p>

### Different regions

Other genomic regions from the .gff file can also be analyzed with the `.exon` or `.near_tss`/`.near_tes` methods of `bismarkplot.Genome`.

```python
exons = [
    bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
    metagene = bismarkplot.Metagene(report, genome = exon,
                                    upstream_windows = 0,   # !!! no upstream windows
                                    downstream_windows = 0, # !!! no downstream windows
                                    ...)
    metagenes.append(metagene)
# OR
tss = [
    bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
    metagene = bismarkplot.Metagene(report, genome = t,
                                    upstream_windows = 1000, # same number of windows
                                    gene_windows = 1000,     # same number of windows
                                    downstream_windows = 0,  # !!! no downstream windows
                                    ...)
    metagenes.append(metagene)
```

Exon output:

<p float="left" align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274564386-767d8bea-87c8-41c5-b43f-bbb93c987844.png" width="300">
<img src="https://user-images.githubusercontent.com/43905117/274564376-1c662e9b-4194-443a-9f83-5e92bf2387cc.png" width="300">
</p>

TSS output:
<p align="middle">
<img src="https://user-images.githubusercontent.com/43905117/274552171-40be1461-9907-4d16-a6d3-a44ad53178ea.png" width="300">
</p>

## Chromosome levels

BismarkPlot allows the user to visualize chromosome methylation levels across the full genome.

```python
import bismarkplot
import matplotlib.pyplot as plt

# named chr_levels to avoid shadowing the built-in chr()
chr_levels = bismarkplot.ChrLevels.from_file(
    "path/to/CX_report.txt",
    window_length=10**5,  # window length in bp
    batch_size=10**7,
    chr_min_length=10**6,  # minimum chr length in bp
)
fig, axes = plt.subplots()

for context in ["CG", "CHG", "CHH"]:
    chr_levels.filter(strand="+", context=context).draw(
        (fig, axes),  # to plot contexts on the same axes
        smooth=10,  # window number for smoothing
        label=context  # labels for lines
    )

fig.savefig("chrom.pdf", dpi=200)
```

Output for Arabidopsis thaliana:

<img src="https://user-images.githubusercontent.com/43905117/274563188-6efc5b71-9c83-4fe0-8b5a-767db6e1acb4.png">

Output for Brachypodium distachyon:

<img src="https://user-images.githubusercontent.com/43905117/274563210-4f5dc20a-4ab3-4e52-8263-6ebe7b0623d5.png">
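
For readers who want to see what a windowed chromosome level is, the sketch below computes one directly from Bismark cytosine report lines (tab-separated: chromosome, 1-based position, strand, methylated count, unmethylated count, context, trinucleotide). It is a standard-library illustration, not BismarkPlot's implementation; `window_levels` is hypothetical, and pooling read counts per window is an assumption about how levels are aggregated.

```python
from collections import defaultdict

def window_levels(lines, window_length, context="CG", strand="+"):
    """Return {(chrom, window_index): methylated / total} for one context and strand."""
    meth = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        chrom, pos, s, n_meth, n_unmeth, ctx, _trinuc = line.rstrip("\n").split("\t")
        if ctx != context or s != strand:
            continue
        key = (chrom, (int(pos) - 1) // window_length)  # positions are 1-based
        meth[key] += int(n_meth)
        total[key] += int(n_meth) + int(n_unmeth)
    # Windows with zero coverage are skipped to avoid dividing by zero.
    return {k: meth[k] / total[k] for k in total if total[k] > 0}

report = [
    "chr1\t10\t+\t8\t2\tCG\tCGA",
    "chr1\t60\t+\t5\t5\tCG\tCGT",
    "chr1\t70\t-\t9\t1\tCG\tCGG",
    "chr1\t150\t+\t1\t9\tCG\tCGC",
    "chr1\t160\t+\t3\t7\tCHH\tCAT",
]
levels = window_levels(report, window_length=100)
print(levels)
```

Pooling counts weights each cytosine by its coverage; averaging per-cytosine ratios instead would give every position equal weight, and the two can differ noticeably in low-coverage windows.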
