Skip to content

Commit

Permalink
Merge branch 'master' of https://github.com/adeschen/similaRpeak
Browse files Browse the repository at this point in the history
  • Loading branch information
adeschen committed Mar 6, 2015
2 parents 008fe3e + cd62315 commit 9a53366
Show file tree
Hide file tree
Showing 9 changed files with 107 additions and 48 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,8 @@ Depends:
R6 (>= 2.0)
Imports:
rtracklayer,
GenomicAlignments
GenomicAlignments,
Rsamtools
Suggests:
RUnit,
BiocGenerics,
Expand Down
3 changes: 2 additions & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,5 @@ export(similarity)
export(MetricFactory)
import(R6)
importFrom(rtracklayer, import)
importFrom(GenomicAlignments, readGAlignments)
importFrom(GenomicAlignments, readGAlignments)
importFrom(Rsamtools, ScanBamParam)
Binary file added inst/extdata/align1.bam
Binary file not shown.
Binary file added inst/extdata/align1.bam.bai
Binary file not shown.
4 changes: 4 additions & 0 deletions man/MetricFactory.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,10 @@ ratio_area <- factory$createMetric(metricType="RATIO_AREA",
profile1=demoProfiles$chr3.73159773.73160145$H3K4me1,
profile2=demoProfiles$chr3.73159773.73160145$H3K27ac)
ratio_area


## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.
}
\seealso{
\itemize{
Expand Down
11 changes: 9 additions & 2 deletions man/chr7Profiles.Rd
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
\name{chr7Profiles}
\alias{chr7Profiles}
\docType{data}
\alias{chr7Profiles}
\title{
ChIP-Seq profiles of region chr7:61968807-61969730 related to enhancers
H3K27ac and H3K4me1 (for demonstration purpose)
Expand Down Expand Up @@ -36,13 +36,16 @@ the Encyclopedia of DNA Elements (ENCODE) data (Dunham I et al. 2012).
}
}
\source{
The Encyclopedia of DNA Elements (ENCODE) data
}
\references{
Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA
elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
}
\examples{
data(chr7Profiles)

## Calculating all metrics for the chr7.61968807.61969730 region
## Calculating all metrics for the "chr7.61968807.61969730" region
metrics <- similarity(chr7Profiles$chr7.61968807.61969730$H3K4me1,
chr7Profiles$chr7.61968807.61969730$H3K27ac,
ratioAreaThreshold=10,
Expand All @@ -53,6 +56,10 @@ metrics <- similarity(chr7Profiles$chr7.61968807.61969730$H3K4me1,
diffPosMaxThresholdMaxDiff=100,
diffPosMaxTolerance=0.10)
metrics


## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.
}\seealso{
\itemize{
\item \code{\link{MetricFactory}} {for using a interface to calculate
Expand Down
18 changes: 11 additions & 7 deletions man/demoProfiles.Rd
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
\name{demoProfiles}
\alias{demoProfiles}
\docType{data}
\alias{demoProfiles}
\title{
Selected ChIP-seq profiles related to enhancers
H3K27ac and H3K4me1 (for demonstration purpose)
Expand Down Expand Up @@ -49,19 +49,23 @@ H3K4me1 (DCC accession: ENCFF000ARY) from the Encyclopedia of DNA Elements
chr3:73160145 }
}
}
\details{
%% ~~ If necessary, more details than the __description__ above ~~
}
\source{
Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA
elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
The Encyclopedia of DNA Elements (ENCODE) data
}
\references{
%% ~~ possibly secondary sources and usages ~~
Dunham I, Kundaje A, Aldred SF, et al. \emph{An integrated encyclopedia of DNA
elements in the human genome.} Nature. 2012 Sep 6;489(7414):57-74.
}
\examples{
data(demoProfiles)

# Calculate metrics for the "chr3:73159773-73160145" region
metrics <- similarity(demoProfiles$chr3.73159773.73160145$H3K27ac,
demoProfiles$chr3.73159773.73160145$H3K4me1)
metrics

## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.
}
\seealso{
\itemize{
Expand Down
2 changes: 1 addition & 1 deletion man/similarity.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,7 @@ similarity(demoProfiles$chr2.70360770.70361098$H3K4me1,
spearmanCorrSDThreashold=0.1)


## You can refer to the vignette to see more exemples using ChIP-Seq profiles
## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.
}
\seealso{
Expand Down
114 changes: 78 additions & 36 deletions vignettes/similaRpeak.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -76,23 +76,63 @@ DNA sequencing to identify the binding sites of DNA-associated proteins. For a
specific region, the read count of aligned sequences at each position of the
region is used to generate the ChIP-Seq profile for the region.

From a BAM file where aligned sequences are stored, the coverage for a specific
region can be extracted.
Aligned sequences are usually stored in BAM files. As example, a slimmed BAM
file (align1.bam) is selected as well as a specific region
(chr1:172081530-172083529). Using BAM file and
region information, represented by as a `GRanges` object, the coverage for the
specified region is extracted using specialized Bioconductor packages.

```{r bam_extract}
```{r bam_extract_coverage, collapse=TRUE}
suppressMessages(library(GenomicAlignments))
suppressMessages(library(rtracklayer))
suppressMessages(library(Rsamtools))
bamFile <- system.file("extdata/align1.bam", package = "similaRpeak")
regions <- system.file("extdata/list1.bed", package = "similaRpeak")
regions <- import(regions)
param <- ScanBamParam(which = regions)
alignments <- readGAlignments(bamFile, param = param)
coverages <- coverage(alignments)
coverages <- coverages[regions]
coverages <- lapply(coverages, as.numeric)
bamFile01 <- system.file("extdata/align1.bam", package = "similaRpeak")
region <- GRanges(Rle(c("chr1")), IRanges(start = 172081530, end = 172083529),
strand= Rle(strand(c("*"))))
param <- ScanBamParam(which = region)
alignments01 <- readGAlignments(bamFile01, param = param)
coveragesRegion01 <- coverage(alignments01)[region]
str(coveragesRegion01)
```

The `coverages01` can
easily be transformed to a vector of numerical value to obtain the raw
ChIP-Seq profile for the selected region.

```{r iranges_to_vector, collapse=TRUE}
coveragesRegion01 <- lapply(coveragesRegion01, as.numeric)
str(coveragesRegion01)
```

To convert the raw coverage to reads per million (RPM), the total number of
reads present in the BAM file is needed to assign a weight at the `coverage`
function.

```{r bam_count, collapse=TRUE}
param <- ScanBamParam(flag = scanBamFlag(isUnmappedQuery=FALSE))
count01 <- countBam(bamFile01, param = param)$records
coveragesRPMRegion01 <- coverage(alignments01, weight=1000000/count01)[region]
coveragesRPMRegion01 <- lapply(coveragesRPMRegion01, as.numeric)
str(coveragesRPMRegion01)
```

The read per millions values are quite low for the `coveragesRPMRegion01`
because the original BAM file has been reduced in size to simplify the example.

Other examples are available on the worflows section of the
[Bioconductor website](http://www.bioconductor.org/help/workflows/high-throughput-sequencing/
"high-throughput-sequencing").

Finally, the
[metagene package](http://bioconductor.org/packages/release/bioc/html/metagene.html)
, available on [Bioconductor](http://bioconductor.org), can also be used to
generate ChIP-Seq profiles. An example is available on
[metagene wiki](https://github.com/CharlesJB/metagene/wiki/Extract-ChIP-Seq-profiles-using-metagene).

### Threshold variables

Expand Down Expand Up @@ -150,14 +190,14 @@ By using the absolute value of the **DIFF_POS_MAX** metric, the definition of
a pseudometric is formally respected. However, the respective position of the
maximum peak of profiles is lost.

$$ \Huge |DIFF\_POS\_MAX| $$
$$ |DIFF\_POS\_MAX| $$

By using the absolute value of the logarithm of the
**RATIO_AREA**, **RATIO_MAX_MAX**, **RATIO_INTERSECT** and
**RATIO_NORMALIZED_INTERSECT** metrics, the definition of a pseudometric is
formally respected.

$$ \Huge \log(RATIO) $$
$$ |\log(RATIO)| $$



Expand Down Expand Up @@ -187,6 +227,9 @@ profile (`profile1` parameter) is always divided by the second profile
(`ratioAreaThreshold` parameter) is not respected for at least one of the
profiles.

The **RATIO_AREA** metric can be useful to detect regions with similar
coverage even if the profiles are different.

```{r ratio_area_graph, echo=FALSE, fig.width=11, fig.height=8}
par(mar=c(6,4,2,2))
par(mfrow=c(2,2))
Expand Down Expand Up @@ -254,8 +297,6 @@ metrics <- similarity(demoProfiles$chr19.27739373.27739767$H3K27ac,
metrics$metric$RATIO_AREA
```

The **RATIO_AREA** metric can be useful to detect regions with similar
coverage even if the profiles are different.

## DIFF_POS_MAX Metric

Expand All @@ -275,6 +316,8 @@ maximum value is egal or superior to a minimal value. When this minimum value
is not respected, it is assumed that no peak is present in the profile and
`NA` is returned.

The **DIFF_POS_MAX** metric can be useful to detect regions with shifted peaks.

```{r diff_pos_max, echo=FALSE, fig.width=11, fig.height=8}
par(mar=c(6,4,2,2))
Expand Down Expand Up @@ -344,8 +387,6 @@ metrics$metric$DIFF_POS_MAX
```


The **DIFF_POS_MAX** metric can be useful to detect regions with shifted peaks.

## RATIO_MAX_MAX Metric

The **RATIO_MAX_MAX** metric is the ratio between the peaks values
Expand All @@ -354,6 +395,9 @@ the second profile (`profile2` parameter). `NA` if minimal peak value threshold
(`ratioMaxMaxThreshold` parameter) is not respected for at least one of the
profiles.

The **RATIO_MAX_MAX** metric can be useful to detect regions with peaks with
similar (or dissimilar) amplitude.

```{r ratio_max_max, echo=FALSE, fig.width=11, fig.height=8}
par(mar=c(6,4,2,2))
Expand Down Expand Up @@ -422,8 +466,6 @@ metrics <- similarity(demoProfiles$chr19.27739373.27739767$H3K27ac,
metrics$metric$RATIO_MAX_MAX
```

The **RATIO_MAX_MAX** metric can be useful to detect regions with peaks with
similar (or dissimilar) amplitude.

## RATIO_INTERSECT Metric

Expand All @@ -432,6 +474,9 @@ the total area of the two profiles. `NA` if minimal area threshold
(`ratioIntersectThreshold` parameter) is not respected for the intersection
area.

The **RATIO_INTERSECT** metric can be useful to detect regions with possibily
similar profiles.

```{r ratio_intersect, echo=FALSE, fig.width=11, fig.height=8}
par(mar=c(6,4,2,2))
Expand Down Expand Up @@ -501,9 +546,6 @@ metrics <- similarity(demoProfiles$chr19.27739373.27739767$H3K27ac,
metrics$metric$RATIO_INTERSECT
```

The **RATIO_INTERSECT** metric can be useful to detect regions with possibily
similar profiles.


## RATIO_NORMALIZED_INTERSECT Metric

Expand All @@ -514,6 +556,9 @@ of the profile divided by profile lenght). `NA` if minimal area threshold
(`ratioNormalizedIntersectThreshold` parameter) is not respected for
the intersection area.

The **RATIO_NORMALIZED_INTERSECT** metric can be useful to detect regions with
possibily similar profiles even if their have different amplitude.

```{r ratio_normalized_intersect, echo=FALSE, fig.width=11, fig.height=8}
par(mar = c(6,4,2,2))
Expand Down Expand Up @@ -602,9 +647,6 @@ metrics <- similarity(demoProfiles$chr19.27739373.27739767$H3K27ac,
metrics$metric$RATIO_NORMALIZED_INTERSECT
```

The **RATIO_NORMALIZED_INTERSECT** metric can be useful to detect regions with
possibily similar profiles even if their have different amplitude.


## SPEARMAN_CORRELATION Metric

Expand All @@ -613,6 +655,12 @@ usign the two profiles.`NA` if minimal standard deviation
(`spearmanCorrSDThreashold` parameter) is not respected for at least one of the
profiles.

The **SPEARMAN_CORRELATION** assesses how well the relationship between the two
ChIP-Seq profiles can be described using a monotonic function. As the
**RATIO_NORMALIZED_INTERSECT** metric, the **SPEARMAN_CORRELATION**
metric can be useful to detect regions with possibily similar profiles even if
their have different amplitude.

```{r spearman_corr_graphs, echo=FALSE, fig.width=11, fig.height=8}
par(mar=c(6,4,2,2))
Expand Down Expand Up @@ -686,12 +734,6 @@ metrics <- similarity(demoProfiles$chr19.27739373.27739767$H3K27ac,
metrics$metric$SPEARMAN_CORRELATION
```

The **SPEARMAN_CORRELATION** assesses how well the relationship between the two
ChIP-Seq profiles can be described using a monotonic function. As the
**RATIO_NORMALIZED_INTERSECT** metric, the **SPEARMAN_CORRELATION**
metric can be useful to detect regions with possibily similar profiles even if
their have different amplitude.


## Quick Start

Expand Down Expand Up @@ -807,15 +849,15 @@ remain the same:

```{r factoryDemo, collapse=TRUE }
ratio_max_max <- factory$createMetric(metricType="RATIO_MAX_MAX",
profile1=demoProfiles$chr2.70360770.70361098$H3K27ac,
profile2=demoProfiles$chr2.70360770.70361098$H3K4me1)
profile1=demoProfiles$chr2.70360770.70361098$H3K27ac,
profile2=demoProfiles$chr2.70360770.70361098$H3K4me1)
ratio_max_max
ratio_normalized_intersect <- factory$createMetric(
metricType="RATIO_NORMALIZED_INTERSECT",
profile1=demoProfiles$chr2.70360770.70361098$H3K27ac,
profile2=demoProfiles$chr2.70360770.70361098$H3K4me1)
metricType="RATIO_NORMALIZED_INTERSECT",
profile1=demoProfiles$chr2.70360770.70361098$H3K27ac,
profile2=demoProfiles$chr2.70360770.70361098$H3K4me1)
ratio_normalized_intersect
```
Expand Down

0 comments on commit 9a53366

Please sign in to comment.