Merge branch 'master' into develop
CiaranOMara committed Jun 13, 2020
2 parents c10623f + 99632a3 commit d885bc9
Showing 8 changed files with 111 additions and 25 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,16 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

## [2.0.3] - 2020-06-13

### Added
- Julia LTS Support
- Benchmarks

### Changed
- Documentation.
- Updated CI for General Repository.

## [2.0.2] - 2020-05-21

### Fixed
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "GenomicFeatures"
uuid = "899a7d2d-5c61-547b-bef9-6698a8d05446"
authors = ["Kenta Sato <[email protected]>", "Ben J. Ward <[email protected]>", "Ciarán O’Mara <[email protected]>"]
version = "2.0.2"
version = "2.0.3"

[deps]
BioGenerics = "47718e42-2ac5-11e9-14af-e5595289c2ea"
2 changes: 2 additions & 0 deletions docs/make.jl
@@ -5,6 +5,8 @@ Pkg.instantiate()

using Documenter, GenomicFeatures

DocMeta.setdocmeta!(GenomicFeatures, :DocTestSetup, :(using GenomicFeatures); recursive=true)

makedocs(
format = Documenter.HTML(
edit_link = :commit
51 changes: 32 additions & 19 deletions docs/src/man/intervals.md
@@ -8,7 +8,7 @@ Intervals in `GenomicFeatures` are consistent with ranges in Julia: *1-based and
When data is read from formats with different representations (e.g. 0-based and/or end-exclusive), the coordinates are always converted automatically.
Similarly, when writing data, you should not have to reason about off-by-one errors due to format differences while using functionality provided in `GenomicFeatures`.
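
As a rough illustration of the conversion described above, the following plain-Julia sketch maps a 0-based, end-exclusive span (the convention used by BED files) onto the 1-based, end-inclusive convention and back. The helper names `to_one_based_closed` and `to_zero_based_half_open` are hypothetical and are not part of `GenomicFeatures`; they only show the arithmetic involved.

```julia
# Hypothetical helpers for illustration only; not part of GenomicFeatures.

# 0-based, end-exclusive -> 1-based, end-inclusive:
# the start moves up by one, the end stays the same.
to_one_based_closed(start0::Integer, stop0::Integer) = (start0 + 1, stop0)

# 1-based, end-inclusive -> 0-based, end-exclusive (the inverse mapping).
to_zero_based_half_open(first1::Integer, last1::Integer) = (first1 - 1, last1)

# A BED-style span covering the first 100 bases of a sequence.
first1, last1 = to_one_based_closed(0, 100)

# The converted span behaves like an ordinary Julia range.
span = first1:last1
println(length(span))
```

Both conventions describe the same 100 bases; only the numbering of the start position differs.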

The `Interval` type is defined as
The [`Interval`](@ref Interval) type is defined as
```julia
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
seqname::String
@@ -19,9 +19,9 @@ struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
end
```

The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the `Interval` object.
The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object.
The `seqname` field holds the sequence name associated with the interval.
The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with `leftposition` and `rightposition` functions, respectively.
The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with [`leftposition`](@ref leftposition) and [`rightposition`](@ref rightposition) functions, respectively.

The `strand` field can take four kinds of values listed in the next table:

@@ -32,32 +32,31 @@ The `strand` field can take four kinds of values listed in the next table:
| `'-'` | `STRAND_NEG` | negative strand |
| `'.'` | `STRAND_BOTH` | non-strand-specific feature |

`Interval` is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats.
[`Interval`](@ref Interval) is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats.

The default strand and metadata value are `STRAND_BOTH` and `nothing`:
```jlcon
```jldoctest; setup = :(using GenomicFeatures)
julia> Interval("chr1", 10000, 20000)
GenomicFeatures.Interval{Nothing}:
Interval{Nothing}:
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
strand: .
metadata: nothing
julia> Interval("chr1", 10000, 20000, '+')
GenomicFeatures.Interval{Nothing}:
Interval{Nothing}:
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
strand: +
metadata: nothing
```

The following example shows all accessor functions for the five fields:
```jlcon
```jldoctest; setup = :(using GenomicFeatures)
julia> i = Interval("chr1", 10000, 20000, '+', "some annotation")
GenomicFeatures.Interval{String}:
Interval{String}:
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
@@ -78,18 +77,18 @@ STRAND_POS
julia> metadata(i)
"some annotation"
```


## Collections of Intervals

Collections of intervals are represented using the `IntervalCollection` type, which is a general purpose indexed container for intervals.
Collections of intervals are represented using the [`IntervalCollection`](@ref IntervalCollection) type, which is a general purpose indexed container for intervals.
It supports fast intersection operations as well as insertion, deletion, and sorted iteration.

Interval collections can be initialized by inserting elements one by one using `push!`.
Empty interval collections can be initialized, and interval elements can then be added to the collection one by one using `push!`.

```julia
```@example
using GenomicFeatures # hide
# The type parameter (Nothing here) indicates the interval metadata type.
col = IntervalCollection{Nothing}()
@@ -98,18 +97,32 @@ for i in 1:100:10000
end
```

Incrementally building an interval collection like this works, but `IntervalCollection` also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from an array of intervals.
Incrementally building an interval collection like this works, but [`IntervalCollection`](@ref IntervalCollection) also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from a sorted vector of intervals.

```julia
```jldoctest; setup = :(using GenomicFeatures), output = false
col = IntervalCollection([Interval("chr1", i, i + 99) for i in 1:100:10000])
# output
IntervalCollection{Nothing} with 100 intervals:
chr1:1-100 . nothing
chr1:101-200 . nothing
chr1:201-300 . nothing
chr1:301-400 . nothing
chr1:401-500 . nothing
chr1:501-600 . nothing
chr1:601-700 . nothing
chr1:701-800 . nothing
```

Building `IntervalCollections` in one shot like this should be preferred when it's convenient or speed is an issue.
Building [`IntervalCollection`](@ref IntervalCollection)s in one shot like this should be preferred when it's convenient or speed is an issue.
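
When the input vector is not already sorted, the bulk constructor's second positional argument (`sort::Bool=false`, as shown in `src/intervalcollection.jl` in this commit) can be set to `true`. The sketch below assumes the exported names used elsewhere in this manual (`Interval`, `IntervalCollection`, `seqname`, `leftposition`, `rightposition`); the interval values are made up for illustration.

```julia
using GenomicFeatures

# Intervals deliberately listed out of order (illustrative values).
unsorted = [
    Interval("chr2", 50, 80),
    Interval("chr1", 200, 300),
    Interval("chr1", 1, 100),
]

# Passing `true` asks the constructor to sort before bulk insertion.
# The constructor calls `sort!`, so `unsorted` itself is reordered in place.
col = IntervalCollection(unsorted, true)

# Iteration over the collection proceeds in sorted order.
for interval in col
    println(seqname(interval), ":", leftposition(interval), "-", rightposition(interval))
end
```

If mutating the input is undesirable, passing a copy (`IntervalCollection(copy(unsorted), true)`) keeps the original vector untouched.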


## Overlap Query

There are a number of `eachoverlap` functions in the `GenomicFeatures` module.
There are a number of [`eachoverlap`](@ref eachoverlap) functions in the `GenomicFeatures` module.
They follow two patterns:
- interval versus collection queries which return an iterator over intervals in the collection that overlap the query, and
- collection versus collection queries which iterate over all pairs of overlapping intervals.
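
The first pattern can be sketched as follows. This is a minimal example that assumes the collection is passed as the first argument and the query interval as the second; check the `eachoverlap` docstring for the exact method signatures. A brute-force filter with `isoverlapping` is included only to show what the query is expected to return.

```julia
using GenomicFeatures

# Ten adjacent, non-overlapping 100 bp intervals on chr1.
col = IntervalCollection([Interval("chr1", i, i + 99) for i in 1:100:1000])

# The query straddles the boundary between the second and third interval.
query = Interval("chr1", 150, 250)

# Interval versus collection: iterate over members of `col` that overlap `query`.
hits = Interval{Nothing}[]
for interval in eachoverlap(col, query)
    push!(hits, interval)
end

# Brute-force equivalent, scanning every interval in the collection.
expected = Interval{Nothing}[]
for interval in col
    if isoverlapping(interval, query)
        push!(expected, interval)
    end
end

# Both approaches find chr1:101-200 and chr1:201-300.
println(length(hits), " ", length(expected))
```

The indexed query avoids scanning the whole collection, which is where the speed advantage over the brute-force loop comes from.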
Expand All @@ -118,7 +131,7 @@ They follow two patterns:
eachoverlap
```

The order of interval pairs is the same as the following nested loop but `eachoverlap` is often much faster:
The order of interval pairs is the same as the following nested loop but [`eachoverlap`](@ref eachoverlap) is often much faster:
```julia
for a in intervals_a, b in intervals_b
if isoverlapping(a, b)
17 changes: 17 additions & 0 deletions src/coverage.jl
@@ -19,6 +19,23 @@ For example, given intervals like:
This function would return a new set of disjoint intervals with annotated coverage like:
[1][-2-][-1-][--2--][--1--]
# Example
```jldoctest
julia> intervals = [
Interval("chr1", 1, 8),
Interval("chr1", 4, 20),
Interval("chr1", 14, 27)];
julia> coverage(intervals)
IntervalCollection{UInt32} with 5 intervals:
chr1:1-3 . 1
chr1:4-8 . 2
chr1:9-13 . 1
chr1:14-20 . 2
chr1:21-27 . 1
```
"""
function coverage(stream, seqname_isless::Function=isless)
cov = IntervalCollection{UInt32}()
13 changes: 12 additions & 1 deletion src/interval.jl
@@ -7,7 +7,18 @@
# License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md

# Note, just to be clear: this shadows IntervalTrees.Interval
"A genomic interval specifies interval with some associated metadata"
"""
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object.
# Fields
- `seqname::String`: the sequence name associated with the interval.
- `first::Int64`: the leftmost position.
- `last::Int64`: the rightmost position.
- `strand::Strand`: the [`strand`](@ref Strand).
- `metadata::T`
"""
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
seqname::String
first::Int64
19 changes: 15 additions & 4 deletions src/intervalcollection.jl
@@ -39,6 +39,7 @@ const ICTreeIntersection{T} = IntervalTrees.Intersection{Int64
const ICTreeIntersectionIterator{F,S,T} = IntervalTrees.IntersectionIterator{F,Int64,Interval{S},64,Interval{T},64}
const ICTreeIntervalIntersectionIterator{F,T} = IntervalTrees.IntervalIntersectionIterator{F, Int64,Interval{T},64}

"An IntervalCollection is an efficiently stored and indexed set of annotated genomic intervals."
mutable struct IntervalCollection{T}
# Sequence name mapped to IntervalTree, which in turn maps intervals to a list of metadata.
trees::Dict{String,ICTree{T}}
@@ -51,11 +52,12 @@ mutable struct IntervalCollection{T}
ordered_trees::Vector{ICTree{T}}
ordered_trees_outdated::Bool

"Empty initialization."
function IntervalCollection{T}() where T
return new{T}(Dict{String,ICTree{T}}(), 0, ICTree{T}[], false)
end

# Bulk insertion.
"Bulk insertion."
function IntervalCollection{T}(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
if sort
sort!(intervals)
@@ -80,17 +82,26 @@ mutable struct IntervalCollection{T}
end
end

# Shorthand constructor.
"""
IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
Shorthand constructor.
"""
function IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
return IntervalCollection{T}(intervals, sort)
end

# Constructor that offers conversion through collection.
"""
IntervalCollection{T}(data, sort::Bool=false) where T
Constructor that offers conversion through collection.
"""
function IntervalCollection{T}(data, sort::Bool=false) where T
return IntervalCollection(collect(Interval{T}, data), sort)
end

# Constructor that guesses metadatatype, and offers conversion through collection.
"""
IntervalCollection(data, sort::Bool=false)
Constructor that guesses metadatatype, and offers conversion through collection.
"""
function IntervalCollection(data, sort::Bool=false)
return IntervalCollection(collect(Interval{metadatatype(data)}, data), sort)
end
22 changes: 22 additions & 0 deletions src/strand.jl
@@ -6,9 +6,27 @@
# This file is a part of BioJulia.
# License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md

"""
# Outer constructors
* [`Strand(strand::Char)`](@ref)
* [`Strand(strand::UInt8)`](@ref)
[`Strand`](@ref) can take four kinds of values listed in the next table:
| Symbol | Constant | Meaning |
| :----- | :-------------------- | :-------------------------------- |
| `'?'` | [`STRAND_NA`](@ref) | strand is unknown or inapplicable |
| `'+'` | [`STRAND_POS`](@ref) | positive strand |
| `'-'` | [`STRAND_NEG`](@ref) | negative strand |
| `'.'` | [`STRAND_BOTH`](@ref) | non-strand-specific feature |
"""
primitive type Strand 8 end

Base.convert(::Type{Strand}, strand::UInt8) = reinterpret(Strand, strand)

"""
Strand(strand::UInt8)
"""
Strand(strand::UInt8) = convert(Strand, strand)
Base.convert(::Type{UInt8}, strand::Strand) = reinterpret(UInt8, strand)

@@ -45,6 +63,10 @@ function Base.convert(::Type{Strand}, strand::Char)

error("'$(strand)' is not a valid strand")
end

"""
Strand(strand::Char)
"""
Strand(strand::Char) = convert(Strand, strand)

function Base.convert(::Type{Char}, strand::Strand)