diff --git a/CHANGELOG.md b/CHANGELOG.md index 0179c5fa..73d94a4b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,16 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. ## [Unreleased] +## [2.0.3] - 2020-06-13 + +### Added +- Julia LTS Support +- Benchmarks + +### Changed +- Documentation. +- Updated CI for General Repository. + ## [2.0.2] - 2020-05-21 ### Fixed diff --git a/Project.toml b/Project.toml index 089df463..a6003430 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,7 @@ name = "GenomicFeatures" uuid = "899a7d2d-5c61-547b-bef9-6698a8d05446" authors = ["Kenta Sato ", "Ben J. Ward ", "Ciarán O’Mara "] -version = "2.0.2" +version = "2.0.3" [deps] BioGenerics = "47718e42-2ac5-11e9-14af-e5595289c2ea" diff --git a/docs/make.jl b/docs/make.jl index 3fc235d4..d4576921 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -5,6 +5,8 @@ Pkg.instantiate() using Documenter, GenomicFeatures +DocMeta.setdocmeta!(GenomicFeatures, :DocTestSetup, :(using GenomicFeatures); recursive=true) + makedocs( format = Documenter.HTML( edit_link = :commit diff --git a/docs/src/man/intervals.md b/docs/src/man/intervals.md index 3cb7888c..de03a8c7 100644 --- a/docs/src/man/intervals.md +++ b/docs/src/man/intervals.md @@ -8,7 +8,7 @@ Intervals in `GenomicFeatures` are consistent with ranges in Julia: *1-based and When data is read from formats with different representations (i.e. 0-based and/or end-exclusive) they are always converted automatically. Similarly when writing data, you should not have to reason about off-by-one errors due to format differences while using functionality provided in `GenomicFeatures`. 
-The `Interval` type is defined as +The [`Interval`](@ref Interval) type is defined as ```julia struct Interval{T} <: IntervalTrees.AbstractInterval{Int64} seqname::String @@ -19,9 +19,9 @@ struct Interval{T} <: IntervalTrees.AbstractInterval{Int64} end ``` -The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the `Interval` object. +The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object. The `seqname` field holds the sequence name associated with the interval. -The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with `leftposition` and `rightposition` functions, respectively. +The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with [`leftposition`](@ref leftposition) and [`rightposition`](@ref rightposition) functions, respectively. The `strand` field can take four kinds of values listed in the next table: @@ -32,12 +32,12 @@ The `strand` field can take four kinds of values listed in the next table: | `'-'` | `STRAND_NEG` | negative strand | | `'.'` | `STRAND_BOTH` | non-strand-specific feature | -`Interval` is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats. +[`Interval`](@ref Interval) is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats. 
The default strand and metadata value are `STRAND_BOTH` and `nothing`: -```jlcon +```jldoctest; setup = :(using GenomicFeatures) julia> Interval("chr1", 10000, 20000) -GenomicFeatures.Interval{Nothing}: +Interval{Nothing}: sequence name: chr1 leftmost position: 10000 rightmost position: 20000 @@ -45,19 +45,18 @@ GenomicFeatures.Interval{Nothing}: metadata: nothing julia> Interval("chr1", 10000, 20000, '+') -GenomicFeatures.Interval{Nothing}: +Interval{Nothing}: sequence name: chr1 leftmost position: 10000 rightmost position: 20000 strand: + metadata: nothing - ``` The following example shows all accessor functions for the five fields: -```jlcon +```jldoctest; setup = :(using GenomicFeatures) julia> i = Interval("chr1", 10000, 20000, '+', "some annotation") -GenomicFeatures.Interval{String}: +Interval{String}: sequence name: chr1 leftmost position: 10000 rightmost position: 20000 @@ -78,18 +77,18 @@ STRAND_POS julia> metadata(i) "some annotation" - ``` ## Collections of Intervals -Collections of intervals are represented using the `IntervalCollection` type, which is a general purpose indexed container for intervals. +Collections of intervals are represented using the [`IntervalCollection`](@ref IntervalCollection) type, which is a general purpose indexed container for intervals. It supports fast intersection operations as well as insertion, deletion, and sorted iteration. -Interval collections can be initialized by inserting elements one by one using `push!`. +Empty interval collections can be initialized, and interval elements can be added to the collection one-by-one using `push!`. -```julia +```@example +using GenomicFeatures # hide # The type parameter (Nothing here) indicates the interval metadata type. 
col = IntervalCollection{Nothing}() @@ -98,18 +97,32 @@ for i in 1:100:10000 end ``` -Incrementally building an interval collection like this works, but `IntervalCollection` also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from an array of intervals. +Incrementally building an interval collection like this works, but [`IntervalCollection`](@ref IntervalCollection) also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from a sorted vector of intervals. -```julia +```jldoctest; setup = :(using GenomicFeatures), output = false col = IntervalCollection([Interval("chr1", i, i + 99) for i in 1:100:10000]) + +# output + +IntervalCollection{Nothing} with 100 intervals: + chr1:1-100 . nothing + chr1:101-200 . nothing + chr1:201-300 . nothing + chr1:301-400 . nothing + chr1:401-500 . nothing + chr1:501-600 . nothing + chr1:601-700 . nothing + chr1:701-800 . nothing + ⋮ + ``` -Building `IntervalCollections` in one shot like this should be preferred when it's convenient or speed is an issue. +Building [`IntervalCollection`](@ref IntervalCollection)s in one shot like this should be preferred when it's convenient or speed is an issue. ## Overlap Query -There are number of `eachoverlap` functions in the `GenomicFeatures` module. +There are a number of [`eachoverlap`](@ref eachoverlap) functions in the `GenomicFeatures` module. They follow two patterns: - interval versus collection queries which return an iterator over intervals in the collection that overlap the query, and - collection versus collection queries which iterate over all pairs of overlapping intervals. 
@@ -118,7 +131,7 @@ They follow two patterns: eachoverlap ``` -The order of interval pairs is the same as the following nested loop but `eachoverlap` is often much faster: +The order of interval pairs is the same as the following nested loop but [`eachoverlap`](@ref eachoverlap) is often much faster: ```julia for a in intervals_a, b in intervals_b if isoverlapping(a, b) diff --git a/src/coverage.jl b/src/coverage.jl index 820a2fc0..814138cc 100644 --- a/src/coverage.jl +++ b/src/coverage.jl @@ -19,6 +19,23 @@ For example, given intervals like: This function would return a new set of disjoint intervals with annotated coverage like: [1][-2-][-1-][--2--][--1--] + +# Example + +```jldoctest +julia> intervals = [ + Interval("chr1", 1, 8), + Interval("chr1", 4, 20), + Interval("chr1", 14, 27)]; + +julia> coverage(intervals) +IntervalCollection{UInt32} with 5 intervals: + chr1:1-3 . 1 + chr1:4-8 . 2 + chr1:9-13 . 1 + chr1:14-20 . 2 + chr1:21-27 . 1 +``` """ function coverage(stream, seqname_isless::Function=isless) cov = IntervalCollection{UInt32}() diff --git a/src/interval.jl b/src/interval.jl index 41625773..f7c120c5 100644 --- a/src/interval.jl +++ b/src/interval.jl @@ -7,7 +7,18 @@ # License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md # Note, just to be clear: this shadows IntervalTrees.Interval -"A genomic interval specifies interval with some associated metadata" +""" + struct Interval{T} <: IntervalTrees.AbstractInterval{Int64} + +The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object. + +# Fields +- `seqname::String`: the sequence name associated with the interval. +- `first::Int64`: the leftmost position. +- `last::Int64`: the rightmost position. +- `strand::Strand`: the [`strand`](@ref Strand). 
+- `metadata::T` +""" struct Interval{T} <: IntervalTrees.AbstractInterval{Int64} seqname::String first::Int64 diff --git a/src/intervalcollection.jl b/src/intervalcollection.jl index 5ff7e55b..b862db5c 100644 --- a/src/intervalcollection.jl +++ b/src/intervalcollection.jl @@ -39,6 +39,7 @@ const ICTreeIntersection{T} = IntervalTrees.Intersection{Int64 const ICTreeIntersectionIterator{F,S,T} = IntervalTrees.IntersectionIterator{F,Int64,Interval{S},64,Interval{T},64} const ICTreeIntervalIntersectionIterator{F,T} = IntervalTrees.IntervalIntersectionIterator{F, Int64,Interval{T},64} +"An IntervalCollection is an efficiently stored and indexed set of annotated genomic intervals." mutable struct IntervalCollection{T} # Sequence name mapped to IntervalTree, which in turn maps intervals to a list of metadata. trees::Dict{String,ICTree{T}} @@ -51,11 +52,12 @@ mutable struct IntervalCollection{T} ordered_trees::Vector{ICTree{T}} ordered_trees_outdated::Bool + "Empty initialization." function IntervalCollection{T}() where T return new{T}(Dict{String,ICTree{T}}(), 0, ICTree{T}[], false) end - # Bulk insertion. + "Bulk insertion." function IntervalCollection{T}(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T if sort sort!(intervals) @@ -80,17 +82,26 @@ mutable struct IntervalCollection{T} end end -# Shorthand constructor. +""" + IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T +Shorthand constructor. +""" function IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T return IntervalCollection{T}(intervals, sort) end -# Constructor that offers conversion through collection. +""" + IntervalCollection{T}(data, sort::Bool=false) where T +Constructor that offers conversion through collection. 
+""" function IntervalCollection{T}(data, sort::Bool=false) where T return IntervalCollection(collect(Interval{T}, data), sort) end -# Constructor that guesses metadatatype, and offers conversion through collection. +""" + IntervalCollection(data, sort::Bool=false) +Constructor that guesses metadatatype, and offers conversion through collection. +""" function IntervalCollection(data, sort::Bool=false) return IntervalCollection(collect(Interval{metadatatype(data)}, data), sort) end diff --git a/src/strand.jl b/src/strand.jl index a3b63523..65648e23 100644 --- a/src/strand.jl +++ b/src/strand.jl @@ -6,9 +6,27 @@ # This file is a part of BioJulia. # License is MIT: https://github.com/BioJulia/Bio.jl/blob/master/LICENSE.md +""" +# Outer constructors +* [`Strand(strand::Char)`](@ref) +* [`Strand(strand::UInt8)`](@ref) + +[`Strand`](@ref) can take four kinds of values listed in the next table: + +| Symbol | Constant | Meaning | +| :----- | :-------------------- | :-------------------------------- | +| `'?'` | [`STRAND_NA`](@ref) | strand is unknown or inapplicable | +| `'+'` | [`STRAND_POS`](@ref) | positive strand | +| `'-'` | [`STRAND_NEG`](@ref) | negative strand | +| `'.'` | [`STRAND_BOTH`](@ref) | non-strand-specific feature | +""" primitive type Strand 8 end Base.convert(::Type{Strand}, strand::UInt8) = reinterpret(Strand, strand) + +""" + Strand(strand::UInt8) +""" Strand(strand::UInt8) = convert(Strand, strand) Base.convert(::Type{UInt8}, strand::Strand) = reinterpret(UInt8, strand) @@ -45,6 +63,10 @@ function Base.convert(::Type{Strand}, strand::Char) error("'$(strand)' is not a valid strand") end + +""" + Strand(strand::Char) +""" Strand(strand::Char) = convert(Strand, strand) function Base.convert(::Type{Char}, strand::Strand)