Breaking changes for 0.4 - simplified IDs, support for external annotations and SCTransform update #20

Merged: 40 commits, Jun 20, 2024
Commits
c538cfd
Initial work on Annotations struct
rasmushenningsson Jun 4, 2024
1f4205d
Remove var_id_cols and obs_id_cols - use first annotation column as I…
rasmushenningsson Jun 4, 2024
b888fb8
set_var_id_col! and set_obs_id_col!. Mark non-unique ID when showing …
rasmushenningsson Jun 4, 2024
9d78ae7
table_validateunique now acts properly on a single column
rasmushenningsson Jun 5, 2024
a8bbf25
Let user decide how to handle duplicate IDs
rasmushenningsson Jun 5, 2024
df43d71
load_counts/merge_counts can now use additional var_id_cols during merge
rasmushenningsson Jun 5, 2024
7dde76f
load_counts obs_id_col now defaults to "cell_id" to make it different…
rasmushenningsson Jun 5, 2024
41371e8
Preparation for allowing "external" covariates, i.e. covariates not i…
rasmushenningsson Jun 5, 2024
5dc956b
Annotations improvements. Handle annotations in normalize_matrix and …
rasmushenningsson Jun 7, 2024
288974b
Allow multiple annotations to be passed when projecting
rasmushenningsson Jun 7, 2024
f4dd828
Rename annotations=>external_obs (or external_var) for projection use…
rasmushenningsson Jun 7, 2024
77a40fa
Code consistency: Use external_obs instead of annotations
rasmushenningsson Jun 7, 2024
dde4fd4
Improve unit testing
rasmushenningsson Jun 7, 2024
38012ff
Support DataFrames where Annotations are supported
rasmushenningsson Jun 7, 2024
7bb7c5e
Annotations is now considered experimental
rasmushenningsson Jun 7, 2024
9ad4fe0
Add some Two-Group normalization tests with external obs
rasmushenningsson Jun 7, 2024
a8564f6
external_var/external_obs for filtering functions
rasmushenningsson Jun 7, 2024
4d926f8
Minor fixes
rasmushenningsson Jun 7, 2024
10c8471
filter with external annotations - bug fixes and unit tests
rasmushenningsson Jun 7, 2024
eebfea6
var_counts_fraction! now supports external_var
rasmushenningsson Jun 7, 2024
29ab53c
Support external_var/external_obs for transforms
rasmushenningsson Jun 7, 2024
0c4296c
Fix bug when using var_filter for tf_idf_transform
rasmushenningsson Jun 7, 2024
639be82
Add unit tests for variable subsetting in logtransform
rasmushenningsson Jun 7, 2024
4f8757d
logtransform/tf_idf_transform handling of duplicate var IDs
rasmushenningsson Jun 17, 2024
6376cbe
Update to use latest SCTransform
rasmushenningsson Jun 17, 2024
46bdfb6
sctransform: improved variable handling
rasmushenningsson Jun 18, 2024
8bd782f
feature_mask type improvement
rasmushenningsson Jun 18, 2024
6faee3f
Minor unit test fix due to breaking SCTransform change
rasmushenningsson Jun 18, 2024
8e37207
Fix test mistake
rasmushenningsson Jun 18, 2024
a2c2cf4
SVDModel now only stores U,S and not V
rasmushenningsson Jun 18, 2024
66c867a
Add var_counts_fraction (not in-place)
rasmushenningsson Jun 18, 2024
22e72db
var_counts_fraction help text improvement
rasmushenningsson Jun 18, 2024
9bffbdd
Add propertynames(::Annotations)
rasmushenningsson Jun 18, 2024
facada4
Fix typo in assert text
rasmushenningsson Jun 18, 2024
e09bdf2
Add var_counts_sum
rasmushenningsson Jun 19, 2024
306276e
Refactor external annotations
rasmushenningsson Jun 20, 2024
ac2f4b2
getindex(::Annotations, ::AbstractVector)
rasmushenningsson Jun 20, 2024
ebdfac9
Unit test filtering with multiple external annotations
rasmushenningsson Jun 20, 2024
bac544e
Finish refactoring of external_obs for var_counts_fraction and var_co…
rasmushenningsson Jun 20, 2024
df7a818
Update CHANGELOG and minor code/comment cleanup
rasmushenningsson Jun 20, 2024
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -7,7 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.4.0] - 2024-06-20

### Breaking

* DataMatrix will now always use the first column of var/obs annotations as ID. (Multiple ID columns are no longer supported.)
* `load_counts` - The default obs ID column name is now "cell_id" (was "id" before).
* `load10x` - variables now default to using only the first column (id) as unique identifier. Specify e.g. `var_id="var_id"=>["id", "feature_type"]` to merge multiple columns to create the ID.
* `load10x` - observations now default to using the first column (barcode) as unique identifier.
* `load10x` - no longer supports `copy_obs_col` kwarg.
* `set_var_id_cols!` is replaced with `set_var_id_col!` (since there is only one ID column).
* `set_obs_id_cols!` is replaced with `set_obs_id_col!` (since there is only one ID column).
* Update to SCTransform 0.2, which handles `logcellcounts` better when there are multiple modalities (e.g. RNA and antibody counts) present in the data.
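
The `load10x` entry above mentions merging several columns to create a unique ID. The following is a minimal sketch of that idea in plain Julia, not the `load10x` implementation: the feature values are made up, and the `"_"` separator is an assumption.

```julia
# Hypothetical feature annotations; the values are invented for illustration.
ids           = ["ENSG01", "ENSG02", "CD3", "CD3"]
feature_types = ["Gene Expression", "Gene Expression", "Gene Expression", "Antibody Capture"]

# Using only the first column as ID leaves a duplicate ("CD3" appears twice).
ids_are_unique = allunique(ids)

# Merging several columns into a single ID column resolves the clash.
# The "_" separator is an assumption, not necessarily what load10x uses.
merged_ids = [string(id, "_", ft) for (id, ft) in zip(ids, feature_types)]
merged_are_unique = allunique(merged_ids)
```

With a single ID column, a check like `allunique` is all that is needed to detect duplicates before deciding how to handle them.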

### Added

* `var_counts_fraction` - Just like `var_counts_fraction!`, but not modifying the object in place.
* `var_counts_sum` and `var_counts_sum!` - For summing over selected variables. Useful for counting e.g. total RNA expression and finding number of expressed features.
* Added support for using external annotations where applicable (filter, transforms, normalization, statistical tests, var_counts_fraction!, var_counts_sum!)
* Added experimental (and thus not yet exported) `Annotations` struct that wraps a `DataFrame` with IDs in the first column and ensures that the ID column is retained when accessing columns, so that the resulting object can be left-joined to `data.obs`/`data.var`.

### Fixed

* Add compat for weakdeps (UMAP, TSne, PrincipalMomentAnalysis).
* SVDModel now only stores `U` and `S` since `V` is not needed for projection.
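
Why `V` is not needed for projection can be illustrated with the standard SVD identity: if `X = U*S*Vᵀ`, then `Vᵀ = S⁻¹*Uᵀ*X`, so new observations can be mapped using `U` and `S` alone. The sketch below uses only the `LinearAlgebra` standard library; the helper `project_obs` is hypothetical, and whether `SVDModel` projects in exactly this way is an assumption.

```julia
using LinearAlgebra

# Toy data: 5 variables × 4 observations (values chosen arbitrarily, full column rank).
X = [1.0 2.0 0.0 1.0;
     0.0 1.0 3.0 2.0;
     2.0 0.0 1.0 1.0;
     1.0 1.0 1.0 0.0;
     0.0 2.0 2.0 3.0]

F = svd(X)   # X ≈ F.U * Diagonal(F.S) * F.Vt

# Hypothetical helper: coordinates of (new) observations from U and S only,
# using Vt_new = S⁻¹ * Uᵀ * X_new.
project_obs(U, S, Xnew) = Diagonal(S) \ (U' * Xnew)

# Projecting the original data recovers the observation coordinates Vt.
Vt = project_obs(F.U, F.S, X)
```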

## [0.3.9] - 2024-03-04

4 changes: 2 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "SingleCellProjections"
uuid = "03d38035-ed2f-4a36-82eb-797f1727ab2e"
authors = ["Rasmus Henningsson <[email protected]>"]
-version = "0.3.9"
+version = "0.4.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
@@ -49,7 +49,7 @@ PrecompileTools = "1"
PrincipalMomentAnalysis = "0.2"
Random = "1"
Requires = "1.2"
-SCTransform = "0.1"
+SCTransform = "0.2"
SingleCell10x = "0.1, 0.2"
SparseArrays = "1"
StableRNGs = "1"
2 changes: 1 addition & 1 deletion ext/SingleCellProjectionsPrincipalMomentAnalysisExt.jl
@@ -87,7 +87,7 @@ See also: [`PrincipalMomentAnalysis.pma`](https://principalmomentanalysis.github
"""
function PrincipalMomentAnalysis.pma(data::DataMatrix, args...; nsv=3, var=:copy, obs=:copy, kwargs...)
F = implicitpma(data.matrix, args...; nsv=nsv, kwargs...)
-model = PMAModel(F,select(data.var,data.var_id_cols), var, obs)
+model = PMAModel(F,select(data.var,1), var, obs)
update_matrix(data, F, model; model.var, model.obs)
end

2 changes: 1 addition & 1 deletion ext/SingleCellProjectionsUMAPExt.jl
@@ -28,7 +28,7 @@ The other `args...` and `kwargs...` are forwarded to `UMAP.umap`. See `UMAP` doc
See also: [`UMAP.umap`](https://github.com/dillondaudert/UMAP.jl)
"""
function UMAP.umap(data::DataMatrix, args...; obs=:copy, kwargs...)
-model = UMAPModel(UMAP.UMAP_(obs_coordinates(data), args...; kwargs...), select(data.var,data.var_id_cols), obs)
+model = UMAPModel(UMAP.UMAP_(obs_coordinates(data), args...; kwargs...), select(data.var,1), obs)
update_matrix(data, model.m.embedding, model; var="UMAP", model.obs)
end

12 changes: 10 additions & 2 deletions src/SingleCellProjections.jl
@@ -12,10 +12,11 @@ export
NearestNeighborModel,
ObsAnnotationModel,
VarCountsFractionModel,
+VarCountsSumModel,
PseudoBulkModel,
project,
-set_var_id_cols!,
-set_obs_id_cols!,
+set_var_id_col!,
+set_obs_id_col!,
var_coordinates,
obs_coordinates,
load10x,
@@ -39,6 +40,9 @@ export
var_to_obs,
var_to_obs_table,
var_counts_fraction!,
+var_counts_fraction,
+var_counts_sum!,
+var_counts_sum,
pseudobulk,
local_outlier_factor!,
local_outlier_factor_projection!,
@@ -92,6 +96,9 @@ include("sctransformsparse.jl")

include("implicitsvd.jl")

+include("annotations.jl")
+include("annotation_utils.jl")

include("lowrank.jl")
include("projectionmodels.jl")
include("datamatrix.jl")
@@ -116,6 +123,7 @@ include("reduce.jl")
include("annotate.jl")
include("statistical_tests.jl")
include("counts_fraction.jl")
+include("counts_sum.jl")
include("pseudobulk.jl")

include("local_outlier_factor.jl")
12 changes: 6 additions & 6 deletions src/adjacency_matrices.jl
@@ -44,12 +44,12 @@ end
function knn_adjacency_matrix(data::DataMatrix; kwargs...)
adj = knn_adjacency_matrix(obs_coordinates(data.matrix); kwargs...)
obs = copy(data.obs)
-DataMatrix(adj, obs, obs; var_id_cols=data.obs_id_cols, data.obs_id_cols)
+DataMatrix(adj, obs, obs)
end

function knn_adjacency_matrix(X::DataMatrix, Y::DataMatrix; kwargs...)
adj = knn_adjacency_matrix(obs_coordinates(X.matrix), obs_coordinates(Y.matrix); kwargs...)
-DataMatrix(adj, copy(X.obs), copy(Y.obs); var_id_cols=X.obs_id_cols, Y.obs_id_cols)
+DataMatrix(adj, copy(X.obs), copy(Y.obs))
end


@@ -69,10 +69,10 @@ At the moment all points in `Y` are required to have the same number of neighbor
for computation reasons.
"""
function adjacency_distances(adj::DataMatrix, X::DataMatrix, Y::DataMatrix=X)
-table_cols_equal(adj.var, X.obs; cols=X.obs_id_cols) || error("Adjacency matrix and DataMatrix have different obs.")
-table_cols_equal(adj.obs, Y.obs; cols=Y.obs_id_cols) || error("Adjacency matrix and DataMatrix have different obs.")
+table_cols_equal(adj.var, X.obs; cols=names(X.obs,1)) || error("Adjacency matrix and DataMatrix have different obs.")
+table_cols_equal(adj.obs, Y.obs; cols=names(Y.obs,1)) || error("Adjacency matrix and DataMatrix have different obs.")
D = _adjacency_distances(adj.matrix, X, Y)
-DataMatrix(D, copy(adj.var), copy(adj.obs); adj.var_id_cols, adj.obs_id_cols)
+DataMatrix(D, copy(adj.var), copy(adj.obs))
end


@@ -98,7 +98,7 @@ function _adjacency_distances(adj, X::DataMatrix, Y::DataMatrix=X)

# Xs = X[:,Is] # Doesn't work since DataMatrix doesn't allow duplicate IDs
# Temporary workaround - TODO: fix proper interface?
-Xs = DataMatrix(_subsetmatrix(X.matrix,:,Is), X.var, DataFrame(id=1:length(Is)); var_id_cols=X.var_id_cols)
+Xs = DataMatrix(_subsetmatrix(X.matrix,:,Is), X.var, DataFrame(id=1:length(Is)))


# Ys = Y[:,Js] # guaranteed to be equal to Y
6 changes: 3 additions & 3 deletions src/annotate.jl
@@ -32,13 +32,13 @@ function ObsAnnotationModel(fvar, data::DataMatrix;

# kwargs trick to let defaults be decided here if `nothing` is passed to name_src or names
if names === nothing
-name_src = @something name_src _get_name_src(fvar) data.var_id_cols
+name_src = @something name_src _get_name_src(fvar) Base.names(data.var,1)
names = _default_out_name(name_src)
end

var_ind = _filter_indices(data.var, fvar)
v = data.var[var_ind,:]
-var_match = select(v, data.var_id_cols; copycols=false)
+var_match = select(v, 1; copycols=false)
isempty(var_match) && throw(ArgumentError("No variables match filter ($fvar)."))
ObsAnnotationModel(var_match, instantiate_out_names(v, names), var, obs, matrix)
end
@@ -88,7 +88,7 @@ end
function var_to_obs_table(fvar, data; kwargs...)
model = ObsAnnotationModel(fvar, data; kwargs...)
new_obs = _new_annot(data, model)
-hcat(select(data.obs, data.obs_id_cols), new_obs)
+hcat(select(data.obs, 1), new_obs)
end


17 changes: 17 additions & 0 deletions src/annotation_utils.jl
@@ -0,0 +1,17 @@
find_annotation(::String, ::Nothing) = nothing
function find_annotation(name::String, df::DataFrame)
    hasproperty(df, name) || return nothing
    select(df, [only(names(df,1)), name]; copycols=false)
end
function find_annotation(name::String, a::Annotations)
    x = get(a, name, nothing)
    x !== nothing ? get_table(x) : nothing
end

function find_annotation(name::String, annot::AbstractVector)
    for a in annot
        x = find_annotation(name, a)
        x !== nothing && return x
    end
    nothing
end
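
The lookup pattern in `find_annotation` (dispatch on the source type, first match wins across a list of sources) can be sketched without `DataFrames`. This is an illustration only: `find_column` is a hypothetical name, and NamedTuples whose first field holds the IDs stand in for the real annotation tables.

```julia
# Stand-ins for annotation tables: NamedTuples of columns, where the first
# field plays the role of the ID column. These are NOT DataFrames.
find_column(::String, ::Nothing) = nothing

function find_column(name::String, table::NamedTuple)
    haskey(table, Symbol(name)) || return nothing
    # Return the IDs together with the requested column.
    (ids = first(table), values = table[Symbol(name)])
end

function find_column(name::String, tables::AbstractVector)
    for t in tables
        x = find_column(name, t)
        x !== nothing && return x   # first source that has the column wins
    end
    nothing
end

batch_info  = (cell_id = ["c1", "c2"], batch = ["A", "B"])
sample_info = (cell_id = ["c1", "c2"], batch = ["X", "Y"], donor = ["d1", "d2"])

hit     = find_column("batch", [nothing, batch_info, sample_info])
donor   = find_column("donor", [batch_info, sample_info])
missing_col = find_column("celltype", [batch_info, sample_info])
```

Because the first match is returned, the order of the sources decides which table supplies a column that exists in more than one of them.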
60 changes: 60 additions & 0 deletions src/annotations.jl
@@ -0,0 +1,60 @@
# NB: Annotations is considered experimental API and thus not exported.
# It may get breaking changes in minor/patch releases.
struct Annotations
    df::DataFrame # implementation detail, might be changed later. The first column is the ID column, and the name of that column is the name of the axis.
end

get_table(a::Annotations) = getfield(a,:df)


Base.haskey(a::Annotations, name::String) = hasproperty(get_table(a), name)


function Base.get(f::Union{Type,Function}, a::Annotations, column::String)
    df = get_table(a)
    hasproperty(df, column) || return f()
    id_column = names(df, 1)
    cols = only(id_column) == column ? id_column : vcat(id_column,column)
    Annotations(select(df, cols; copycols=false))
end
Base.get(a::Annotations, column::String, default) = get(()->default, a, column)
Base.get(f::Union{Type,Function}, a::Annotations, column::Symbol) = get(f, a, String(column))
Base.get(a::Annotations, column::Symbol, default) = get(a,String(column), default)


Base.getindex(a::Annotations, column::Union{Symbol,String}) = get(()->throw(KeyError(column)), a, column)

function Base.getindex(a::Annotations, columns::AbstractVector{String})
    df = get_table(a)
    for column in columns
        hasproperty(df, column) || throw(KeyError(column))
    end

    id_column = names(df,1)
    id_ind = findfirst(isequal(only(id_column)), columns)
    if id_ind !== nothing # ID column present? Move it first and keep the relative order between the others.
        cols = append!(id_column, @view(columns[1:id_ind-1]))
        cols = append!(id_column, @view(columns[id_ind+1:end]))
    else # ID column not present? Add it to the beginning.
        cols = append!(id_column, columns)
    end
    Annotations(select(df,cols; copycols=false))
end
Base.getindex(a::Annotations, columns::AbstractVector{<:Union{Symbol,String}}) = a[String.(columns)]


Base.propertynames(a::Annotations, private::Bool) = propertynames(get_table(a), private)
Base.getproperty(a::Annotations, column::Symbol) = a[column]
Base.getproperty(a::Annotations, column::String) = a[column]

function annotation_name(a::Annotations)
    df = get_table(a)
    @assert size(df,2) == 2 "Expected annotations object to have an ID column and a single data column, got columns: $(names(df))"
    only(names(df,2))
end

# function annotation_values(a::Annotations)
#     df = get_table(a)
#     @assert size(df,2) == 2 "Expected annotations object to have an ID column and a single data column, got columns: $(names(df))"
#     df[!,2]
# end
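
The column ordering applied by `getindex(::Annotations, ::AbstractVector{String})` (ID column first, remaining columns in their requested order) can be sketched on plain string vectors. `id_first` is a hypothetical helper, not part of the package, and it is a simplification: it drops every occurrence of the ID name, whereas the code above only handles the first one.

```julia
# Hypothetical helper: put the ID column first and keep the relative order
# of the other requested columns.
id_first(id::String, columns::AbstractVector{String}) = vcat(id, filter(!=(id), columns))

with_id    = id_first("cell_id", ["batch", "cell_id", "donor"])
without_id = id_first("cell_id", ["batch", "donor"])
```

Both calls give the same column order, which is what keeps the ID column attached regardless of whether the caller asked for it.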
55 changes: 49 additions & 6 deletions src/counts_fraction.jl
@@ -14,7 +14,7 @@ function VarCountsFractionModel(counts::DataMatrix{<:AbstractMatrix{<:Integer}},
sub_ind = _filter_indices(var_annot, sub_filter)
tot_ind = _filter_indices(var_annot, tot_filter)

-var_id = select(var_annot, counts.var_id_cols)
+var_id = select(var_annot, 1)
var_match_sub = var_id[sub_ind, :]
var_match_tot = var_id[tot_ind, :]

@@ -39,7 +39,7 @@ update_model(m::VarCountsFractionModel; col=m.col, var=m.var, obs=m.obs, matrix=

# TODO: make general table utility function?
function _matching_var_mask(v, sub)
-bad_ind = findfirst(isnothing, table_indexin(sub,v; cols= names(sub)))
+bad_ind = findfirst(isnothing, table_indexin(sub,v; cols=names(sub)))
if bad_ind !== nothing
error("Row with contents (", join(sub[bad_ind,:],","), ") not found in var.")
end
@@ -64,12 +64,16 @@ end


"""
-var_counts_fraction!(counts::DataMatrix, sub_filter, tot_filter, col; check=true)
+var_counts_fraction!(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:keep, obs=:keep)

For each observation, compute the fraction of counts that match a specific variable pattern.
* `sub_filter` decides which variables are counted.
* `tot_filter` decides which variables to include in the total.
-* If `check=true`, an error will be thrown if no variables match the patterns.
+
+kwargs:
+* `var` - Use this to set `var` in the `ProjectionModel`.
+* `obs` - Use this to set `obs` in the `ProjectionModel`. Note that `counts.obs` is changed in place, regardless of the value of `obs`.
+If `check=true`, an error will be thrown if no variables match the patterns.

For more information on filtering syntax, see examples below and the documentation on [`DataFrames.filter`](https://dataframes.juliadata.org/stable/lib/functions/#Base.filter).

@@ -78,13 +82,15 @@ Examples

Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
```
-var_counts_fraction!(counts, "name"=>contains(r"^MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")
+var_counts_fraction!(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")
```

Compute the fraction of reads in MT- genes, when there is no `feature_type` annotation (i.e. all variables are genes).
```
-var_counts_fraction!(counts, "name"=>contains(r"^MT-"), Returns(true), "fraction_mt")
+var_counts_fraction!(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")
```

+See also: [`var_counts_fraction`](@ref)
"""
function var_counts_fraction!(counts::DataMatrix{<:AbstractMatrix{<:Integer}}, args...; kwargs...)
model = VarCountsFractionModel(counts, args...; var=:keep, obs=:keep, matrix=:keep, kwargs...)
@@ -93,6 +99,43 @@ function var_counts_fraction!(counts::DataMatrix{<:AbstractMatrix{<:Integer}}, a
counts
end


"""
var_counts_fraction(counts::DataMatrix, sub_filter, tot_filter, col; check=true, var=:copy, obs=:copy)

For each observation, compute the fraction of counts that match a specific variable pattern.
* `sub_filter` decides which variables are counted.
* `tot_filter` decides which variables to include in the total.

kwargs:
* `var` - Can be `:copy` (make a copy of source `var`) or `:keep` (share the source `var` object).
* `obs` - Can be `:copy` (make a copy of source `obs`) or `:keep` (share the source `obs` object).
If `check=true`, an error will be thrown if no variables match the patterns.

For more information on filtering syntax, see examples below and the documentation on [`DataFrames.filter`](https://dataframes.juliadata.org/stable/lib/functions/#Base.filter).

Examples
=========

Compute the fraction of reads in MT- genes, considering only "Gene Expression" features (and not e.g. "Antibody Capture").
```
var_counts_fraction(counts, "name"=>startswith("MT-"), "feature_type"=>isequal("Gene Expression"), "fraction_mt")
```

Compute the fraction of reads in MT- genes, when there is no `feature_type` annotation (i.e. all variables are genes).
```
var_counts_fraction(counts, "name"=>startswith("MT-"), Returns(true), "fraction_mt")
```

See also: [`var_counts_fraction!`](@ref)
"""
function var_counts_fraction(counts::DataMatrix{<:AbstractMatrix{<:Integer}}, args...; kwargs...)
model = VarCountsFractionModel(counts, args...; var=:copy, obs=:copy, matrix=:keep, kwargs...)
project(counts, model)
end



function project_impl(counts::DataMatrix{<:AbstractMatrix{<:Integer}}, model::VarCountsFractionModel; verbose=true)
frac = _var_counts_fraction(counts, model)
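
The quantity described in the `var_counts_fraction` docstrings (per observation, counts in the `sub_filter` variables divided by counts in the `tot_filter` variables) can be sketched on a plain integer matrix. This is not the package implementation: the toy values, the variables × observations layout, and all variable names are assumptions for illustration.

```julia
# Toy count matrix: rows are variables (genes), columns are observations (cells).
counts = [10 0 3;
           5 2 1;
           5 8 0]
gene_names = ["MT-CO1", "ACTB", "MT-ND1"]

sub = startswith.(gene_names, "MT-")   # variables counted in the numerator
tot = trues(length(gene_names))        # variables included in the total

# One fraction per observation (column).
fraction_mt = vec(sum(counts[sub, :]; dims=1)) ./ vec(sum(counts[tot, :]; dims=1))
```

An observation with zero total counts would give `NaN` here, so a real pipeline would need to decide how to treat empty cells.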
