Reason behind not using Muon/AnnData #14

Open
aadimator opened this issue Dec 15, 2023 · 2 comments

@aadimator

I really like the project, but I was a little curious: was there a specific reason for not using Muon.jl to store the data? What would the drawbacks have been if AnnData-type objects had been used here?

Thanks

@rasmushenningsson
Collaborator

That's a great question!

I did get some inspiration from Muon - in particular the use of var and obs to denote rows and columns of my DataMatrix.

My main reason for not using AnnData is that I wanted flexibility as I developed SingleCellProjections.jl.
(If AnnData turns out to be a good fit, I would not mind putting out a breaking release to achieve better standardization across the ecosystem.)

I don't have very detailed knowledge of Muon/AnnData, so I might have misunderstood some things.
However, there are several things that don't seem to fit:

  • AnnData is using AbstractMatrix for the data. The MatrixExpressions that SingleCellProjections.jl builds on are not AbstractMatrices.
  • My understanding is that multiple "assays"/"analyses" (layers?) are typically stored in the same AnnData object (e.g. the original counts, the data after normalization, dimension reduction etc.). I prefer to keep them separate, since that makes it much clearer to the user which information is used in a given computation. It also makes it easier to use different sets of variables/observations in different steps. And finally, it makes it easier to reclaim memory when some parts are no longer needed.
  • Loading an AnnData analysis produced by another package/language will not work great with SingleCellProjections.jl. The strength in SingleCellProjections.jl is achieved by starting from counts and using MatrixExpression. The data is at no point converted to a large dense matrix - and it is not possible to "recover" if that is done.
  • I didn't fully understand AnnData by skimming through the code and didn't find documentation that explains e.g. obsm, obsp, varm, varp, layers. I don't know what AlignedMapping means. :)
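
To illustrate the first and third points, here is a minimal sketch of a lazily centered sparse matrix. It is not the MatrixExpression implementation from SingleCellProjections.jl; the type `LazyCentered` and its fields are hypothetical names chosen for this example. The point is that the centered matrix is dense, yet products with it can be computed from the sparse counts and the row means alone, and the type does not need to be an `AbstractMatrix` for that.

```julia
using SparseArrays

# Hypothetical type for illustration only (not part of SingleCellProjections.jl).
# It represents X - μ·1ᵀ without storing the dense result, and it deliberately
# does not subtype AbstractMatrix.
struct LazyCentered
    X::SparseMatrixCSC{Float64,Int}  # sparse counts, variables × observations
    μ::Vector{Float64}               # per-variable (row) means
end

# Compute the row means once, directly from the sparse matrix.
LazyCentered(X::SparseMatrixCSC{Float64,Int}) =
    LazyCentered(X, vec(sum(X; dims=2)) ./ size(X, 2))

# (X - μ·1ᵀ) * v == X*v - μ * sum(v), so no dense matrix is ever formed.
Base.:*(A::LazyCentered, v::AbstractVector) = A.X * v .- A.μ .* sum(v)

X = sparse([1.0 0.0 2.0; 0.0 3.0 0.0])
A = LazyCentered(X)
v = [1.0, 2.0, 3.0]

# The dense matrix is materialized here only to check the lazy result.
dense = Matrix(X) .- A.μ
println(A * v ≈ dense * v)
```

Once the centered data has been stored as a dense matrix, as it might be in a file written by another tool, the sparse-plus-means structure is gone, which is one way to read the "not possible to recover" remark above.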

In the future I would like to have an AbstractAssay (or just Assay or DataMatrix or AbstractAnnData) in a separate package, that both Muon.jl and SingleCellProjections.jl (and many other packages) could subtype. Similar to how Tables.jl provides a common interface for different packages working with tabular data - with DataFrames.jl being the most famous one.
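
A rough sketch of what such a shared abstraction could look like follows. No such package is described in this thread, so every name here (`AbstractAssay`, `var_ids`, `obs_ids`, `describe`, `ToyAssay`) is hypothetical and only meant to show the shape of the idea.

```julia
# Hypothetical interface package. None of these names exist in Muon.jl or
# SingleCellProjections.jl; they are assumptions made for this sketch.
abstract type AbstractAssay end

# Interface functions that each concrete subtype would implement.
function var_ids end   # identifiers of the variables (e.g. genes)
function obs_ids end   # identifiers of the observations (e.g. cells)

# Generic code written only against the interface.
describe(a::AbstractAssay) =
    "$(length(var_ids(a))) variables × $(length(obs_ids(a))) observations"

# A toy concrete type, standing in for what an individual package might define.
struct ToyAssay <: AbstractAssay
    counts::Matrix{Int}
    genes::Vector{String}
    cells::Vector{String}
end
var_ids(a::ToyAssay) = a.genes
obs_ids(a::ToyAssay) = a.cells

a = ToyAssay([1 0 2; 0 3 0], ["g1", "g2"], ["c1", "c2", "c3"])
println(describe(a))
```

Tables.jl itself does not require a common supertype. It relies on interface functions such as `Tables.rows` and `Tables.columns`, so a shared assay package could plausibly take either route.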

@aadimator
Author

Thank you for such a detailed and informative reply. Looking forward to using (and hopefully contributing to) this in the future.
