Symphony labels when query is a negative control #20

Open
hbandukw opened this issue Dec 7, 2021 · 5 comments

Comments

@hbandukw

hbandukw commented Dec 7, 2021

Hello,

I used Symphony to map two single-cell RNA-seq queries to my Harmony-integrated reference (built using the modular approach presented on the Symphony GitHub). Both queries were previously annotated, so I could test whether Symphony was working with my reference atlas.
The first query contained cells that should overlap with most cell types in the atlas. The mapping, fortunately, identified most of the cells correctly.
My second query, however, was a "negative control", i.e. it did not contain any of the cell types in the atlas. After mapping it to the reference, the resulting labels obviously did not make sense.

What does Symphony do in a situation like this? Does it "fail to map" cells when applicable, or does it just map them to the most transcriptionally similar cells?

If you need more info on this, please let me know.

@joycekang
Collaborator

joycekang commented Dec 8, 2021

Hi! Great question. As we discuss a bit in the paper, Symphony assumes the reference contains the cell types you're interested in in the query. If there are "novel" cell types in the query, Symphony will map those query cells to their most similar reference cell type. In some cases, this can be useful behavior (we use it in the paper to map cancer cells onto a healthy tissue, for example). To help "flag" potentially poorly mapping cells, we developed two "mapping confidence metrics" (code here), which can help identify cases where the query cells are very different from the reference. These are based on the Mahalanobis distance. We showed some example results in the paper (Supplementary Figures 9-13), but I would say that developing better "confidence scores" for reference mapping is still an open problem. I would try mapping your "negative control" to your atlas: does it have worse confidence (higher metrics) compared to the other dataset?
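
As a rough illustration of the idea behind a Mahalanobis-based confidence metric, the sketch below scores each query cell by its distance from a reference point cloud in a low-dimensional embedding. It is a generic NumPy example, not Symphony's implementation (Symphony is an R package, and its metrics are in the code linked above). The function name `mahalanobis_to_reference`, the `ridge` parameter, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_to_reference(query, reference, ridge=1e-6):
    """Per-cell Mahalanobis distance from each query row to a reference cloud.

    query:     (n_query, n_dims) array of embedded query cells
    reference: (n_ref, n_dims) array of embedded reference cells
    ridge:     small value added to the covariance diagonal (an assumption
               of this sketch) so the matrix stays invertible
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Centroid and covariance of the reference cloud.
    mu = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    cov = cov + ridge * np.eye(cov.shape[0])
    cov_inv = np.linalg.inv(cov)

    # d_i^2 = (x_i - mu)^T cov^{-1} (x_i - mu), evaluated for every row at once.
    diff = query - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2)


# Toy usage with hypothetical 2-D embeddings (not real data).
reference = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
query = np.array([[0.0, 0.0], [2.0, 0.0]])
distances = mahalanobis_to_reference(query, reference)
```

A larger distance means the cell sits further from the reference cloud, scaled by how much the reference itself varies in that direction. That is why higher values are read as worse mapping confidence.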

@hbandukw
Author

Hi Joyce,
Ahhh! Thanks a lot for the information. I will take a look at the confidence scores and get back to you.

@hbandukw
Author

hbandukw commented Dec 13, 2021

Hi Joyce,

I calculated the per-cell Mahalanobis distance for both the real query and the negative control, based on the code you provided:

  1. Negative control

[Image: Screen Shot 2021-12-12 at 5 04 19 PM]

  2. Actual query

[Image: Screen Shot 2021-12-12 at 5 06 23 PM]

The ranges of values don't look too different to me. What do you think?

@joycekang
Collaborator

joycekang commented Dec 17, 2021

Hi there,

Thanks for trying out the confidence metric. It looks to me like the two distributions are indeed a bit different, since the IQR for the actual data is [3.23, 4.66] whereas the negative control has IQR [4.73, 6.16]. It's a little hard to see visually because of the outlier points. We've also noticed that the confidence score doesn't always have good per-cell discriminatory ability (that is, it will be hard to draw a cutoff that splits the actual and negative control cells), but it is reassuring to me that the negative control overall has a distribution of higher distance scores. In our testing, we found that cell types that are very different from the reference (e.g. mapping stromal cells onto an immune atlas) will have metrics with better discriminatory ability.
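
The two checks described above, comparing interquartile ranges and asking how well a single cutoff could separate the two groups per cell, can be sketched as follows. This is a minimal NumPy example under stated assumptions: the helper names `iqr_bounds` and `separation_auc` are hypothetical, and the score arrays are made-up toy values, not the data from the screenshots.

```python
import numpy as np


def iqr_bounds(scores):
    """Return the (25th, 75th) percentiles of a vector of per-cell scores."""
    q1, q3 = np.percentile(np.asarray(scores, dtype=float), [25, 75])
    return float(q1), float(q3)


def separation_auc(control_scores, query_scores):
    """Probability that a random control cell scores higher than a random
    query cell, counting ties as half.

    0.5 means no per-cell discriminatory ability; 1.0 means a single cutoff
    separates the groups perfectly. This builds the full pairwise comparison
    matrix, so it is only suitable for modest numbers of cells.
    """
    control = np.asarray(control_scores, dtype=float)[:, None]
    query = np.asarray(query_scores, dtype=float)[None, :]
    wins = (control > query).mean()
    ties = (control == query).mean()
    return float(wins + 0.5 * ties)


# Toy, hypothetical per-cell distances for illustration only.
negative_control = [4.0, 5.0, 6.0, 7.0, 8.0]
actual_query = [3.0, 4.0, 5.0, 6.0, 7.0]

control_iqr = iqr_bounds(negative_control)
query_iqr = iqr_bounds(actual_query)
auc = separation_auc(negative_control, actual_query)
```

Shifted IQRs with an AUC only modestly above 0.5 would match the situation described here: the negative control is worse overall, but no clean per-cell cutoff exists.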

@hbandukw
Author

Hi, thanks for your reply!

So I tried mapping another two datasets from my lab onto the reference (muscle query cells to a muscle reference). I am finding that the cells with higher Mahalanobis distance were predicted correctly (these also had a higher kNN prediction probability), while the cells in the lower Mahalanobis distance bracket (which also had low kNN prediction probability) were predicted "incorrectly". Do you have any idea why that is happening? It almost looks like the scores are flipped or something.

I should mention that, for the datasets whose boxplots I posted above, the cell predictions do correlate "correctly" with the Mahalanobis distance metric.

Let me know if you need any other information regarding this!


2 participants