Labels in Memory T Cell Atlas missing #14
Hi @mihem, thanks for the question. The reference object currently published on Zenodo was built before the Nathan et al. dataset was published, so it does not contain cell type labels or protein expression. These can now be obtained from GEO (GSE158769): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158769 You can update the reference metadata directly, e.g. `ref$meta_data = new_meta_data`. Let me know if that helps!
Hi Joyce, sorry, I think you misunderstood me. What you said is well explained in your vignette. I downloaded the ref object from Zenodo and the metadata from GEO and added them exactly as you explained. However, there are several missing labels (260). Therefore the … So my question is not "where do I get the metadata?" but "could you please provide the labels without missing values, or help me subset the ref object so that it removes the few cells with missing values?" Thank you
Ah, apologies - I read it too fast. I think removing the NA cells is a good way to go. To do this, here is some code (haven't tried running it myself but this should work):
Can you give that a try? If that doesn't work, I can take another look tonight. Thanks!
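A minimal sketch of that filtering step, assuming the Symphony reference is a list whose `Z_corr` matrix has one column per cell (as the `t(ref_obj$Z_corr)` call in the traceback implies) and whose `meta_data` has one row per cell in the same order. The helper `drop_unlabeled_cells` and the label column `cell_type` are hypothetical, and only the two slots that `class::knn` reads are subset here; other per-cell slots are left untouched.

```r
# Sketch (hypothetical helper): drop reference cells whose label is NA so that
# class::knn() no longer stops with "no missing values are allowed".
# Assumes ref$Z_corr is a dims x cells matrix and ref$meta_data is a
# data.frame with one row per cell, in the same order.
drop_unlabeled_cells <- function(ref, label_col) {
  stopifnot(ncol(ref$Z_corr) == nrow(ref$meta_data))
  keep <- !is.na(ref$meta_data[[label_col]])
  ref$Z_corr <- ref$Z_corr[, keep, drop = FALSE]
  ref$meta_data <- ref$meta_data[keep, , drop = FALSE]
  message(sum(!keep), " reference cells removed (missing '", label_col, "')")
  ref
}

# Toy stand-in for a Symphony reference: 3 embedding dims, 6 cells, 2 NA labels.
set.seed(1)
ref <- list(
  Z_corr = matrix(rnorm(18), nrow = 3),
  meta_data = data.frame(cell_type = c("A", "B", NA, "A", NA, "B"))
)
ref <- drop_unlabeled_cells(ref, "cell_type")

# class::knn() now runs; training rows must be cells, so transpose Z_corr.
query_emb <- matrix(rnorm(6), ncol = 3)  # 2 query cells x 3 dims
pred <- class::knn(t(ref$Z_corr), query_emb, ref$meta_data$cell_type, k = 3)
print(pred)
```

With the objects from the original report, this would be something like `ref_tbru <- drop_unlabeled_cells(ref_tbru, "cluster_name")` before calling `knnPredict.Seurat(tcells, ref_tbru, "cluster_name")`.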
Hi Joyce, that was an easy solution and worked well, thanks! The …
Hi @mihem, glad that worked. For the k-NN function, I agree that it can be slow for large references. That said, Symphony embeddings are agnostic to the downstream inference function; that is, you can train any sort of model on them (multinomial logistic regression using the … As an example, multinomial LR would be something like this:
@ilyakorsunsky has implemented a slightly more sophisticated version of this for the Fibroblast Atlas from our lab, where each soft cluster learns its own model; then, for a given query cell, it incorporates the predictions from all soft clusters.
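To illustrate that any classifier can sit on top of the embedding, here is a rough sketch of multinomial logistic regression using `nnet::multinom` (our choice of package, not necessarily the one used in this thread). It trains on toy reference embeddings and predicts labels for query embeddings; the data and label names are hypothetical. With real data, the training matrix would be `t(ref$Z_corr)` and the query matrix the mapped query embedding (e.g. `Embeddings(query_obj, "harmony")`, as in the traceback). This is a single global model, not the per-soft-cluster scheme described above.

```r
# Sketch: multinomial logistic regression on Symphony-style embeddings as an
# alternative to k-NN label transfer. Toy data; all names are hypothetical.
library(nnet)

set.seed(42)
n_per_class <- 50
n_dims <- 5
classes <- c("CD4_mem", "CD8_mem", "Treg")  # hypothetical labels

# Reference embedding (cells x dims): each class shifted along its own axis.
ref_emb <- do.call(rbind, lapply(seq_along(classes), function(i) {
  m <- matrix(rnorm(n_per_class * n_dims), ncol = n_dims)
  m[, i] <- m[, i] + 4
  m
}))
colnames(ref_emb) <- paste0("dim", seq_len(n_dims))
ref_labels <- factor(rep(classes, each = n_per_class))
train_df <- data.frame(ref_emb, label = ref_labels)

# MaxNWts is raised so more dims or classes don't hit nnet's default weight cap.
fit <- multinom(label ~ ., data = train_df, MaxNWts = 10000, trace = FALSE)

# Two query cells placed at the centers of the first and third classes.
query_emb <- rbind(c(4, 0, 0, 0, 0), c(0, 0, 4, 0, 0))
colnames(query_emb) <- colnames(ref_emb)
query_df <- as.data.frame(query_emb)

pred_class <- predict(fit, newdata = query_df, type = "class")
pred_prob <- predict(fit, newdata = query_df, type = "probs")
print(pred_class)
print(round(pred_prob, 3))
```

Unlike k-NN, prediction cost here no longer scales with the number of reference cells, and `type = "probs"` gives per-class probabilities that can serve as a confidence score.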
Thanks for the insights.
Fair point! It's on my to-do list. Do you happen to have a suggestion for a faster library implemented in C++?
Cool. Sorry, no real expertise on that. Some googling led me to:
Thanks for this great package; it is really helpful for single-cell analysis.
I struggle with the memory T cell atlas reference, though. Following the tutorial and downloading the annotations from GEO results in 260 missing labels.
In a dataset this large that alone probably wouldn't be a problem, but after `mapQuery`, `knnPredict.Seurat` (I use a Seurat query object) fails with:
Error in class::knn(t(ref_obj$Z_corr), Embeddings(query_obj, "harmony"), : no missing values are allowed
Traceback:
3: stop("no missing values are allowed")
2: class::knn(t(ref_obj$Z_corr), Embeddings(query_obj, "harmony"), ref_obj$meta_data[[label_transfer]], k = k) at utils_seurat_symphony.R#452
1: knnPredict.Seurat(tcells, ref_tbru, "cluster_name")
@joycekang, could you provide all the labels (same lab, right? ;)) or suggest a workaround (e.g., filtering out the cells with missing labels)?
Thank you,
Mischko