-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
what does weight.by.var do mathematically? #2237
Comments
You can see here, the cell embeddings are multiplied by the variance of the PC: seurat/R/dimensional_reduction.R Line 821 in fc4a4f5
|
@brianpenghe raised a very relevant question here. The behavior of weight.by.var is differs with the library(Seurat)
library(tidyverse)
set.seed(seed = 1)
dummyexpMat <- matrix(
data = stats::rexp(n = 2e4, rate = 1),
ncol = 200, nrow = 100
)
colnames(x = dummyexpMat) <- paste0("cell", seq(ncol(x = dummyexpMat)))
row.names(x = dummyexpMat) <- paste0("gene", seq(nrow(x = dummyexpMat)))
# Create Seurat object for testing
obj <- CreateSeuratObject(counts = dummyexpMat)
# Normalize
obj <- NormalizeData(object = obj, verbose = FALSE)
# Scale
obj <- ScaleData(object = obj, verbose = FALSE)
expand.grid( approx=c(T,F), weight.by.var=c(T,F)) %>% rowwise() %>% do(with(., {
# unweighted
# compute PCA
obj <- RunPCA(
object = obj,
features = rownames(x = obj),
verbose = FALSE,
npcs = 10,
weight.by.var = weight.by.var,
approx = approx
)
#cat("weight.by.var",weight.by.var, "approx", approx, "\n")
stdev_pca <- slot(object = obj[["pca"]], name = "stdev")[1]
stdev_weighted <- sqrt(diag(cov(slot(object = obj[["pca"]], name = "cell.embeddings"))))[1]
data.frame(
weight.by.var,
approx,
stdev_pca,
stdev_embedding_is = stdev_weighted,
stdev_embedding_should_1 = if(weight.by.var) stdev_pca^3 else stdev_pca,
stdev_embedding_should_2 = if(weight.by.var) stdev_pca^2 else stdev_pca
)
}))
This table shows the standard deviation of the 1st principal component for some random data. Default parameter values are in bold. I can't image any use of any scaling other than the default (highlighted by italics), but have two possible interpretations of Luckily, the default parameters ( To overcome this inconsistency without affecting too many users, I propose the following:
Additionally it would simplify things to
. I will try to submit a pull request later today or tomorrow. |
OK, looking at the history, it seems that , This slightly changes the interpretation of the above but I still think it would make sense to deprecate the 'weight.by.var' argument. @andrewwbutler, do you know if there is any valid usecase for this? |
Hi @jan-glx , Thanks for the detailed responses/PRs! The motivation for this In order to have consistent behavior between seurat/R/dimensional_reduction.R Lines 831 to 834 in fc4a4f5
should actually be if (weight.by.var) {
cell.embeddings <- pca.results$x
} else {
cell.embeddings <- pca.results$x / (pca.results$sdev[1:npcs] * sqrt(max(1, ncol(object - 1))))
} We should definitely fix the bug for the next release but I think we should maintain the option. |
yes. #2330 does just that (1) + tests (2) + handling transition with warnings & an option (3) + merging code with the duplicated code of reverse.pca (4). Should I remove any part of this? Or do you just want to do (1)? I was afraid that people using Another alternative (to the option transition #2330 | completely eliminating the feature #2331 ) would be to introduce another argument Thanks for considering! |
I like the idea of standardizing to unit variance as a more intuitive operation and follows the same logic as what we trying to do before. Then we also wouldn't need quite as elaborate a transition. We could simply check if they're trying to pass the |
I was running RunPCA.
The instruction confused me:
Can I ask what does it do mathematically? Any formula?
The text was updated successfully, but these errors were encountered: