Added the functionality of scale.factor in NormalizeData being set to "median" of counts #9389
Background
The NormalizeData generic has several implementations, dispatched on the class of the first argument passed to it: NormalizeData.Assay, NormalizeData.default, NormalizeData.Seurat, NormalizeData.StdAssay, and NormalizeData.V3Matrix*. Each of these offers three built-in normalization methods: "LogNormalize", "CLR" and "RC". All except "CLR" take a scale.factor parameter, which multiplies the normalized values during normalization. It defaults to 1e4 for LogNormalize and to 1 for RelativeCounts (RC).
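For readers unfamiliar with where scale.factor enters, here is a minimal base-R sketch of the two scaled methods on a dense toy matrix. The helper names `relative_counts_sketch` and `log_normalize_sketch` are hypothetical and only illustrate the arithmetic; they are not the Seurat implementations, which also handle sparse matrices and other object classes.

```r
# Toy count matrix: rows are genes, columns are cells.
counts <- matrix(
  c(1, 0, 3,
    2, 2, 0,
    7, 8, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Hypothetical helper: divide each cell's counts by that cell's total,
# then multiply by scale.factor (the "RC" arithmetic).
relative_counts_sketch <- function(counts, scale.factor = 1) {
  sweep(counts, 2, colSums(counts), "/") * scale.factor
}

# Hypothetical helper: the same scaling followed by log1p
# (the "LogNormalize" arithmetic).
log_normalize_sketch <- function(counts, scale.factor = 1e4) {
  log1p(relative_counts_sketch(counts, scale.factor))
}

rc <- relative_counts_sketch(counts)
ln <- log_normalize_sketch(counts)
```

With scale.factor = 1, each column of `rc` sums to 1; with the 1e4 default, `ln` holds log1p of counts per 10,000.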
Updates
I have added support for passing scale.factor = "median" to LogNormalize, RelativeCounts and .SparseNormalize. When this value is passed, the function computes the median of the counts across all columns (cells), or across all rows (genes) if margin = 1L in the case of LogNormalize.default, and uses that median as the scale.factor.
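The behaviour described above could be sketched roughly as follows. This is an illustration, not the code in this PR: `resolve_scale_factor` is a hypothetical helper, and the sketch assumes that "median of the counts" means the median of the per-cell (or per-row) count totals.

```r
# Hypothetical helper, not part of Seurat: turn the scale.factor argument
# into a single number.
resolve_scale_factor <- function(counts, scale.factor, margin = 2L) {
  if (identical(scale.factor, "median")) {
    # Assumption: the median is taken over total counts per column (cell),
    # or per row when margin = 1L.
    totals <- if (margin == 1L) rowSums(counts) else colSums(counts)
    return(stats::median(totals))
  }
  if (!is.numeric(scale.factor) || length(scale.factor) != 1L) {
    stop("scale.factor must be a single number or \"median\"")
  }
  scale.factor
}

# Toy matrix (filled by column): cell totals are 10, 20 and 4.
counts <- matrix(
  c(1, 2, 7,
    4, 6, 10,
    3, 0, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

sf <- resolve_scale_factor(counts, "median")

# Log-normalize with the resolved factor: scale each cell by its total,
# multiply by sf, then apply log1p.
normalized <- log1p(sweep(counts, 2, colSums(counts), "/") * sf)
```

Numeric values pass through unchanged, so existing calls such as scale.factor = 1e4 would behave as before under this sketch.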
I have also added unit tests in test_preprocessing.R that check the median is computed correctly when the value passed to the scale.factor parameter is "median".