Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when fitting the compass model #81

Open
quinnpeters opened this issue Jan 4, 2024 · 14 comments
Open

Error when fitting the compass model #81

quinnpeters opened this issue Jan 4, 2024 · 14 comments

Comments

@quinnpeters
Copy link

When I try to fit the COMPASS Model with a large compass container, I get the following error which I cannot figure out how to solve:

fit <- COMPASS(CC_comb,
treatment = Stim == "SPIKE",
control = Stim == "DMSO",
iterations = 40000)

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7

As far as I can tell, there is no issue with the metadata or counts. The data structure of the compass container, however, must be causing some issue?

@emjbishop
Copy link

Myself and three other people are all getting the same error now when running the COMPASS function, using different datasets (including test data that used to work) and systems (linux, macOS, and windows). Is there a dependency that was updated this year causing this?

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 7

@gfinak
Copy link
Member

gfinak commented Apr 23, 2024

Can you share data / code so that I might reproduce the error?

@emjbishop
Copy link

emjbishop commented Apr 23, 2024

@gfinak yes, we have a tutorial repo that uses COMPASS. You just need:

Here is a more complete error message with context:

> CC <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.
> fit <- COMPASS(CC,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 7

I've also heard from a colleague (edit: @quinnpeters) that they don't get the error with a much older version of R (v3.6.3).

@malisas
Copy link
Contributor

malisas commented Aug 30, 2024

Hi @emjbishop I got your separate private message to me about this issue.
Thanks for providing the reproducible example. Just want to chime in that I was able to reproduce, and I think the issue has to do with NA values getting generated by COMPASSContainerFromGatingSet, which internally calls flowWorkspace::gs_get_singlecell_expression. I wonder if more recent R versions introduced different behavior in a dependency to this function, and caused NA values to be generated.

Here's some runtime output showing how COMPASS failed on CC but succeeded on CC2. NA values don't always get generated it seems: Both COMPASSContainers were generated by the same call to COMPASSContainerFromGatingSet, but CC contains NA values in the data used by COMPASS:

> CC <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.


> CC2 <- COMPASSContainerFromGatingSet(gs,
+                                     node = parent_node,
+                                     individual_id = id,
+                                     mp = markermap,
+                                     countFilterThreshold = 5000)
Extracting cell counts
Fetching 4+
Fetching child nodes
common markers are: 
Time FSC-A FSC-H SSC-A CD8b TNFa CD107a CD154 CD3 IL2 CD4 IL17a IL4_5_13 CD14_19 CCR7 CD38 LD IFNg CD45RA HLADR 
Extracting single cell data for 4+/IL2|4+/IL4513|4+/IFNG|4+/TNF|4+/IL17|4+/154|4+/107a
..............................Creating COMPASS Container
Filtering low counts
Filtering 0 samples due to low counts
Warning message:
In COMPASSContainer(data = sc_data, counts = counts, meta = pd,  :
  There appear to be negative intensities in the 'data' supplied.


> CC$data$`114716.fcs_445737`[100,]
          IL2      IL4_5_13          IFNg          TNFa         IL17a         CD154        CD107a 
1.018558e-312 2.781342e-309 8.257046e-317  1.746838e+03 2.483463e-265 1.807873e-308 1.157082e-309 
> CC2$data$`114716.fcs_445737`[100,]
          IL2      IL4_5_13          IFNg          TNFa         IL17a         CD154        CD107a 
1.820499e-318 1.823898e-318 1.827297e-318  1.746838e+03 1.834095e-318 1.837494e-318 1.840894e-318 


> dmso_spike_1_sn <- CC2$meta %>% dplyr::filter(STIM %in% c("DMSO", "Spike 1")) %>% dplyr::pull(name)
> table(unlist((lapply(CC$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))

FALSE  TRUE 
    9     1 
> table(unlist((lapply(CC2$data[dmso_spike_1_sn], function(x) {any(is.na(x))}))))

FALSE 
   10 


> fit <- COMPASS(CC,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 7


> any(is.na(CC$data$`114716.fcs_445737`))
[1] TRUE
> any(is.na(CC2$data$`114716.fcs_445737`))
[1] FALSE


> fit2 <- COMPASS(CC2,
+                treatment = STIM == "Spike 1",
+                control = STIM == "DMSO",
+                iterations = 100)
There are a total of 5 samples from 5 individuals in the 'treatment' group.
There are a total of 5 samples from 5 individuals in the 'control' group.
The model will be run on 5 paired samples.
The category filter has removed 123 of 125 categories.
There are a total of 2 categories to be tested.
Initializing parameters...
Computing initial parameter estimates...
Keeping 100 iterations. We'll thin every 8 iterations.
Burnin for 100 iterations...
Sampling 800 iterations...
Done!
Computing the posterior difference in proportions, posterior log ratio...
Done!
> fit2
A COMPASS model fit on 5 paired samples.
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.16.0    COMPASS_1.38.1       flowWorkspace_4.12.2 flowCore_2.12.2      CytoML_2.12.0        here_1.0.1          

loaded via a namespace (and not attached):
 [1] tidyr_1.3.1         utf8_1.2.3          generics_0.1.3      lattice_0.21-8      digest_0.6.33       magrittr_2.0.3      evaluate_0.21       grid_4.3.1         
 [9] RColorBrewer_1.1-3  fastmap_1.1.1       rprojroot_2.0.3     ggcyto_1.28.1       plyr_1.8.9          jsonlite_1.8.7      graph_1.78.0        gridExtra_2.3      
[17] BiocManager_1.30.22 purrr_1.0.2         fansi_1.0.4         scales_1.3.0        XML_3.99-0.16       Rgraphviz_2.44.0    abind_1.4-5         cli_3.6.1          
[25] rlang_1.1.1         RProtoBufLib_2.12.1 Biobase_2.60.0      munsell_0.5.0       yaml_2.3.7          cytolib_2.12.1      tools_4.3.1         ncdfFlow_2.46.0    
[33] dplyr_1.1.4         colorspace_2.1-0    ggplot2_3.4.4       BiocGenerics_0.46.0 vctrs_0.6.4         R6_2.5.1            matrixStats_1.1.0   stats4_4.3.1       
[41] lifecycle_1.0.3     zlibbioc_1.46.0     S4Vectors_0.38.2    RBGL_1.76.0         clue_0.3-65         pdist_1.2.1         cluster_2.1.4       pkgconfig_2.0.3    
[49] pillar_1.9.0        hexbin_1.28.3       gtable_0.3.4        glue_1.6.2          Rcpp_1.0.11         xfun_0.40           tibble_3.2.1        tidyselect_1.2.0   
[57] knitr_1.43          rstudioapi_0.15.0   htmltools_0.5.6     rmarkdown_2.24      compiler_4.3.1     

Haven't gotten to the root of the problem but documenting the investigation so far.

@gfinak
Copy link
Member

gfinak commented Aug 30, 2024

I suggest looking at the SimpleCOMPASS interface. CompassContainerFromGatingSet makes a lot of assumptions and we ended up abandoning it internally.

@emjbishop
Copy link

emjbishop commented Aug 31, 2024

Thank you @malisas and @gfinak, we will switch to SimpleCOMPASS (which seems to work with R 4.4.1). We appreciate it and will pass this on to our collaborators.

@emjbishop
Copy link

Hello again, we appreciate the suggestion to use SimpleCOMPASS and indeed it works well. However, when we have dozens of subject samples with multiple stims and cell types it's not practical to generate the boolean combinations in FlowJo for each subject. Do you have a way to automatically generate the boolean combinations, or a method in FlowJo?

@gfinak
Copy link
Member

gfinak commented Nov 19, 2024

Indeed, we would typically export the flowJo wsp file, read it in with flowWorskpace, then there are some internal functions that could generate the boolean counts from the marginal gated data at the single cell level.

The doc for SimpleCompass highlights the functions used:

##' n_s <- CellCounts( data[1:n], Combinations(k) )
##' n_u <- CellCounts( data[(n+1):(2*n)], Combinations(k) )

CellCounts can be used generate the data you need for SimpleCompass, given inputs are a list of matrices, where each matrix is the single cell expression of cells from a cell population in a sample (e.g. CD4 T cells) for rows, and columns are each marker/protein measured and thresholded to be 0 if it's below the gate (i.e. negative).
With some work on a GatingSet created from the FCS files and an imported flowJo workspace, you can generate these matrices, and the required data (left as an exercise for the reader).

@emjbishop
Copy link

emjbishop commented Dec 5, 2024

Thanks, this is really helpful. However, I'm not getting the same poly/functionality scores or heatmap as with regular COMPASS.

You can see the scores and heatmaps at the bottom of these READMEs:

Any thoughts? Perhaps I can tweak the boolean combinations somehow?

P.S. When I make the heatmap with SimpleCOMPASS I get the console message that ## The 'threshold' filter has removed 122 categories: ...

@gfinak
Copy link
Member

gfinak commented Dec 5, 2024

First I'd suggest checking that that data are indeed the same (the input matrices).
If they really are, my guess is not enough iterations have run for the model to converge. Or the seed is substantially different. Or all of the above.
It might also be the category filter, which I believe filters out elements with less than 5 cells (and assigns them to the total).

@emjbishop
Copy link

Thanks for the suggestions. I tried using 20,000 iterations and the same seed but I'm still getting a different result. I do believe that my filtered categories have <5 cells, however I wouldn't expect that to matter since I'm assuming that COMPASS and SimpleCOMPASS use the same filter and I think the input data is the same.

Regarding input data, they both started with the same GatingSet object, but for COMPASS we used COMPASSContainerFromGatingSet() and for SimpleCOMPASS I extracted the matrices manually. Essentially these were my steps:

# Extract stimmed (spike1) and unstimmed (dmso) samples from GatingSet
spike1 <- gs[sp1_samples]
dmso <- gs[dmso_samples]

# Extract CD4+ data (output is cytoset object)
spike1_cd4 <- gs_pop_get_data(spike1, "4+")
dmso_cd4 <- gs_pop_get_data(dmso, "4+")

# Get lists of matrices
get_mtx_list <- function(cytoset_obj) {
  outlist = list()
  for (i in sampleNames(cytoset_obj)) {
    sub_cf <- get_cytoframe_from_cs(cytoset_obj, i)  # Extract cytoframe from cytoset
    out <- exprs(sub_cf)  # Get expression values as a matrix
    colnames(out) <- markernames  # Rename columns (e.g. <V450-A> becomes IFNg)
    out[out < 0] <- 0  # Replace negative numbers with 0 for CellCounts() later
    out <- out[, colnames(out) %in% markermap]  # Only keep markers we want boolean combos of
    outlist[[i]] <- out
  }
  return(outlist)
}

spikelist <- get_mtx_list(spike1_cd4)
dmsolist <- get_mtx_list(dmso_cd4)


# Create counts data (we have 7 markers)
n_s <- CellCounts(spikelist, Combinations(7))
n_u <- CellCounts(dmsolist, Combinations(7))

If this all looks good then perhaps we'd need a vignette for going from GatingSet to SimpleCOMPASS, since I'm not sure where this is going wrong.

@gfinak
Copy link
Member

gfinak commented Dec 6, 2024

The problem is here:

    out[out < 0] <- 0  # Replace negative numbers with 0 for CellCounts() later

The threshold is intended to be placed at the location of the functional gate, not literally negative, but negative for the marker, which would typically be some positive MFI value. I assume you've gated your data.
I believe there's some function in the package for that.. but I'll have to get back to you on that. Otherwise, you'll need to extract the gate for each marker for each sample, figure out what the location is, and set any value of the mfi for that marker that's below the gate (i.e. negative in terms of not expressing the marker above background, which isn't necessarily 0), and set it to 0, so that CellCounts can do the correct counting.

Best
Greg

@gfinak
Copy link
Member

gfinak commented Dec 28, 2024

Sorry for the delay. You may want to look into something like gsGetSingleCellExpression to ensure you start with a list of properly thresholded matrices.
https://github.com/RGLab/flowWorkspace/blob/3951e1923e2484db6744154f27c44ee00428edf3/R/getSingleCellExpression.R#L95

emjbishop added a commit to seshadrilab/tutorials that referenced this issue Jan 9, 2025
In our original tutorial we only looked at combinations of 7 markers.

I had erroneously set all negative values in the matrices to 0, when it
should have been only cells that were negative for that marker by gating
that are set to 0. See RGLab/COMPASS#81 for
context.
@emjbishop
Copy link

Thank you, this is very helpful. I still can't replicate the original heatmap I got with regular COMPASS, but my current heatmap is much closer now using the code below:

# Extract stimmed (spike1) and unstimmed (dmso) samples from GatingSet
spike1 <- gs[sp1_samples]
dmso <- gs[dmso_samples]

# Extract data for the "CD4+" populations only. Using our 7 markers of interest.
mynodes <- c("4+/107a", "4+/154", "4+/IFNG", "4+/IL2", "4+/IL17", "4+/IL4513",
             "4+/TNF")

maplist <- list("4+/107a" = "CD107a", "4+/154" = "CD154", "4+/IFNG" = "IFNg",
                "4+/IL2" = "IL2", "4+/IL17" = "IL17a", "4+/IL4513" = "IL4_5_13",
                "4+/TNF" = "TNFa")

spike1_cd4 <- gs_get_singlecell_expression(spike1, nodes = mynodes, map = maplist)
dmso_cd4 <- gs_get_singlecell_expression(dmso, nodes = mynodes, map = maplist)

# Create boolean combinations
n_s <- CellCounts(spike1_cd4, Combinations(7)) # Stimmed data
n_u <- CellCounts(dmso_cd4, Combinations(7)) # Unstimmed data

# Apply a 'category filter': Only keep combinations where at least one sample 
# has at least six cells expressing that particular combination of markers.
# There are no categories shared among the stim and unstim conditions where
# more than one sample has cells.
n_s_filtered <- n_s[, colSums(n_s > 5) >= 1, drop = FALSE]
n_u_filtered <- n_u[, colSums(n_u > 5) >= 1, drop = FALSE]

# Filter to keep only column names common to both, to avoid errors later.
common_cols <- intersect(colnames(n_s_filtered), colnames(n_u_filtered))

n_s_filtered2 <- n_s_filtered[, common_cols, drop=FALSE]
n_u_filtered2 <- n_u_filtered[, common_cols, drop=FALSE]


# Run SimpleCOMPASS using same number of iterations as original COMPASS
fit = COMPASS::SimpleCOMPASS(n_s = n_s_filtered2,
                             n_u = n_u_filtered2,
                             meta = metadata,
                             individual_id = "SAMPLE ID",
                             iterations = 40000)

Unless there are other suggestions this might be the best we can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants