read.ncdfFlowSet multiple datasets #47

gunthergl · 2019-07-31T14:18:01Z

Dear all, I just started using your package and hit an error while reading a file:

# devtools::install_github("RGLab/cytolib", ref="trunk")
# devtools::install_github("RGLab/flowCore", ref="trunk")
# devtools::install_github("RGLab/ncdfFlow", ref="trunk")
library(ncdfFlow)

tmp <- read.ncdfFlowSet(files = my_single_LMD_file, dataset = 1)

All FCS files have the same following channels:
FS INT LIN
FS TOF LIN
SS INT LIN
FL1 INT LOG
FL2 INT LOG
FL3 INT LOG
FL5 INT LOG
FL6 INT LOG
FL8 INT LOG
FL9 INT LOG
FL10 INT LOG
TIME
write HKP01 P0A E01 B cells .LMD to empty cdf slot...
done!

tmp <- read.ncdfFlowSet(files = my_single_LMD_file, dataset = 2)

All FCS files have the same following channels:
FS INT LIN
FS TOF LIN
SS INT LIN
FL1 INT LOG
FL2 INT LOG
FL3 INT LOG
FL5 INT LOG
FL6 INT LOG
FL8 INT LOG
FL9 INT LOG
FL10 INT LOG
TIME
Error: Subset out of bounds

Stepping through with the debugger, the error originates inside read.ncdfFlowSet() at line 127, in the call to my.read.FCS(i), and more precisely at line 7 of my.read.FCS(i):

this_fr[, chnls_common]

where chnls_common is:

"FS INT LIN" "FS TOF LIN" "SS INT LIN" "FL1 INT LOG" "FL2 INT LOG" "FL3 INT LOG" "FL5 INT LOG" "FL6 INT LOG" "FL8 INT LOG" "FL9 INT LOG" "FL10 INT LOG" "TIME"

but colnames(this_fr) is:

"TIME"

I have no idea what happened here. It may be related to the fact that, even when the file is read with flowCore, the feature (channel) names differ between the two datasets: dataset 1 is stored as FCS2 and dataset 2 as FCS3.

tmp_FC_1 <- flowCore::read.FCS(my_single_LMD_file, dataset = 1)
tmp_FC_2 <- flowCore::read.FCS(my_single_LMD_file, dataset = 2)
tmp_FC_1

flowFrame object 'myfile.LMD'
with 99722 cells and 12 observables:

name	desc	range	minRange	maxRange
$P1	FS INT LIN	FS INT LIN	1024	0.0000000
$P2	FS TOF LIN	FS TOF LIN	1024	0.0000000
$P3	SS INT LIN	SS INT LIN	1024	0.0000000
$P4	FL1 INT LOG	desc1	1024	0.1024944
$P5	FL2 INT LOG	desc2	1024	0.1024944
$P6	FL3 INT LOG	desc3	1024	0.1024944
$P7	FL5 INT LOG	desc4	1024	0.1024944
$P8	FL6 INT LOG	desc5	1024	0.1024944
$P9	FL8 INT LOG	desc6	1024	0.1024944
$P10	FL9 INT LOG	desc7	1024	0.1024944
$P11	FL10 INT LOG	desc8	1024	0.1024944
$P12	TIME	TIME	1024	0.0000000

349 keywords are stored in the 'description' slot

tmp_FC_2

flowFrame object 'myfile.LMD'
with 99722 cells and 12 observables:

name	desc	minRange
$P1	FS-A	1048576
$P2	FS-W	1024
$P3	SS-A	1048576
$P4	FL1-A	1048576
$P5	FL2-A	1048576
$P6	FL3-A	1048576
$P7	FL5-A	1048576
$P8	FL6-A	1048576
$P9	FL8-A	1048576
$P10	FL9-A	1048576
$P11	FL10-A	1048576
$P12	TIME	1048576

127 keywords are stored in the 'description' slot
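
The mismatch between the two printouts can be checked directly by comparing the channel names. The following is a minimal base-R sketch, not part of the original report: it uses a plain matrix as a stand-in for the second flowFrame, and it assumes the second column of each printout above is the channel name. The variable names (names_ds1, names_ds2, ds2, missing_in_ds2, shared, result) are illustrative only; with real data one would compare colnames(tmp_FC_1) and colnames(tmp_FC_2) instead.

```r
# Channel names as printed above for dataset 1 (FCS2) and dataset 2 (FCS3).
names_ds1 <- c("FS INT LIN", "FS TOF LIN", "SS INT LIN",
               "FL1 INT LOG", "FL2 INT LOG", "FL3 INT LOG", "FL5 INT LOG",
               "FL6 INT LOG", "FL8 INT LOG", "FL9 INT LOG", "FL10 INT LOG",
               "TIME")
names_ds2 <- c("FS-A", "FS-W", "SS-A", "FL1-A", "FL2-A", "FL3-A", "FL5-A",
               "FL6-A", "FL8-A", "FL9-A", "FL10-A", "TIME")

# Stand-in for the second data segment: a plain matrix, not a flowFrame.
ds2 <- matrix(0, nrow = 2, ncol = length(names_ds2),
              dimnames = list(NULL, names_ds2))

# Channels of dataset 1 that are absent from dataset 2, and the shared ones.
missing_in_ds2 <- setdiff(names_ds1, colnames(ds2))
shared <- intersect(names_ds1, colnames(ds2))

# Indexing by absent column names fails; capture the message instead of stopping.
result <- tryCatch(ds2[, names_ds1], error = function(e) conditionMessage(e))
```

Under these assumptions only TIME is shared, so selecting the dataset-1 channel names from dataset-2 data cannot succeed, which is consistent with the "Subset out of bounds" error reported above.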

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ncdfFlow_2.31.3           BH_1.69.0-1               RcppArmadillo_0.9.600.4.0 flowCore_1.51.7          

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2          matrixStats_0.54.0  withr_2.1.2         crayon_1.3.4        assertthat_0.2.1    pacman_0.5.1        stats4_3.6.1       
[8] cli_1.1.0           zlibbioc_1.30.0     rstudioapi_0.10     tools_3.6.1         Biobase_2.44.0      parallel_3.6.1      compiler_3.6.1     
[15] BiocGenerics_0.30.0 sessioninfo_1.1.1

Do not feel pressured at all; for now I will just continue using flowCore.

A small question anyway: is there a straightforward way to convert a flowCore flowFrame to an ncdfFlow flowFrame?


gfinak · 2019-07-31T15:42:06Z

I don't know whether the FCS file holds separate keywords for the two datasets or whether this is a bug in flowCore; we can look into that.
In order to combine these into a flowSet, you'll need to set the column names via colnames<- and the marker names via markernames<-, then convert to an ncdfFlowSet.
Here's an example:

library(flowCore)
library(ncdfFlow)
#> Loading required package: RcppArmadillo
#> Loading required package: BH
#' build a couple of toy flowFrames with different marker and channel names.
m <- matrix(rnorm(1000),ncol=4)
m2 <- matrix(rnorm(1000),ncol=4)
colnames(m)<-LETTERS[1:4]
colnames(m2)<-LETTERS[5:8]
fa<-flowFrame(m)
fb<-flowFrame(m2)
#' Can't do this yet, the marker and channel names don't match.
flowSet(fa,fb)
#> 000002_V2 doesn't have the identical colnames as the other samples!
#> Error in validObject(.Object): invalid class "flowSet" object: Some items identified in the data environment either have the wrong dimension or type.

#' set the column names of flowFrame A to be the same as B
colnames(fa)<-colnames(fb)
#' Get the marker names of flowFrame B
na<-markernames(fb)
#' set the names of the marker-name vector to the channel names; this acts as a map.
names(na)<-colnames(fb)
#' finally set the marker names of flowFrame A 
markernames(fa)<-na
#' now construct the flowSet
fs <- flowSet(list(fa,fb))
ncdfFlowSet(fs)
#> write V1 to empty cdf slot...
#> write V2 to empty cdf slot...
#> An ncdfFlowSet with 2 samples.
#> NCDF file : /var/folders/4x/t5qt3m717tbf3yml7h971mvc0000gn/T//RtmpQ29rRm/ncfs184f82a245d24.nc 
#> An object of class 'AnnotatedDataFrame'
#>   rowNames: V1 V2
#>   varLabels: name
#>   varMetadata: labelDescription
#> 
#>   column names:
#>     E, F, G, H

Created on 2019-07-31 by the reprex package (v0.3.0)
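
When the two frames list their channels in a different order, assigning colnames positionally would mislabel columns, so an explicit name-to-name map can be safer. Here is a minimal base-R sketch of that idea on a plain matrix. The pairing of names is an assumption taken from the $P index in the printouts earlier in the thread (for example, $P1 is "FS INT LIN" in dataset 1 and "FS-A" in dataset 2), and rename_map and ds2 are hypothetical names used only for illustration.

```r
# Assumed correspondence (by $P index) between dataset-2 and dataset-1 names.
# names(rename_map) are the dataset-2 names; the values are the dataset-1 names.
rename_map <- setNames(
  c("FS INT LIN", "SS INT LIN", "TIME"),
  c("FS-A",       "SS-A",       "TIME")
)

# Stand-in data whose columns are deliberately in a different order.
ds2 <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(NULL, c("TIME", "FS-A", "SS-A")))

# Refuse to rename if any column has no entry in the map.
stopifnot(all(colnames(ds2) %in% names(rename_map)))

# Look each column up by name, so the column order does not matter.
colnames(ds2) <- unname(rename_map[colnames(ds2)])
```

The same lookup could be applied to a flowFrame through colnames<-, as in the example above, once the correspondence between the two naming schemes has been verified for the file at hand.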

mikejiang · 2019-07-31T20:10:05Z

read.ncdfFlowSet currently doesn't support reading any data segment other than the first one. I will see if I can fix that.

mikejiang · 2019-07-31T21:09:54Z

Pull the latest flowCore and ncdfFlow from trunk and let me know if it works.

gunthergl · 2019-08-01T10:41:31Z

Works great, thank you so much for your immediate response!

Just for your information: when I now read the file without setting the dataset argument, the same warning is emitted multiple times. It is nothing to worry about for me.

tmp <- ncdfFlow::read.ncdfFlowSet(files = my_single_LMD_file)
#> All FCS files have the same following channels:
#> FS INT LIN
#> FS TOF LIN
#> SS INT LIN
#> FL1 INT LOG
#> FL2 INT LOG
#> FL3 INT LOG
#> FL5 INT LOG
#> FL6 INT LOG
#> FL8 INT LOG
#> FL9 INT LOG
#> FL10 INT LOG
#> TIME
#> write HKP01 P0A E01 B cells .LMD to empty cdf slot...
#> done!
#> Warning messages:
#> 1: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument. 
#> 2: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument. 
#> 3: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument.
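
Until a fix is available, repeated identical warnings like these can be collapsed on the caller's side. The following is a minimal base-R sketch under stated assumptions: noisy_read is a hypothetical stand-in that emits the same warning three times (it is not the ncdfFlow reader), and with_unique_warnings is an illustrative helper, not a function from either package.

```r
# Hypothetical stand-in for a reader that emits the same warning several times.
noisy_read <- function() {
  for (i in 1:3) warning("The file contains 1 additional data segment.")
  "data"
}

# Evaluate an expression, muffling warnings and keeping each distinct message once.
with_unique_warnings <- function(expr) {
  seen <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      seen <<- union(seen, conditionMessage(w))  # record new messages only
      invokeRestart("muffleWarning")             # suppress the default report
    }
  )
  list(value = value, warnings = seen)
}

res <- with_unique_warnings(noisy_read())
```

As the warning text itself suggests, passing the dataset argument explicitly is the more direct way to avoid the message.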


mikejiang · 2019-08-01T18:50:56Z

Thanks for pointing it out! It should be all good now.

mikejiang pushed a commit that referenced this issue Jul 31, 2019

add multiple datasets support to read.ncdfFlowSet #47

6d334c5

mikejiang pushed a commit to RGLab/flowCore that referenced this issue Jul 31, 2019

add multi data segment support for read.FCSheader RGLab/ncdfFlow#47

dce6fde

mikejiang pushed a commit that referenced this issue Aug 1, 2019

set dataset argument if it is absent to avoid redundant warning messages #47

32bc6dd
