read.ncdfFlowSet multiple datasets #47

Open
gunthergl opened this issue Jul 31, 2019 · 5 comments


gunthergl commented Jul 31, 2019

Dear all, I just started using your package and ran into an error while reading a file:

# devtools::install_github("RGLab/cytolib", ref="trunk")
# devtools::install_github("RGLab/flowCore", ref="trunk")
# devtools::install_github("RGLab/ncdfFlow", ref="trunk")
library(ncdfFlow)

tmp <- read.ncdfFlowSet(files = my_single_LMD_file
						,dataset=1)

All FCS files have the same following channels:
FS INT LIN
FS TOF LIN
SS INT LIN
FL1 INT LOG
FL2 INT LOG
FL3 INT LOG
FL5 INT LOG
FL6 INT LOG
FL8 INT LOG
FL9 INT LOG
FL10 INT LOG
TIME
write HKP01 P0A E01 B cells .LMD to empty cdf slot...
done!

tmp <- read.ncdfFlowSet(files = my_single_LMD_file
						,dataset=2)

All FCS files have the same following channels:
FS INT LIN
FS TOF LIN
SS INT LIN
FL1 INT LOG
FL2 INT LOG
FL3 INT LOG
FL5 INT LOG
FL6 INT LOG
FL8 INT LOG
FL9 INT LOG
FL10 INT LOG
TIME
Error: Subset out of bounds

Stepping through with the debugger, the error originates inside read.ncdfFlowSet() at line 127, in the call to my.read.FCS(i). Inside my.read.FCS(i), it is raised at line 7:

this_fr[, chnls_common]

Where chnls_common:

"FS INT LIN" "FS TOF LIN" "SS INT LIN" "FL1 INT LOG" "FL2 INT LOG" "FL3 INT LOG" "FL5 INT LOG" "FL6 INT LOG" "FL8 INT LOG" "FL9 INT LOG" "FL10 INT LOG" "TIME"

But colnames(this_fr):

"TIME"

I have no idea what happened here. It may be related to the fact that, even when reading the file with flowCore, the channel names differ between the two datasets (dataset 1 was saved as FCS2, dataset 2 as FCS3):

tmp_FC_1 <- flowCore::read.FCS(my_single_LMD_file
							   ,dataset=1)
tmp_FC_2 <- flowCore::read.FCS(my_single_LMD_file
							   ,dataset=2)
tmp_FC_1

flowFrame object 'myfile.LMD'
with 99722 cells and 12 observables:

      name          desc        range  minRange   maxRange
$P1   FS INT LIN    FS INT LIN  1024   0.0000000
$P2   FS TOF LIN    FS TOF LIN  1024   0.0000000
$P3   SS INT LIN    SS INT LIN  1024   0.0000000
$P4   FL1 INT LOG   desc1       1024   0.1024944
$P5   FL2 INT LOG   desc2       1024   0.1024944
$P6   FL3 INT LOG   desc3       1024   0.1024944
$P7   FL5 INT LOG   desc4       1024   0.1024944
$P8   FL6 INT LOG   desc5       1024   0.1024944
$P9   FL8 INT LOG   desc6       1024   0.1024944
$P10  FL9 INT LOG   desc7       1024   0.1024944
$P11  FL10 INT LOG  desc8       1024   0.1024944
$P12  TIME          TIME        1024   0.0000000

349 keywords are stored in the 'description' slot

tmp_FC_2

flowFrame object 'myfile.LMD'
with 99722 cells and 12 observables:

      name    desc  range    minRange  maxRange
$P1   FS-A          1048576  0
$P2   FS-W          1024     0
$P3   SS-A          1048576  0
$P4   FL1-A         1048576  0
$P5   FL2-A         1048576  0
$P6   FL3-A         1048576  0
$P7   FL5-A         1048576  0
$P8   FL6-A         1048576  0
$P9   FL8-A         1048576  0
$P10  FL9-A         1048576  0
$P11  FL10-A        1048576  0
$P12  TIME          1048576  0

127 keywords are stored in the 'description' slot
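
The extent of the naming mismatch can be checked in base R before any subsetting is attempted. The following is only a minimal sketch: the channel names are copied from the two printouts above, the toy matrix `m2` stands in for the dataset 2 events, and the positional pairing of names (`name_map`) is an assumption that $P1 to $P12 describe the same parameters in both datasets, which the thread does not confirm.

```r
# Channel names exactly as printed for the two datasets above.
chnls_ds1 <- c("FS INT LIN", "FS TOF LIN", "SS INT LIN",
               "FL1 INT LOG", "FL2 INT LOG", "FL3 INT LOG",
               "FL5 INT LOG", "FL6 INT LOG", "FL8 INT LOG",
               "FL9 INT LOG", "FL10 INT LOG", "TIME")
chnls_ds2 <- c("FS-A", "FS-W", "SS-A", "FL1-A", "FL2-A", "FL3-A",
               "FL5-A", "FL6-A", "FL8-A", "FL9-A", "FL10-A", "TIME")

# Channels requested under dataset 1 names that dataset 2 does not provide.
missing_in_ds2 <- setdiff(chnls_ds1, chnls_ds2)

# Names that both datasets share.
shared <- intersect(chnls_ds1, chnls_ds2)

# ASSUMPTION: parameters are paired by position ($P1 with $P1, and so on).
# name_map translates a dataset 2 name into its dataset 1 counterpart.
name_map <- setNames(chnls_ds1, chnls_ds2)

# Toy matrix standing in for the dataset 2 events (zeros, not real data).
m2 <- matrix(0, nrow = 3, ncol = length(chnls_ds2),
             dimnames = list(NULL, chnls_ds2))

# Subsetting by dataset 1 names fails while the columns carry dataset 2 names.
res <- tryCatch(m2[, chnls_ds1], error = function(e) conditionMessage(e))

# After renaming by position, the same subset succeeds.
colnames(m2) <- unname(name_map[colnames(m2)])
renamed_ok <- identical(colnames(m2[, chnls_ds1]), chnls_ds1)
```

Here `missing_in_ds2` lists every dataset 1 name except TIME, and `shared` contains only TIME. Whether renaming by position is appropriate depends on the instrument export, so the map should be checked against the file's keywords before it is applied to real flowFrames.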

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ncdfFlow_2.31.3           BH_1.69.0-1               RcppArmadillo_0.9.600.4.0 flowCore_1.51.7          

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2          matrixStats_0.54.0  withr_2.1.2         crayon_1.3.4        assertthat_0.2.1    pacman_0.5.1        stats4_3.6.1       
[8] cli_1.1.0           zlibbioc_1.30.0     rstudioapi_0.10     tools_3.6.1         Biobase_2.44.0      parallel_3.6.1      compiler_3.6.1     
[15] BiocGenerics_0.30.0 sessioninfo_1.1.1  

No pressure at all; for now I will just continue using flowCore.

A small question anyway: is there a straightforward way to convert a flowCore flowFrame to an ncdfFlow flowFrame?

@gfinak
Member

gfinak commented Jul 31, 2019

I don't know whether the FCS file has separate keywords for the two datasets or whether this is a bug in flowCore; we can look into that.
In order to combine these into a flowSet, you'll need to set the column names via colnames<- and the markers via markernames<-, then convert to an ncdfFlowSet.
Here's an example:

library(flowCore)
library(ncdfFlow)
#> Loading required package: RcppArmadillo
#> Loading required package: BH
#' build a couple of toy flowFrames with different marker and channel names.
m <- matrix(rnorm(1000),ncol=4)
m2 <- matrix(rnorm(1000),ncol=4)
colnames(m)<-LETTERS[1:4]
colnames(m2)<-LETTERS[5:8]
fa<-flowFrame(m)
fb<-flowFrame(m2)
#' Can't do this yet, the marker and channel names don't match.
flowSet(fa,fb)
#> 000002_V2 doesn't have the identical colnames as the other samples!
#> Error in validObject(.Object): invalid class "flowSet" object: Some items identified in the data environment either have the wrong dimension or type.

#' set the column names of flowFrame A to be the same as B
colnames(fa)<-colnames(fb)
#' Get the marker names of flowFrame B
na<-markernames(fb)
#' name the vector of marker names by channel; this acts as a channel-to-marker map.
names(na)<-colnames(fb)
#' finally set the marker names of flowFrame A 
markernames(fa)<-na
#' now construct the flowSet
fs <- flowSet(list(fa,fb))
ncdfFlowSet(fs)
#> write V1 to empty cdf slot...
#> write V2 to empty cdf slot...
#> An ncdfFlowSet with 2 samples.
#> NCDF file : /var/folders/4x/t5qt3m717tbf3yml7h971mvc0000gn/T//RtmpQ29rRm/ncfs184f82a245d24.nc 
#> An object of class 'AnnotatedDataFrame'
#>   rowNames: V1 V2
#>   varLabels: name
#>   varMetadata: labelDescription
#> 
#>   column names:
#>     E, F, G, H

Created on 2019-07-31 by the reprex package (v0.3.0)

@mikejiang
Member

read.ncdfFlowSet currently doesn't support reading data segments other than the first one. I will see if I can fix that.

@mikejiang
Member

Pull the latest flowCore and ncdfFlow from trunk and let me know if it works.

@gunthergl
Author

gunthergl commented Aug 1, 2019

Works great, thank you so much for your immediate response!

Just for your information: when I now read the file without setting the dataset argument, the same warning is emitted multiple times. For me this is nothing to worry about.

tmp <- ncdfFlow::read.ncdfFlowSet(files = my_single_LMD_file)
#> All FCS files have the same following channels:
#> FS INT LIN
#> FS TOF LIN
#> SS INT LIN
#> FL1 INT LOG
#> FL2 INT LOG
#> FL3 INT LOG
#> FL5 INT LOG
#> FL6 INT LOG
#> FL8 INT LOG
#> FL9 INT LOG
#> FL10 INT LOG
#> TIME
#> write HKP01 P0A E01 B cells .LMD to empty cdf slot...
#> done!
#> Warning messages:
#> 1: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument. 
#> 2: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument. 
#> 3: The file contains 1 additional data segment.
#> The default is to read the first segment only.
#> Please consider setting the 'dataset' argument. 
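
Until repeated warnings like these are fixed upstream, they can be collapsed on the caller's side with base R condition handling. This is only a sketch: `noisy_read()` is a hypothetical stand-in that emits the same warning three times (it is not part of ncdfFlow or flowCore), and `collect_unique_warnings()` is a helper name invented for illustration.

```r
# Hypothetical stand-in for a reader that raises the same warning three times.
noisy_read <- function() {
  for (i in 1:3) {
    warning("The file contains 1 additional data segment.")
  }
  "result"
}

# Evaluate an expression, muffle its warnings, and keep each distinct
# warning message once alongside the returned value.
collect_unique_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = unique(msgs))
}

out <- collect_unique_warnings(noisy_read())
```

In this sketch `out$warnings` holds the message once and `out$value` holds the function's return value. The same wrapper could in principle be placed around a real read call, with the caveat that it hides every warning from the console, so the collected messages should still be inspected.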

@mikejiang
Copy link
Member

Thanks for pointing it out! It should be all good now.


3 participants