Inconsistent CSV format required for populate metadata on Plate vs Project #29

Open
will-moore opened this issue Jul 10, 2019 · 1 comment

will-moore commented Jul 10, 2019

Looking at examples added to the README in #28
it seems that the CSV format has different rules for Screen compared with Project:

The Screen example has columns named Well and Plate, and their column types are well and plate.
Omitting the # header line has no effect on the types of these two columns: they are still treated as well and plate.

# header well,plate,s,d,l,d
Well,Plate,Drug,Concentration,Cell_Count,Percent_Mitotic
A1,plate01,DMSO,10.1,10,25.4
A2,plate01,DMSO,0.1,1000,2.54
A3,plate01,DMSO,5.5,550,4
B1,plate01,DrugX,12.3,50,44.43

This creates an OMERO.table with additional Well Name and Plate Name columns, and the original Well and Plate columns now contain IDs instead of names.

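| Well | Plate | Drug | Concentration | Cell Count | Percent Mitotic | Well Name | Plate Name |
|------|-------|------|---------------|------------|-----------------|-----------|------------|
| 9154 | 3855 | DMSO | 10.1 | 10 | 25.4 | a1 | plate01 |
| 9155 | 3855 | DMSO | 0.1 | 1000 | 2.54 | a2 | plate01 |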
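
To make the relationship between the `# header` line and the column names explicit, here is a minimal Python sketch that pairs each declared type with its column. The helper name `parse_header_types` is hypothetical and is not part of omero-metadata; it only shows how the first two lines of the Screen CSV line up.

```python
import csv
import io


def parse_header_types(csv_text):
    """Pair each column name with the type declared on the '# header' line.

    Hypothetical helper for illustration; not part of omero-metadata.
    Returns a list of (column_name, declared_type) tuples. declared_type
    is None for every column when the CSV has no '# header' line.
    """
    lines = csv_text.strip().splitlines()
    types = None
    if lines and lines[0].startswith("# header"):
        declared = lines[0][len("# header"):]
        types = [t.strip() for t in declared.split(",")]
        lines = lines[1:]
    # The next line holds the column names.
    names = next(csv.reader(io.StringIO(lines[0])))
    if types is None:
        types = [None] * len(names)
    if len(types) != len(names):
        raise ValueError("header declares %d types for %d columns"
                         % (len(types), len(names)))
    return list(zip(names, types))


screen_csv = """# header well,plate,s,d,l,d
Well,Plate,Drug,Concentration,Cell_Count,Percent_Mitotic
A1,plate01,DMSO,10.1,10,25.4
"""

for name, declared in parse_header_types(screen_csv):
    print(name, declared)
```

Printing the pairs side by side makes it easier to spot a column whose name and declared type disagree before running the populate script.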

If I use a column named Plate Name instead of Plate in the CSV, the script fails:

  File "/Users/willadmin/Virtual/omero/lib/python2.7/site-packages/omero_metadata/populate.py", line 1241, in post_process
    plate = columns_by_name['plate'].values[i]   # FIXME

and it also fails if I use Well Name instead of Well.

When the target is a Project, we need columns named Dataset Name and Image Name in the CSV and these columns are of type String.

# header s,s,d,l,s
Image Name,Dataset Name,Bounding_Box,Channel_Index,Channel_Name
img-01.png,dataset01,0.0469,1,DAPI
img-02.png,dataset01,0.142,2,GFP
img-03.png,dataset01,0.093,3,TRITC
img-04.png,dataset01,0.429,4,Cy5

This creates an OMERO.table with an additional Image column that contains the Image ID:

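| Image Name | Dataset Name | Bounding_Box | Channel_Index | Channel_Name | Image |
|------------|--------------|--------------|---------------|--------------|-------|
| img-01.png | dataset01 | 0.0469 | 1 | DAPI | 36638 |
| img-02.png | dataset01 | 0.142 | 2 | GFP | 36639 |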

If I rename the first column to Image, I get a failure with:

ValueError: invalid literal for long() with base 10: 'img-01.png'
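
One plausible reading of this message, which the issue does not confirm, is that a column named Image is treated as an Image ID column even though the `# header` line declares it as `s`, so each cell is converted to an integer. The Python sketch below reproduces only that conversion failure in isolation. `to_image_id` is a hypothetical stand-in, not the function that populate.py calls, and the sketch does not cover any other way the script might fail.

```python
def to_image_id(value):
    """Convert a CSV cell to an integer ID (hypothetical stand-in).

    Python 2 would use long() for this conversion, which matches the
    wording of the message in the issue; Python 3 uses int().
    """
    return int(value)


try:
    to_image_id("img-01.png")
except ValueError as err:
    # An image name is not a valid integer literal, so conversion fails.
    print("conversion failed:", err)
```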

I'm documenting this to record the results of my investigation and the current behaviour. The Screen case seems inconsistent: the CSV columns must be named Well and Plate even though they actually contain the Well Name and Plate Name. The Project behaviour is closer to what I would expect.

will-moore commented

Also, if the project.csv already has a column named Image, the script fails, even if that column contains the correct Image ID or any other number or string.
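
The findings in this thread could be turned into a pre-flight check that runs before a CSV is passed to the populate script. The Python sketch below is only an illustration: `check_csv_columns`, `REQUIRED` and `FORBIDDEN` are hypothetical names, and the rules encode just the behaviour reported here (a Screen CSV needs Well and Plate; a Project CSV needs Image Name and Dataset Name and must not already contain an Image column). They are not a specification of omero-metadata.

```python
import csv
import io

# Column rules as observed in this issue; assumptions, not an official spec.
REQUIRED = {
    "Screen": {"Well", "Plate"},
    "Project": {"Image Name", "Dataset Name"},
}
FORBIDDEN = {
    "Screen": set(),
    "Project": {"Image"},
}


def check_csv_columns(csv_text, target):
    """Return a list of problems found in the CSV column names."""
    lines = [ln for ln in csv_text.splitlines() if ln.strip()]
    # Skip the optional '# header' line that declares column types.
    if lines and lines[0].startswith("# header"):
        lines = lines[1:]
    if not lines:
        return ["CSV has no column-name row"]
    names = set(next(csv.reader(io.StringIO(lines[0]))))
    problems = []
    for col in sorted(REQUIRED[target] - names):
        problems.append("missing required column: %s" % col)
    for col in sorted(FORBIDDEN[target] & names):
        problems.append("column name known to break the script: %s" % col)
    return problems


project_csv = """# header s,s,d,l,s
Image,Dataset Name,Bounding_Box,Channel_Index,Channel_Name
img-01.png,dataset01,0.0469,1,DAPI
"""

for problem in check_csv_columns(project_csv, "Project"):
    print(problem)
```

Keeping the rules in two small tables makes it straightforward to adjust them if the Screen and Project formats are later made consistent.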
