idr0036-gustafsdottir-cellpainting S-BIAD855 #640

Open
dominikl opened this issue Feb 22, 2023 · 31 comments

Comments

@dominikl
Member

idr0036-gustafsdottir-cellpainting

Sample plate conversion failed with:
(base) [dlindner@pilot-zarr2-dev idr0036]$ time /home/dlindner/bioformats2raw/bin/bioformats2raw 20608.screen 20608.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp1378964024917150329/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[Fatal Error] :1:73: Character reference "&#0" is an invalid XML character.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@63a65a25): java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 73; Character reference "&#0" is an invalid XML character.
        at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
        at picocli.CommandLine.access$1300(CommandLine.java:145)

Already handled by IDR/bioformats#29.

@will-moore successfully exported it using omero-cli-zarr.
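
For anyone hitting the same SAXParseException: a minimal sketch of how one could scan metadata for numeric character references that XML 1.0 does not allow (such as the `&#0` reported above). The function names and the sample string are hypothetical; the actual offending metadata is not shown in this thread.

```python
import re

# Numeric character references: decimal (&#0;) or hexadecimal (&#x0;).
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_valid_xml_char(code_point: int) -> bool:
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def find_invalid_char_refs(xml_text: str) -> list:
    """Return (offset, reference) pairs for references to invalid characters."""
    invalid = []
    for match in CHAR_REF.finditer(xml_text):
        body = match.group(1)
        code_point = int(body[1:], 16) if body.startswith("x") else int(body)
        if not is_valid_xml_char(code_point):
            invalid.append((match.start(), match.group(0)))
    return invalid


# Hypothetical sample, not the real plate metadata.
sample = '<Image Name="plate&#0;" ID="&#65;&#x9;"/>'
print(find_invalid_char_refs(sample))
```

This only locates the references; whether to strip or replace them is a separate decision (the fix referenced above lives in IDR/bioformats#29).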

@dominikl dominikl moved this to test convert in NGFF conversion Feb 22, 2023
@dominikl dominikl added the bug label Mar 6, 2023
@dominikl
Member Author

Started conversion of the full dataset with omero-cli-zarr on pilot-zarr1-dev.

@will-moore
Member

will-moore commented Apr 26, 2023

Looks good:

(base) [wmoore@pilot-zarr1-dev ~]$ ls -alh /data/idr0036/
total 4.0K
drwxrwxr-x. 23 wmoore   idrnfs   4.0K Apr 26 13:56 .
drwxrwxr-x. 16 root     idr-data  289 Apr 24 11:56 ..
drwxrwxr-x. 18 dlindner dlindner  256 Apr 24 14:32 4403.zarr
drwxrwxr-x. 18 wmoore   idrnfs    256 Feb 16 14:30 4403.zarr.old
drwxrwxr-x. 18 dlindner dlindner  256 Apr 24 19:44 4705.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 24 22:19 4881.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 01:00 4883.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 03:40 4884.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 06:12 4886.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 08:47 4887.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 11:23 4889.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 14:01 4890.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 16:38 4892.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 19:20 4893.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 25 21:58 4895.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 00:35 4896.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 03:15 4898.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 05:51 4899.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 08:31 4900.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 13:47 4902.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 26 11:09 4903.zarr
drwxrwxr-x.  6 dlindner dlindner  100 Apr 26 14:25 4905.zarr
drwxrwxr-x. 18 dlindner dlindner  256 Apr 24 17:02 5352.zarr

Each plate is about 10 GB -> 200 GB in total:

$ du -sh ./4403.zarr
9.8G	./4403.zarr

@will-moore will-moore moved this from test convert to create new Fileset to replace original Fileset in NGFF conversion Apr 26, 2023
@will-moore will-moore moved this from create new Fileset to replace original Fileset to upload data to s3 in NGFF conversion Apr 26, 2023
@will-moore
Member

@sbesson OK for me to create a bucket at https://uk1s3.embassy.ebi.ac.uk/idr0036/ and upload 200 GB there?

@will-moore
Member

will-moore commented Apr 26, 2023

Created bucket and set policy... https://github.com/IDR/deployment/blob/master/docs/object-store.md#policy

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0036
make_bucket: idr0036

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0036 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0036  --cors-configuration file://cors.json

Started upload...

cd /data
/home/wmoore/mc cp -r idr0036/ uk1s3/idr0036/zarr

@will-moore will-moore moved this from upload data to s3 to create new Fileset to replace original Fileset in NGFF conversion Apr 27, 2023
@will-moore
Member

will-moore commented Apr 27, 2023

@sbesson
Member

sbesson commented May 1, 2023

As discussed earlier today, converted a sample plate from this study using bioformats2raw for comparison.

All the TIFFs from the 20585-<channel> folders were symlinked under the same directory and a synthetic IXMtest.HTD file was added:

"HTSInfoFile", Version 1.0
"Description", "BBBC0022 20585"
"PlateType", 1
"TimePoints", 1
"ZSeries", FALSE
"ZSteps", 1
"ZProjection", FALSE
"XWells", 24
"YWells", 16
"WellsSelection1", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection2", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection3", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection4", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection5", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection6", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection7", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection8", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection9", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection10", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection11", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection12", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection13", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection14", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection15", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"WellsSelection16", TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
"Sites", TRUE
"XSites", 3
"YSites", 3
"SiteSelection1", TRUE, TRUE, TRUE
"SiteSelection2", TRUE, TRUE, TRUE
"SiteSelection3", TRUE, TRUE, TRUE
"Waves", TRUE
"NWavelengths", 5
"WaveName1", "Hoechst"
"WaveName2", "ERSyto"
"WaveName3", "ERSytoBleed"
"WaveName4", "PhGolgi"
"WaveName5", "Mito"
"WaveCollect1", 1
"WaveCollect2", 1
"WaveCollect3", 1
"WaveCollect4", 1
"WaveCollect5", 1
"UniquePlateIdentifier", "abc123"
"EndFile"

bioformats2raw 0.6.1 completed in ~20min on pilot-zarr2-dev and the output was uploaded to the idr0036 bucket for comparison - see https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/20585.zarr
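
For other plates, the synthetic HTD file above could be generated rather than hand-written. This is a minimal sketch that reproduces the layout shown above, assuming every well and every site is selected; `build_htd` and its parameters are hypothetical names, and the `abc123` identifier is just the placeholder used above.

```python
def build_htd(description, x_wells, y_wells, x_sites, y_sites, wave_names,
              plate_id="abc123"):
    """Build the text of a synthetic .HTD file with all wells and sites selected."""

    def row(key, *values):
        # Each line is a quoted key followed by comma-separated values.
        return ", ".join([f'"{key}"', *map(str, values)])

    def quoted(text):
        return f'"{text}"'

    lines = [
        '"HTSInfoFile", Version 1.0',
        row("Description", quoted(description)),
        row("PlateType", 1),
        row("TimePoints", 1),
        row("ZSeries", "FALSE"),
        row("ZSteps", 1),
        row("ZProjection", "FALSE"),
        row("XWells", x_wells),
        row("YWells", y_wells),
    ]
    # One WellsSelection line per plate row, one TRUE per plate column.
    for y in range(1, y_wells + 1):
        lines.append(row(f"WellsSelection{y}", *["TRUE"] * x_wells))
    lines += [row("Sites", "TRUE"), row("XSites", x_sites), row("YSites", y_sites)]
    for y in range(1, y_sites + 1):
        lines.append(row(f"SiteSelection{y}", *["TRUE"] * x_sites))
    lines += [row("Waves", "TRUE"), row("NWavelengths", len(wave_names))]
    for i, name in enumerate(wave_names, start=1):
        lines.append(row(f"WaveName{i}", quoted(name)))
    for i in range(1, len(wave_names) + 1):
        lines.append(row(f"WaveCollect{i}", 1))
    lines.append(row("UniquePlateIdentifier", quoted(plate_id)))
    lines.append('"EndFile"')
    return "\n".join(lines) + "\n"


htd_text = build_htd(
    "BBBC0022 20585", 24, 16, 3, 3,
    ["Hoechst", "ERSyto", "ERSytoBleed", "PhGolgi", "Mito"],
)
```

The order of `wave_names` is what determines the channel order in the converted plate, so it matters for the comparison with IDR.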

@sbesson sbesson self-assigned this May 2, 2023
@will-moore
Member

Looking at @sbesson's bioformats2raw plate in vizarr (https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/20585.zarr), I see a different channel order and different rendering settings/channel colors compared with the omero-cli-zarr plate above:

Screenshot 2023-05-17 at 14 46 47

A1 image from omero-cli-zarr: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/4403.zarr/A/1/0/

"channels": [
            {
                "active": true,
                "coefficient": 1.0,
                "color": "FF0000",
                "family": "linear",
                "inverted": false,
                "label": "ERSyto",
                "window": {
                    "end": 2878.0,
                    "max": 2878.0,
                    "min": 176.0,
                    "start": 176.0
                }
            },
            {
                "active": true,
                "coefficient": 1.0,
                "color": "00FF00",
                "family": "linear",
                "inverted": false,
                "label": "ERSytoBleed",
                "window": {
                    "end": 4095.0,
                    "max": 4095.0,
                    "min": 154.0,
                    "start": 154.0
                }
            },
            {
                "active": true,
                "coefficient": 1.0,
                "color": "0000FF",
                "family": "linear",
                "inverted": false,
                "label": "Hoechst",
                "window": {
                    "end": 2214.0,
                    "max": 2214.0,
                    "min": 140.0,
                    "start": 140.0
                }
            },

Compared with the output from bioformats2raw: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/20585.zarr/A/1/0/

"omero" : {
    "channels" : [ {
      "color" : "FF0000",
      "coefficient" : 1,
      "active" : true,
      "label" : "Hoechst",
      "window" : {
        "min" : 140.0,
        "max" : 2214.0,
        "start" : 140.0,
        "end" : 2214.0
      },
      "family" : "linear",
      "inverted" : false
    }, {
      "color" : "00FF00",
      "coefficient" : 1,
      "active" : true,
      "label" : "ERSyto",
      "window" : {
        "min" : 176.0,
        "max" : 2878.0,
        "start" : 176.0,
        "end" : 2878.0
      },
      "family" : "linear",
      "inverted" : false
    }, {
      "color" : "0000FF",
      "coefficient" : 1,
      "active" : true,
      "label" : "ERSytoBleed",
      "window" : {
        "min" : 154.0,
        "max" : 4095.0,
        "start" : 154.0,
        "end" : 4095.0
      },
      "family" : "linear",
      "inverted" : false
    },

And IDR:

Screenshot 2023-05-17 at 14 55 45
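
To make the difference explicit, here is a small sketch that matches the channels of the two `omero` blocks by label and reports where the index, color and window differ. The values are copied from the first three channels listed above; `channel` and `compare_channels` are hypothetical helper names.

```python
def channel(label, color, low, high):
    """Build a channel entry shaped like the 'omero' metadata shown above."""
    return {
        "label": label,
        "color": color,
        "window": {"min": low, "max": high, "start": low, "end": high},
    }


def compare_channels(reference, other):
    """Match channels by label and report index, color and window differences."""
    other_by_label = {ch["label"]: (i, ch) for i, ch in enumerate(other)}
    report = []
    for ref_index, ref in enumerate(reference):
        other_index, match = other_by_label[ref["label"]]
        report.append({
            "label": ref["label"],
            "index": (ref_index, other_index),
            "same_color": ref["color"] == match["color"],
            "same_window": ref["window"] == match["window"],
        })
    return report


omero_cli_zarr = [
    channel("ERSyto", "FF0000", 176.0, 2878.0),
    channel("ERSytoBleed", "00FF00", 154.0, 4095.0),
    channel("Hoechst", "0000FF", 140.0, 2214.0),
]
bioformats2raw = [
    channel("Hoechst", "FF0000", 140.0, 2214.0),
    channel("ERSyto", "00FF00", 176.0, 2878.0),
    channel("ERSytoBleed", "0000FF", 154.0, 4095.0),
]

for entry in compare_channels(omero_cli_zarr, bioformats2raw):
    print(entry)
```

For these three channels the windows agree per label, while the index and the color differ, which fits the colors being assigned by channel position rather than by label.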

@sbesson
Member

sbesson commented May 17, 2023

Thanks, echoing what I mentioned post-conversion:

Importantly, note that the order of the channels (Hoechst, ERSyto, ERSytoBleed, PhGolgi, Mito) is different from the images in IDR (ERSyto, ERSytoBleed, Hoechst, Mito, PhGolgi). I cannot comment on why it was decided this way, but this will probably be an issue that needs to be resolved if we decide to replace IDR images with the output of this conversion tool.

@sbesson
Member

sbesson commented May 18, 2023

A few additional notes on the channel order:

  • the decisions made here will also affect idr0016, which originally included the plates now published as idr0036
  • the NGFF representation above follows the way the channels were acquired on the microscope, as described in the publication and more specifically Table 1
  • looking at the metadata in IDR, the Channel value is Hoechst 33342:nucleus;concanavalin A (con A) AlexaFluor488 conjugate:endoplasmic reticulumn;SYTO 14 green fluorescent nucleic acid stain:nucleoli;wheat germ agglutinin (WGA) AlexaFluor594 conjugate:Golgi apparatus and plasma membrane;phalloidin AlexaFluor594 conjugate:F-actin;MitoTracker Deep Red: mitochondria and matches the order above
  • idr0033 is another plate which uses the same channels/markers. Here the order of the channels in IDR is Hoechst, ERSyto, ERSytoBleed, PhGolgi, Mito
  • I tried to identify how the channel order was decided during the original publication of idr0016 but I found no clear answer. My suspicion is that this comes from the fact that the data was split into separate folders on disk, one per channel, and the screen files were generated by listing the channels in alphabetical order (ERSyto, ERSytoBleed, Hoechst, Mito, PhGolgi)

From my side there are two options:

1- either decide the current channel order in IDR is the ground truth. This means regenerating the NGFF accordingly, probably via some symlinking
2- or decide IDR should reflect the acquisition order of the channels and unify with idr0033. This means, in addition to swapping the fileset, running an upgrade script rotating the channel names and rendering settings, and possibly regenerating thumbnails

Deferring to you and @francesw on what the right approach is. It's additional work either way, so we should consider the option that brings the maximal value.
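
As a quick check of the alphabetical-order suspicion, and to show what either option implies in terms of index mapping, a minimal sketch (the channel names are the ones listed above; `reorder_indices` is a hypothetical helper):

```python
acquisition_order = ["Hoechst", "ERSyto", "ERSytoBleed", "PhGolgi", "Mito"]
idr_order = ["ERSyto", "ERSytoBleed", "Hoechst", "Mito", "PhGolgi"]


def reorder_indices(source, target):
    """For each channel in target, give its position in source."""
    return [source.index(name) for name in target]


# Sorting the acquisition names alphabetically reproduces the IDR order.
print(sorted(acquisition_order) == idr_order)

# Option 1: pick channels from the acquisition-ordered data in IDR order.
to_idr = reorder_indices(acquisition_order, idr_order)

# Option 2: pick channels from the IDR-ordered data in acquisition order.
to_acquisition = reorder_indices(idr_order, acquisition_order)

print(to_idr, to_acquisition)
```

The same index lists would apply to channel names and rendering settings alike, whichever direction is chosen.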

@will-moore
Member

Since we have all plates already exported to NGFF with omero-cli-zarr, and the channel order here matches what's in IDR, we can simply proceed with these plates (at least test whether the import and Fileset swapping works with them)?

@sbesson
Member

sbesson commented May 18, 2023

For testing definitely, but the same set of decisions will apply to #638, where the plates have not been converted yet, so I wanted to make sure we have this discussion prior to doing so.

@will-moore
Member

I noticed that the data exported with omero-cli-zarr above have the wrong dimension_separator for the downsampled resolutions (see fix in ome/omero-cli-zarr#144). E.g. https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/4403.zarr/A/1/0/

Probably will need to be re-exported with that fix (once tested etc) - I think that's going to be easier than fixing the dimension separator in place.
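
A rough way to spot this on a local copy before uploading: read each `.zarray` under an image and compare the `dimension_separator` values. This is a sketch under the assumption of Zarr v2 metadata on local disk (a missing key means the default `.`); the function names are hypothetical.

```python
import json
from pathlib import Path


def dimension_separators(image_dir):
    """Map each array under image_dir to its Zarr v2 dimension_separator."""
    result = {}
    for zarray in sorted(Path(image_dir).rglob(".zarray")):
        metadata = json.loads(zarray.read_text())
        array_path = zarray.parent.relative_to(image_dir).as_posix()
        # Zarr v2 treats a missing key as the default "." separator.
        result[array_path] = metadata.get("dimension_separator", ".")
    return result


def is_consistent(separators):
    """True when every array uses the same separator."""
    return len(set(separators.values())) <= 1
```

For example, `dimension_separators("4403.zarr/A/1/0")` would return one entry per resolution level, and `is_consistent` on that result flags an image whose downsampled levels differ from the full-resolution one.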

@sbesson sbesson removed their assignment May 29, 2023
@will-moore will-moore added question and removed bug labels Jun 23, 2023
@will-moore will-moore moved this from create new Fileset to replace original Fileset to convert all data to NGFF in NGFF conversion Jun 23, 2023
@will-moore
Member

Discussed in meeting today: if omero-cli-zarr output is well-validated then this would appear to be the least work, because all following steps (Fileset replacement etc) will be the same as all other studies, and we keep same Image IDs, don't need a new import and re-annotate etc.

@joshmoore joshmoore self-assigned this Jun 28, 2023
@will-moore
Member

Updated to the latest release of omero-cli-zarr on my omero_zarr_export conda environment on pilot-zarr1-dev, which contains the dimension separator fix:

(omero_zarr_export) [wmoore@pilot-zarr1-dev ~]$ pip freeze | grep omero
omero-cli-zarr==0.5.3
omero-py @ file:///usr/share/miniconda/conda-bld/omero-py_1644397478756/work

@will-moore
Member

will-moore commented Jun 30, 2023

Running in a screen idr0036_export, logged in to idr-testing with public user...

for id in 4403 5352 4705 4881 4883 4884 4886 4887 4889 4890 4892 4893 4895 4896 4898 4899 4900 4902 4903 4905; do
  echo $id;
  omero zarr export Plate:$id;
done

@will-moore
Member

Found a bug in omero-cli-zarr Plate export. Fix created at ome/omero-cli-zarr#146.
pip installed from that PR and ran export again...

@will-moore
Member

Started to remove older data on s3 bucket...

$ ./mc rm --force --recursive uk1s3/idr0036/zarr/4403.zarr

@will-moore
Member

Uploaded Plate:4403.zarr to uk1s3 to replace deleted plate.
Viewing in validator shows the dimension separator issue for downsampled resolutions is now fixed:
https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0036/zarr/4403.zarr/A/1/0/

NB: omero-cli-zarr generates Plates named ID.zarr but we want to use the Plate Name, so as to be consistent with the naming of Plates coming from bioformats2raw.

Since we only have 20 Plates, we can simply rename them before zipping... We might as well also add the .ome.zarr extension for consistency.

@will-moore
Member

Export with omero-cli-zarr completed.
Renamed each Fileset with e.g. mv 4905.zarr 20646.ome.zarr.

Then zipped....
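
If the renaming needs to be repeated, it could be scripted along these lines. This is only a sketch: `rename_plates` is a hypothetical helper, and the ID-to-name mapping has to be supplied by the caller (only the 4905 to 20646 pair is taken from this thread).

```python
from pathlib import Path


def rename_plates(export_dir, plate_names, dry_run=True):
    """Rename <ID>.zarr to <Plate name>.ome.zarr; return the (old, new) names."""
    renamed = []
    for plate_id, plate_name in plate_names.items():
        source = Path(export_dir) / f"{plate_id}.zarr"
        target = Path(export_dir) / f"{plate_name}.ome.zarr"
        if not source.is_dir():
            continue  # not exported, or already renamed
        if target.exists():
            raise FileExistsError(target)
        if not dry_run:
            source.rename(target)
        renamed.append((source.name, target.name))
    return renamed
```

Calling `rename_plates("/data/idr0036", {4905: "20646"})` with the default `dry_run=True` only lists what would be renamed.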

@will-moore
Member

Started upload...

$ ./ascp -P33001 -i ../etc/asperaweb_id_dsa.openssh -d /data/idr0036/idr0036 [email protected]:5f/1

@will-moore will-moore moved this from convert all data to NGFF to Zip and upload to BioStudies in NGFF conversion Jul 3, 2023
@will-moore will-moore moved this from Zip and upload to BioStudies to BioStudies Submission in NGFF conversion Jul 3, 2023
@will-moore will-moore assigned francesw and unassigned joshmoore and will-moore Jul 3, 2023
@will-moore
Member

Delete everything...

$ sudo rm -rf idr0036

@francesw francesw changed the title idr0036-gustafsdottir-cellpainting to NGFF idr0036-gustafsdottir-cellpainting S-BIAD855 Aug 23, 2023
@francesw francesw removed their assignment Aug 23, 2023
@francesw francesw moved this from BioStudies Submission to Data on Embassy s3 in NGFF conversion Aug 24, 2023
@will-moore will-moore moved this from Data on Embassy s3 to create new Filesets in idr-next in NGFF conversion Aug 28, 2023
@will-moore
Member

will-moore commented Aug 29, 2023

All 20 Plates are at https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD855.html
Used parse_bia_uuids.py to generate idr0036.csv:

idr0036/20596.ome.zarr,S-BIAD855/02dc024f-2228-4d19-923e-0d55c1224ba3,21190
idr0036/20591.ome.zarr,S-BIAD855/0facd145-e4c1-4a25-b579-21c8e7cc98c6,21183
idr0036/20633.ome.zarr,S-BIAD855/15ad496d-7ab3-4901-9237-47c19cf636ed,21199
idr0036/20589.ome.zarr,S-BIAD855/2223047b-ef35-4503-9cdf-1ef3e18f2ec3,21005
idr0036/20641.ome.zarr,S-BIAD855/313c1b5a-b229-4f80-872e-374cddc5d4b2,21204
idr0036/20593.ome.zarr,S-BIAD855/5d9fc5ee-9550-4ac0-8c47-25665a235eb2,21186
idr0036/20608.ome.zarr,S-BIAD855/64f64c79-0593-44c5-aff9-6675ac196d2a,21193
idr0036/20595.ome.zarr,S-BIAD855/66336c82-6ac3-49ec-bf1f-dd27f7553585,21189
idr0036/20585.ome.zarr,S-BIAD855/781ac3d7-673f-47be-a4d2-3fdf3f477047,20253
idr0036/20639.ome.zarr,S-BIAD855/8cfc6903-910b-451c-8878-9a4c4f3e82bb,21201
idr0036/20625.ome.zarr,S-BIAD855/9366c761-4792-497a-83b5-6dd1906a49ad,21195
idr0036/20607.ome.zarr,S-BIAD855/a0c3c999-4aa2-496d-ae55-4b06a31721fa,21192
idr0036/20594.ome.zarr,S-BIAD855/b0949069-407e-42ca-8ae8-193824ddee39,21187
idr0036/20646.ome.zarr,S-BIAD855/c1e924aa-9159-45db-a72f-654de1893056,21205
idr0036/20590.ome.zarr,S-BIAD855/c7182c66-874f-44c0-9e0e-08b3514b7e52,21181
idr0036/20640.ome.zarr,S-BIAD855/db971048-7666-4df1-85cd-23b47429e5e0,21202
idr0036/20626.ome.zarr,S-BIAD855/dfaa6a30-5491-40b8-9b5f-19c2a8152b18,21196
idr0036/20592.ome.zarr,S-BIAD855/e42f0136-07a1-43cc-b833-3d19d87dca24,21184
idr0036/20630.ome.zarr,S-BIAD855/e47a40cc-2810-4717-8263-d11e620b9516,21198
idr0036/20586.ome.zarr,S-BIAD855/f3af2f1f-2952-4aee-a6db-0e2cd43585f7,21652

for r in $(cat idr0036.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" > "$fsid.sql"
done

UPDATE: after 14 hours running, only 8 Filesets have been completed...
UPDATE: after 18 hours running, only 10 Filesets completed...
UPDATE: after 39 hours running, 18 Filesets completed...
UPDATE: after 43 hours, cancelled run while processing last of 20 plates above
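
For reference, the `cut`-based parsing in the loop above corresponds to the following sketch, which derives the Fileset ID and the zarr path from each CSV row. `parse_rows` and `mkngff_command` are hypothetical names; the command list only includes the positional arguments of `omero mkngff sql`, leaving out the `--symlink_repo` and `--secret` options used above.

```python
import csv
import io


def parse_rows(csv_text):
    """Yield (fileset_id, zarr_path) for each row of the CSV shown above."""
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        _name, bia_path, fileset_id = row
        uuid = bia_path.split("/")[1]  # same as: cut -d'/' -f2
        yield int(fileset_id), f"/bia-integrator-data/{bia_path}/{uuid}.zarr"


def mkngff_command(fileset_id, zarr_path):
    """Argument list for one 'omero mkngff sql' call (options omitted)."""
    return ["omero", "mkngff", "sql", str(fileset_id), zarr_path]


example = "idr0036/20586.ome.zarr,S-BIAD855/f3af2f1f-2952-4aee-a6db-0e2cd43585f7,21652\n"
for fsid, path in parse_rows(example):
    print(fsid, path)
```

Building the rows up front like this also makes it easy to skip Filesets whose `.sql` file already exists when a long run has to be restarted.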

@will-moore
Member

$ for r in $(cat $IDRID.csv); do   fsid=$(echo $r | cut -d',' -f3);   psql -U omero -d idr -h $DBHOST -f "$fsid.sql"; done
BEGIN
 mkngff_fileset 
----------------
        5287498
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287499
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287500
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287501
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287502
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287503
(1 row)
COMMIT
BEGIN
 mkngff_fileset 
----------------
        5287504
(1 row)
...

Since these Plates were created with omero-cli-zarr and we didn't have the fix at IDR/omero-mkngff#8 yet, we'll need to update some Filesets manually...

E.g. for the first Fileset above, check the paths etc. with cat 21190.sql. The new Fileset ID is 5287498:

idr=> UPDATE pixels SET name = '.zattrs', path = 'demo_2/2016-06/14/09-46-41.391_mkngff/02dc024f-2228-4d19-923e-0d55c1224ba3.zarr' where image in (select id from Image where fileset = 5287498);

UPDATE 3456
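
As a sanity check on the row count: assuming the plate layout from the HTD file earlier in this thread (24 x 16 wells, 3 x 3 sites, all selected) also applies to this plate, the expected number of images works out as below. `expected_images` is a hypothetical helper.

```python
def expected_images(x_wells, y_wells, x_sites, y_sites):
    """Images per plate when every well and every site has one image."""
    return x_wells * y_wells * x_sites * y_sites


# 24 x 16 = 384 wells, 3 x 3 = 9 sites per well.
images = expected_images(24, 16, 3, 3)
print(images)
```

This gives 3456, which agrees with the `UPDATE 3456` reported above.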

@will-moore
Member

http://localhost:1080/webclient/?show=image-2002281 to check for images...

@will-moore
Member

Oh dear - trying to view that image today:

    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/14/09-46-41.391_mkngff/02dc024f-2228-4d19-923e-0d55c1224ba3.zarr/.zattrs
}

This is due to the mounting of the s3 bucket. Created issue at #671.

@will-moore
Member

Since we cancelled on last Fileset above, rerunning that one on idr0125-pilot...

$ omero mkngff sql --secret=22c41bb8-36e5-4386-9825-179b180d8238 21652 "/bia-integrator-data/S-BIAD855/f3af2f1f-2952-4aee-a6db-0e2cd43585f7/f3af2f1f-2952-4aee-a6db-0e2cd43585f7.zarr" > idr0036/21652.sql

Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/2016-07/04 // 11-50-41.305 for fileset 21652
...

@will-moore
Member

Let's start from scratch again on idr-testing, using idr0036.csv from above...

$ for r in $(cat $IDRID.csv); do
>   biapath=$(echo $r | cut -d',' -f2)
>   uuid=$(echo $biapath | cut -d'/' -f2)
>   fsid=$(echo $r | cut -d',' -f3)
>   omero mkngff sql $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
> done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-06/14/09-46-41.391 for fileset: 21190

@will-moore
Member

Done

bash-4.2$ ls -alh idr0036
total 100M
drwxr-xr-x.  2 omero-server omero-server 4.0K Sep 23 00:39 .
drwxr-xr-x. 22 omero-server root         4.0K Sep 25 04:53 ..
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 22:43 20253.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 21:45 21005.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 23:52 21181.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 21:21 21183.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 23 00:28 21184.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 22:09 21186.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 23:29 21187.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 22:31 21189.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 21:10 21190.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 23:18 21192.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 22:20 21193.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 23:06 21195.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 23 00:16 21196.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 23 00:39 21198.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 21:32 21199.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 22:54 21201.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 23 00:04 21202.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 21:56 21204.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 22 23:40 21205.sql
-rw-r--r--.  1 omero-server omero-server 5.0M Sep 23 00:51 21652.sql

@will-moore
Member

will-moore commented Sep 27, 2023

Checked each SQL file for .zarray entries and found 1 file with none...

(base) LS30778:idr0036 wmoore$ for i in $(ls ./); do echo "$i $(grep -c 'zarray' $i)"; done;
20253.sql 13824
21005.sql 13824
21181.sql 13824
21183.sql 13824
21184.sql 13824
21186.sql 13824
21187.sql 13824
21189.sql 13824
21190.sql 13824
21192.sql 13824
21193.sql 13824
21195.sql 13824
21196.sql 13824
21198.sql 13824
21199.sql 13824
21201.sql 13824
21202.sql 13824
21204.sql 13824
21205.sql 13824
21652.sql 0

Re-ran that one on idr0125-pilot as the wmoore user...

(venv3) (base) [wmoore@pilot-idr0125-omeroreadwrite ~]$ omero mkngff sql 21652 "/bia-integrator-data/S-BIAD855/f3af2f1f-2952-4aee-a6db-0e2cd43585f7/f3af2f1f-2952-4aee-a6db-0e2cd43585f7.zarr" > "21652.sql"

Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Found prefix: demo_2/2016-07/04/11-50-41.305 for fileset: 21652
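
The same check as the `grep -c` loop above, as a sketch that also flags the odd one out automatically. The function names are hypothetical, and the most common count is used as the expected value, which is an assumption that only holds when most files are fine.

```python
from collections import Counter
from pathlib import Path


def count_matching_lines(sql_dir, needle="zarray"):
    """Per-file count of lines containing needle, like `grep -c`."""
    counts = {}
    for sql_file in sorted(Path(sql_dir).glob("*.sql")):
        with sql_file.open(errors="replace") as handle:
            counts[sql_file.name] = sum(needle in line for line in handle)
    return counts


def outliers(counts):
    """Files whose count differs from the most common count."""
    if not counts:
        return {}
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return {name: n for name, n in counts.items() if n != expected}
```

With the numbers above, `outliers` would return only `21652.sql`. Note that 13824 is 4 x 3456, i.e. four matching lines per image if each plate has 3456 images, although the thread does not say what the four correspond to.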

@will-moore
Member

Fixed:

(base) LS30778:idr0036 wmoore$ for i in $(ls ./); do echo "$i $(grep -c 'zarray' $i)"; done;
20253.sql 13824
21005.sql 13824
21181.sql 13824
21183.sql 13824
21184.sql 13824
21186.sql 13824
21187.sql 13824
21189.sql 13824
21190.sql 13824
21192.sql 13824
21193.sql 13824
21195.sql 13824
21196.sql 13824
21198.sql 13824
21199.sql 13824
21201.sql 13824
21202.sql 13824
21204.sql 13824
21205.sql 13824
21652.sql 13824

@will-moore will-moore moved this from check_pixels to check_pixels in progress in NGFF conversion Dec 4, 2023
@will-moore will-moore moved this from check_pixels in progress to pixels validated in NGFF conversion Dec 11, 2023