Merge remote-tracking branch 'origin/master' into plate_images_by_name
will-moore committed Jan 4, 2023
2 parents 4e5e8b9 + 298f026 commit 90a66d3
Showing 8 changed files with 496 additions and 50 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.10.1.dev0
current_version = 0.11.2.dev0
commit = True
tag = True
sign_tags = True
12 changes: 10 additions & 2 deletions CHANGES.rst → CHANGELOG.md
@@ -1,5 +1,13 @@
CHANGES
=======
0.11.1
------

* Reduce logging level of post_process statement ([#78](https://github.com/ome/omero-metadata/pull/78))

0.11.0
------

* Add support for column type auto-detection using pandas ([#67](https://github.com/ome/omero-metadata/pull/67), [#71](https://github.com/ome/omero-metadata/pull/71), [#72](https://github.com/ome/omero-metadata/pull/72), [#75](https://github.com/ome/omero-metadata/pull/75), [#77](https://github.com/ome/omero-metadata/pull/77))
* Skip empty rows when reading CSV files ([#70](https://github.com/ome/omero-metadata/pull/70))

0.10.0
------
126 changes: 105 additions & 21 deletions README.rst
@@ -36,7 +36,7 @@ conflicts when importing the Python module.
Usage
=====

The plugin is called from the command-line using the `omero` command::
The plugin is called from the command-line using the ``omero metadata`` command::

$ omero metadata <subcommand>

@@ -64,45 +64,93 @@ populate
--------

This command creates an ``OMERO.table`` (bulk annotation) from a ``CSV`` file and links
the table as a ``File Annotation`` to a parent container such as Screen, Plate, Project
the table as a ``File Annotation`` to a parent container such as Screen, Plate, Project,
Dataset or Image. It also attempts to convert Image, Well or ROI names from the ``CSV`` into
object IDs in the ``OMERO.table``.

The ``CSV`` file must be provided as a local file with ``--file path/to/file.csv``.

If you wish to ensure that ``number`` columns are created for numerical data, this will
allow you to make numerical queries on the table.
Column Types are:
Each column in an OMERO.table has a defined type: either a data type such as ``double`` or ``long``, or a special object type such as ``ImageColumn`` or ``WellColumn`` for storing OMERO object IDs.

The default behaviour of the script is to automatically detect the column types from an input ``CSV``. This behaviour works as follows:

* Columns named after a supported object type (``plate``, ``well``, ``image``, ``dataset`` or ``roi``), or named ``<object> id`` or ``<object> name``, generate the corresponding column type in the OMERO.table. See the table below for the full list of supported column names.

============ ================= ==================== ====================================================================
Column Name  Column type       Detected Header Type Notes
============ ================= ==================== ====================================================================
Image        ``ImageColumn``   ``image``            Accepts image IDs. Appends new 'Image Name' column with image names.
Image Name   ``StringColumn``  ``s``                Accepts image names. Appends new 'Image' column with image IDs.
Image ID     ``ImageColumn``   ``image``            Accepts image IDs. Appends new 'Image Name' column with image names.
Dataset      ``DatasetColumn`` ``dataset``          Accepts dataset IDs.
Dataset Name ``StringColumn``  ``s``                Accepts dataset names.
Dataset ID   ``DatasetColumn`` ``dataset``          Accepts dataset IDs.
Plate        ``PlateColumn``   ``plate``            Accepts plate names. Adds new 'Plate' column with plate IDs.
Plate Name   ``PlateColumn``   ``plate``            Accepts plate names. Adds new 'Plate' column with plate IDs.
Plate ID     ``LongColumn``    ``l``                Accepts plate IDs.
Well         ``WellColumn``    ``well``             Accepts well names. Adds new 'Well' column with well IDs.
Well Name    ``WellColumn``    ``well``             Accepts well names. Adds new 'Well' column with well IDs.
Well ID      ``LongColumn``    ``l``                Accepts well IDs.
ROI          ``RoiColumn``     ``roi``              Accepts ROI IDs. Appends new 'ROI Name' column with ROI names.
ROI Name     ``StringColumn``  ``s``                Accepts ROI names. Appends new 'ROI' column with ROI IDs.
ROI ID       ``RoiColumn``     ``roi``              Accepts ROI IDs. Appends new 'ROI Name' column with ROI names.
============ ================= ==================== ====================================================================

Note: Column names are case insensitive. Space, no space, and underscore are all accepted as separators for column names (i.e. ``<object> name``/``<object> id``, ``<object>name``/``<object>id``, ``<object>_name``/``<object>_id`` are all accepted).

NB: Column names should not contain spaces if you want to be able to query by these columns.

* All other column types will be detected based on the column's data using the pandas library. See table below.

=============== ================= ====================
Column Name     Column type       Detected Header Type
=============== ================= ====================
Example String  ``StringColumn``  ``s``
Example Long    ``LongColumn``    ``l``
Example Float   ``DoubleColumn``  ``d``
Example boolean ``BoolColumn``    ``b``
=============== ================= ====================
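
The column-name matching described above (case insensitive, with a space, an underscore or no separator) can be sketched with a few regular expressions. This is a minimal illustration modelled on the comparisons in ``detect_headers`` from this commit, not the plugin's actual code; the helper name ``object_header`` is hypothetical.

```python
import re

# Bare object names map directly to their own header type.
_BARE = re.compile(r"(well|plate|image|dataset|roi)")
# "<object> id" variants that the commit maps to an object header type.
_WITH_ID = re.compile(r"(image|roi|dataset)[ _]?id")
# "<object> name" variants that the commit maps to an object header type.
_WITH_NAME = re.compile(r"(plate|well)[ _]?name")


def object_header(column_name):
    """Return the object header type for a column name, or None.

    None means the type would be decided from the column's data instead.
    """
    key = column_name.lower()
    for pattern in (_BARE, _WITH_ID, _WITH_NAME):
        match = pattern.fullmatch(key)
        if match:
            return match.group(1)
    return None


print(object_header("Image_ID"))   # image
print(object_header("Well Name"))  # well
print(object_header("Plate ID"))   # None
```

Names such as ``Plate ID`` or ``Image Name`` return ``None`` here because, per the table above, their types (``l`` and ``s``) come from the data rather than from the name.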

A column with missing values is detected as ``StringColumn`` by default. If ``--allow-nan`` is passed to the
``omero metadata populate`` command, floating-point columns with missing values are detected as ``DoubleColumn`` and the
missing values are stored as NaN.
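
The effect of missing values on detection follows from how pandas reads the file. The sketch below assumes, as in the ``cli.py`` change in this commit, that the flag is passed through as ``keep_default_na`` to ``pandas.read_csv``; the sample data and the helper name ``concentration_is_float`` are made up for illustration.

```python
import io

import pandas as pd

# One Concentration value is missing (empty field in the second data row).
CSV_TEXT = "Concentration,Drug\n10.1,DMSO\n,DrugX\n5.5,DMSO\n"


def concentration_is_float(allow_nan):
    """Report whether pandas reads the Concentration column as floating point."""
    # With keep_default_na=False the empty field stays an empty string,
    # so the whole column is read as text rather than as numbers.
    table = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=allow_nan)
    return bool(pd.api.types.is_float_dtype(table["Concentration"]))


print(concentration_is_float(allow_nan=False))  # False
print(concentration_is_float(allow_nan=True))   # True
```

One consequence worth checking against your own data: with NaN handling enabled, pandas also reads an integer column with gaps as floating point, so such a column would be detected as ``d`` rather than ``l``.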

However, it is possible to manually define the header types, ignoring the automatic header detection, if a ``CSV`` with a ``# header`` row is passed. The ``# header`` row should be the first row of the CSV and defines columns according to the following list (see examples below):

- ``d``: ``DoubleColumn``, for floating point numbers
- ``l``: ``LongColumn``, for integer numbers
- ``s``: ``StringColumn``, for text
- ``b``: ``BoolColumn``, for true/false
- ``plate``, ``well``, ``image``, ``dataset``, ``roi`` to specify objects

These can be specified in the first row of a ``CSV`` with a ``# header`` tag (see examples below).
The ``# header`` row is optional. Default column type is ``String``.
Automatic header detection can also be disabled with the ``--manual-header`` flag. If the ``# header`` row is not present and this flag is used, column types will default to ``String`` (unless the column names correspond to OMERO objects such as ``image`` or ``plate``).
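
If you prefer to define types manually, the ``# header`` row can be added to an existing CSV programmatically. The following is a small sketch under the assumption that the row has the form ``# header <type>,<type>,...`` as in the examples below; ``with_manual_header`` is a hypothetical helper, not part of the plugin.

```python
import csv
import io

# Header types listed in the README section above.
VALID_TYPES = {"d", "l", "s", "b", "plate", "well", "image", "dataset", "roi"}


def with_manual_header(csv_text, column_types):
    """Return csv_text with a '# header' row prepended (hypothetical helper)."""
    # Read the first row (the column names) to count the columns.
    column_names = next(csv.reader(io.StringIO(csv_text)))
    if len(column_types) != len(column_names):
        raise ValueError("expected one header type per column")
    unknown = [t for t in column_types if t not in VALID_TYPES]
    if unknown:
        raise ValueError(f"unsupported header types: {unknown}")
    return "# header " + ",".join(column_types) + "\n" + csv_text


text = "Image Name,Dataset Name,ROI_Area\nimg-01.png,dataset01,0.0469\n"
print(with_manual_header(text, ["s", "s", "d"]).splitlines()[0])
# prints: # header s,s,d
```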

NB: Column names should not contain spaces if you want to be able to query
by these columns.

Examples
^^^^^^^^^

The examples below use the default automatic column type detection. It is possible to achieve the same results (or a different desired result) by manually adding a custom ``# header`` row at the top of the CSV.

**Project / Dataset**
^^^^^^^^^^^^^^^^^^^^^^

To add a table to a Project, the ``CSV`` file needs to specify ``Dataset Name``
To add a table to a Project, the ``CSV`` file needs to specify ``Dataset Name`` or ``Dataset ID``
and ``Image Name`` or ``Image ID``::

$ omero metadata populate Project:1 --file path/to/project.csv
Using ``Image Name`` and ``Dataset Name``:

project.csv::

# header s,s,d,l,s
Image Name,Dataset Name,ROI_Area,Channel_Index,Channel_Name
img-01.png,dataset01,0.0469,1,DAPI
img-02.png,dataset01,0.142,2,GFP
img-03.png,dataset01,0.093,3,TRITC
img-04.png,dataset01,0.429,4,Cy5

This will create an OMERO.table linked to the Project like this with
The previous example will create an OMERO.table linked to the Project as follows with
a new ``Image`` column with IDs:

========== ============ ======== ============= ============ =====
@@ -114,23 +162,52 @@ img-03.png dataset01 0.093 3 TRITC 36640
img-04.png dataset01 0.429 4 Cy5 36641
========== ============ ======== ============= ============ =====

If the target is a Dataset instead of a Project, the ``Dataset Name`` column is not needed.
Note: equivalent to adding ``# header s,s,d,l,s`` row to the top of the ``project.csv`` for manual definition.

Using ``Image ID`` and ``Dataset ID``:

project.csv::

image id,Dataset ID,ROI_Area,Channel_Index,Channel_Name
36638,101,0.0469,1,DAPI
36639,101,0.142,2,GFP
36640,101,0.093,3,TRITC
36641,101,0.429,4,Cy5


The previous example will create an OMERO.table linked to the Project as follows, with
a new ``Image Name`` column containing the image names:

===== ======= ======== ============= ============ ==========
Image Dataset ROI_Area Channel_Index Channel_Name Image Name
===== ======= ======== ============= ============ ==========
36638 101     0.0469   1             DAPI         img-01.png
36639 101     0.142    2             GFP          img-02.png
36640 101     0.093    3             TRITC        img-03.png
36641 101     0.429    4             Cy5          img-04.png
===== ======= ======== ============= ============ ==========

Note: equivalent to adding ``# header image,dataset,d,l,s`` row to the top of the ``project.csv`` for manual definition.

In both examples above, if the target is a Dataset instead of a Project, the ``Dataset`` or ``Dataset Name`` column is not needed.

**Screen / Plate**
^^^^^^^^^^^^^^^^^^^

To add a table to a Screen, the ``CSV`` file needs to specify ``Plate`` name and ``Well``.
If a ``# header`` is specified, column types must be ``well`` and ``plate``.
If a ``# header`` is specified, column types must be ``well`` and ``plate``::

$ omero metadata populate Screen:1 --file path/to/screen.csv

screen.csv::

# header well,plate,s,d,l,d
Well,Plate,Drug,Concentration,Cell_Count,Percent_Mitotic
A1,plate01,DMSO,10.1,10,25.4
A2,plate01,DMSO,0.1,1000,2.54
A3,plate01,DMSO,5.5,550,4
B1,plate01,DrugX,12.3,50,44.43


This will create an OMERO.table linked to the Screen, with the
``Well Name`` and ``Plate Name`` columns added and the ``Well`` and
``Plate`` columns used for IDs:
@@ -146,29 +223,30 @@ Well Plate Drug Concentration Cell_Count Percent_Mitotic Well Name Plat

If the target is a Plate instead of a Screen, the ``Plate`` column is not needed.

Note: equivalent to adding ``# header well,plate,s,d,l,d`` row to the top of the ``screen.csv`` for manual definition.

**ROIs**
^^^^^^^^^

If the target is an Image or a Dataset, a ``CSV`` with ROI-level or Shape-level data can be used to create an
``OMERO.table`` (bulk annotation) as a ``File Annotation`` linked to the target object.
If there is an ``roi`` column (header type ``roi``) containing ROI IDs, an ``Roi Name``
column will be appended automatically (see example below). If a column of Shape IDs named ``shape``
of type ``l`` is included, the Shape IDs will be validated (and set to -1 if invalid).
Also if an ``image`` column of Image IDs is included, an ``Image Name`` column will be added.
NB: Columns of type ``shape`` aren't yet supported on the OMERO.server.
NB: Columns of type ``shape`` aren't yet supported on the OMERO.server::

Alternatively, if the target is an Image, the ROI input column can be
``Roi Name`` (with type ``s``), and an ``roi`` type column will be appended containing ROI IDs.
In this case, it is required that ROIs on the Image in OMERO have the ``Name`` attribute set.
$ omero metadata populate Image:1 --file path/to/image.csv

image.csv::

# header roi,l,l,d,l
Roi,shape,object,probability,area
501,1066,1,0.8,250
502,1067,2,0.9,500
503,1068,3,0.2,25
503,1069,4,0.8,400
503,1070,5,0.5,200

This will create an OMERO.table linked to the Image like this:

@@ -182,6 +260,12 @@ Roi shape object probability area Roi Name
503 1070 5 0.5 200 Sample3
=== ===== ====== =========== ==== ========

Note: equivalent to adding ``# header roi,l,l,d,l`` row to the top of the ``image.csv`` for manual definition.

Alternatively, if the target is an Image, the ROI input column can be
``Roi Name`` (with type ``s``), and an ``roi`` type column will be appended containing ROI IDs.
In this case, it is required that ROIs on the Image in OMERO have the ``Name`` attribute set.

Note that the ROI-level data from an ``OMERO.table`` is not visible
in the OMERO.web UI right-hand panel under the ``Tables`` tab,
but the table can be visualized by clicking the "eye" on the bulk annotation attachment on the Image.
@@ -204,4 +288,4 @@ licensed under the terms of the GNU General Public License (GPL) v2 or later.
Copyright
---------

2018-2021, The Open Microscopy Environment
2018-2022, The Open Microscopy Environment and Glencoe Software, Inc
5 changes: 3 additions & 2 deletions setup.py
@@ -92,7 +92,7 @@ def read(fname):
return open(os.path.join(os.path.dirname(__file__), fname)).read()


version = '0.10.1.dev0'
version = '0.11.2.dev0'
url = "https://github.com/ome/omero-metadata/"

setup(
@@ -127,7 +127,8 @@ def read(fname):
'future',
'omero-py>=5.6.0',
'PyYAML',
'jinja2'
'jinja2',
'pandas'
],
python_requires='>=3',
tests_require=[
70 changes: 67 additions & 3 deletions src/omero_metadata/cli.py
@@ -32,6 +32,8 @@
from omero.grid import LongColumn
from omero.model.enums import UnitsLength

import pandas as pd

HELP = """Metadata utilities
Provides access to and editing of the metadata which
@@ -239,8 +241,13 @@ def _configure(self, parser):
populate.add_argument("--localcfg", help=(
"Local configuration file or a JSON object string"))

populate.add_argument("--allow_nan", action="store_true", help=(
"Allow empty values to become Nan in Long or Double columns"))
populate.add_argument(
"--allow-nan", "--allow_nan", action="store_true", help=(
"Allow empty values to become Nan in Long or Double columns"))

populate.add_argument(
"--manual-header", "--manual_header", action="store_true", help=(
"Disable automatic header detection during population"))

populateroi.add_argument(
"--measurement", type=int, default=None,
@@ -483,6 +490,49 @@ def testtables(self, args):
if not initialized:
self.ctx.die(100, "Failed to initialize Table")

    @staticmethod
    def detect_headers(csv_path, keep_default_na=True):
        '''
        Function to automatically detect headers from a CSV file. This function
        loads the table into pandas to detect the column types and match
        headers
        '''

        conserved_headers = ['well', 'plate', 'image', 'dataset', 'roi']
        headers = []
        table = pd.read_csv(csv_path, keep_default_na=keep_default_na)
        col_types = table.dtypes.values.tolist()
        cols = list(table.columns)

        for index, col_type in enumerate(col_types):
            col = cols[index]
            if col.lower() in conserved_headers:
                headers.append(col.lower())
            elif col.lower() == 'image id' or col.lower() == 'imageid' or \
                    col.lower() == 'image_id':
                headers.append('image')
            elif col.lower() == 'roi id' or col.lower() == 'roiid' or \
                    col.lower() == 'roi_id':
                headers.append('roi')
            elif col.lower() == 'dataset id' or \
                    col.lower() == 'datasetid' or \
                    col.lower() == 'dataset_id':
                headers.append('dataset')
            elif col.lower() == 'plate name' or col.lower() == 'platename' or \
                    col.lower() == 'plate_name':
                headers.append('plate')
            elif col.lower() == 'well name' or col.lower() == 'wellname' or \
                    col.lower() == 'well_name':
                headers.append('well')
            elif col_type.name == 'object':
                headers.append('s')
            elif col_type.name == 'float64':
                headers.append('d')
            elif col_type.name == 'int64':
                headers.append('l')
            elif col_type.name == 'bool':
                headers.append('b')
        return headers
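
The dtype branches at the end of the function above could also be written as a lookup table. The sketch below is one possible variant, not the commit's code. As written, the function appends nothing when a dtype matches none of the four names, which would leave ``headers`` shorter than the column list; falling back to ``s`` here is an assumption, and ``header_for_dtype`` is a hypothetical name.

```python
# pandas dtype name -> OMERO.table header type, as in the elif chain above.
DTYPE_TO_HEADER = {
    "object": "s",
    "float64": "d",
    "int64": "l",
    "bool": "b",
}


def header_for_dtype(dtype_name, default="s"):
    """Map a pandas dtype name to a header type, falling back to a default."""
    # The fallback is an assumption: the original function skips unknown dtypes.
    return DTYPE_TO_HEADER.get(dtype_name, default)


print(header_for_dtype("float64"))         # d
print(header_for_dtype("datetime64[ns]"))  # s
```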

# WRITE

def populate(self, args):
@@ -521,6 +571,20 @@ def populate(self, args):
cfgid = cfgann.getFile().getId()
md.linkAnnotation(cfgann)

        header_type = None
        # To use auto detect header by default unless instructed not to
        # AND
        # Check if first row contains `# header`
        first_row = pd.read_csv(args.file, nrows=1, header=None)
        if not args.manual_header and \
                not first_row[0].str.contains('# header').bool():
            omero_metadata.populate.log.info("Detecting header types")
            header_type = MetadataControl.detect_headers(
                args.file, keep_default_na=args.allow_nan)
            if args.dry_run:
                omero_metadata.populate.log.info(f"Header Types:{header_type}")
        else:
            omero_metadata.populate.log.info("Using user defined header types")
        loops = 0
        ms = 0
        wait = args.wait
@@ -533,7 +597,7 @@ def populate(self, args):
cfg=args.cfg, cfgid=cfgid, attach=args.attach,
options=localcfg, batch_size=args.batch,
loops=loops, ms=ms, dry_run=args.dry_run,
allow_nan=args.allow_nan)
allow_nan=args.allow_nan, column_types=header_type)
ctx.parse()
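
The ``# header`` check in this hunk reads the first row through pandas and applies a string accessor to its first cell. A simpler variant reads only the first line of the file. This is a sketch, not the commit's implementation: ``has_manual_header`` is a hypothetical helper, UTF-8 encoding is assumed, and it tests the start of the line where the commit uses ``str.contains``.

```python
def has_manual_header(csv_path):
    """Return True if the first line of the file starts with '# header'."""
    # utf-8-sig drops a leading byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        first_line = handle.readline()
    return first_line.lstrip().startswith("# header")
```

A check of this kind does not depend on the type pandas infers for the first cell, so it behaves the same whether the first row holds text or numbers.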

def rois(self, args):