From bef57bc4d579c437f30fffc38aee8c1f7bef3d9a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Caterina=20Fuster-Barcel=C3=B3?= Date: Wed, 29 May 2024 13:38:40 +0200 Subject: [PATCH] Add model resource description file specs Taken from here: https://github.com/bioimage-io/spec-bioimage-io/blob/644e9c2300a5fc3ac5ec1effc3495cf5e9b98d9b/model_spec_latest.md --- docs/guides/README.md | 166 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 165 insertions(+), 1 deletion(-) diff --git a/docs/guides/README.md b/docs/guides/README.md index b6875e88..dfc86d6c 100644 --- a/docs/guides/README.md +++ b/docs/guides/README.md @@ -194,6 +194,170 @@ When following the BioImage.IO model RDF specification provided at https://githu * Pre-processing and post-processing should be always described. You can check which [preprocessing](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/preprocessing_spec_latest.md) and [postprocessing](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/postprocessing_spec_latest.md) functions are supported at the moment and open an [issue here](https://github.com/bioimage-io/spec-bioimage-io/issues) if you are missing a specific operation. * Do not forget to include any additional files needed for the correct execution of the model during the upload process. +#### Model Resource Description File Specification (0.4.9) +This specification defines the fields used in a BioImage.IO-compliant resource description file (`RDF`) for describing AI models with pretrained weights. +These fields are typically stored in YAML files which we call Model Resource Description Files or `model RDF`. +The model RDFs can be downloaded or uploaded to the bioimage.io website, produced or consumed by BioImage.IO-compatible consumers(e.g. image analysis software or other website). + +The model RDF YAML file contains mandatory and optional fields. In the following description, optional fields are indicated by _optional_. +_optional*_ with an asterisk indicates the field is optional depending on the value in another field. + +* `format_version` _(required String)_ Version of the BioImage.IO Model Resource Description File Specification used. + This is mandatory, and important for the consumer software to verify before parsing the fields. + The recommended behavior for the implementation is to keep backward compatibility and throw an error if the model yaml + is in an unsupported format version. The current format version described here is + 0.4.9 +* `authors` _(required List\[Author\])_ A list of authors. The authors are the creators of the specifications and the primary points of contact. + 1. _(Author)_ is a Dict with the following keys: + * `name` _(Name→String)_ Full name. + * `affiliation` _(String)_ Affiliation. + * `email` _(Email)_ E-Mail + * `github_user` _(String)_ GitHub user name. + * `orcid` _(String)_ [orcid](https://support.orcid.org/hc/en-us/sections/360001495313-What-is-ORCID) id in hyphenated groups of 4 digits, e.g. '0000-0001-2345-6789' (and [valid](https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier) as per ISO 7064 11,2.) +* `description` _(required String)_ A string containing a brief description. +* `documentation` _(required Union\[URL→URI | Path→String\])_ Relative path or URL to file with additional documentation in markdown. The file must be in markdown format with `.md` file name extension. It is recommended to use `README.md` as the documentation name. The documentation should include a (sub)section '[#[#]]# Validation' with details on how to quantitatively validate the model on unseen data. +* `inputs` _(required List\[InputTensor\])_ Describes the input tensors expected by this model. + 1. _(InputTensor)_ is a Dict with the following keys: + * `axes` _(Axes→String)_ Axes identifying characters from: bitczyx. Same length and order as the axes in `shape`. + + | character | description | + | --- | --- | + | b | batch (groups multiple samples) | + | i | instance/index/element | + | t | time | + | c | channel | + | z | spatial dimension z | + | y | spatial dimension y | + | x | spatial dimension x | + * `data_type` _(String)_ The data type of this tensor. For inputs, only `float32` is allowed and the consumer software needs to ensure that the correct data type is passed here. For outputs can be any of `float32, float64, (u)int8, (u)int16, (u)int32, (u)int64`. The data flow in bioimage.io models is explained [in this diagram.](https://docs.google.com/drawings/d/1FTw8-Rn6a6nXdkZ_SkMumtcjvur9mtIhRqLwnKqZNHM/edit). + * `name` _(String)_ Tensor name. No duplicates are allowed. + * `shape` _(Union\[ExplicitShape→List\[Integer\] | ParametrizedInputShape\])_ Specification of input tensor shape. + 1. _(ExplicitShape→List\[Integer\])_ Exact shape with same length as `axes`, e.g. `shape: [1, 512, 512, 1]` + 1. _(ParametrizedInputShape)_ A sequence of valid shapes given by `shape = min + k * step for k in {0, 1, ...}`. ParametrizedInputShape is a Dict with the following keys: + * `min` _(List\[Integer\])_ The minimum input shape with same length as `axes` + * `step` _(List\[Integer\])_ The minimum shape change with same length as `axes` + * `data_range` _(Tuple)_ Tuple `(minimum, maximum)` specifying the allowed range of the data in this tensor. If not specified, the full data range that can be expressed in `data_type` is allowed. + * `preprocessing` _(List\[Preprocessing\])_ Description of how this input should be preprocessed. + 1. _(Preprocessing)_ is a Dict with the following keys: + * `name` _(String)_ Name of preprocessing. One of: binarize, clip, scale_linear, sigmoid, zero_mean_unit_variance, scale_range. + * `kwargs` _(Kwargs→Dict\[String, Any\])_ Key word arguments as described in [preprocessing spec](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/preprocessing_spec_0_4.md). +* `license` _(required String)_ A [SPDX license identifier](https://spdx.org/licenses/)(e.g. `CC-BY-4.0`, `MIT`, `BSD-2-Clause`). We don't support custom license beyond the SPDX license list, if you need that please send an Github issue to discuss your intentions with the community. +* `name` _(required Name→String)_ Name of this model. It should be human-readable and only contain letters, numbers, underscore '_', minus '-' or spaces and not be longer than 64 characters. +* `test_inputs` _(required List\[Union\[URI→String | Path→String\]\])_ List of URIs or local relative paths to test inputs as described in inputs for **a single test case**. This means if your model has more than one input, you should provide one URI for each input.Each test input should be a file with a ndarray in [numpy.lib file format](https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format).The extension must be '.npy'. +* `test_outputs` _(required List\[Union\[URI→String | Path→String\]\])_ Analog to test_inputs. +* `timestamp` _(required DateTime)_ Timestamp of the initial creation of this model in [ISO 8601](#https://en.wikipedia.org/wiki/ISO_8601) format. +* `weights` _(required Dict\[String, Union\[KerasHdf5WeightsEntry | OnnxWeightsEntry | PytorchStateDictWeightsEntry | TensorflowJsWeightsEntry | TensorflowSavedModelBundleWeightsEntry | TorchscriptWeightsEntry\]\])_ + 1. _(String)_ Format of this set of weights. One of: pytorch_state_dict, torchscript, keras_hdf5, tensorflow_js, tensorflow_saved_model_bundle, onnx + 1. _(Union\[KerasHdf5WeightsEntry | OnnxWeightsEntry | PytorchStateDictWeightsEntry | TensorflowJsWeightsEntry | TensorflowSavedModelBundleWeightsEntry | TorchscriptWeightsEntry\])_ The weights for this model. Weights can be given for different formats, but should otherwise be equivalent. See [weight_formats_spec_0_4.md](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/weight_formats_spec_0_4.md) for the required and optional fields per weight format. The available weight formats determine which consumers can use this model. +* `attachments` _(optional Attachments)_ Additional unknown keys are allowed. Attachments is a Dict with the following keys: + * `files` _(optional List\[Union\[URI→String | Path→String\]\])_ File attachments; included when packaging the resource. +* `badges` _(optional List\[Badge\])_ a list of badges + 1. _(Badge)_ Custom badge. Badge is a Dict with the following keys: + * `label` _(String)_ e.g. 'Open in Colab' + * `icon` _(String)_ e.g. 'https://colab.research.google.com/assets/colab-badge.svg' + * `url` _(Union\[URL→URI | Path→String\])_ e.g. 'https://colab.research.google.com/github/HenriquesLab/ZeroCostDL4Mic/blob/master/Colab_notebooks/U-net_2D_ZeroCostDL4Mic.ipynb' +* `cite` _(optional List\[CiteEntry\])_ A list of citation entries. + Each entry contains a mandatory `text` field and either one or both of `doi` and `url`. + E.g. the citation for the model architecture and/or the training data used. + 1. _(CiteEntry)_ is a Dict with the following keys: + * `text` _(String)_ free text description + * `doi` _(DOI→String)_ digital object identifier, see https://www.doi.org/ (alternatively specify `url`) + * `url` _(String)_ url to cite (alternatively specify `doi`) +* `config` _(optional YamlDict→Dict\[Any, Any\])_ A custom configuration field that can contain any keys not present in the RDF spec. This means you should not store, for example, github repo URL in `config` since we already have the `git_repo` key defined in the spec. + Keys in `config` may be very specific to a tool or consumer software. To avoid conflicted definitions, it is recommended to wrap configuration into a sub-field named with the specific domain or tool name, for example: + + ```yaml + config: + bioimage_io: # here is the domain name + my_custom_key: 3837283 + another_key: + nested: value + imagej: + macro_dir: /path/to/macro/file + ``` + If possible, please use [`snake_case`](https://en.wikipedia.org/wiki/Snake_case) for keys in `config`. + For example: + ```yaml + config: + # custom config for DeepImageJ, see https://github.com/bioimage-io/configuration/issues/23 + deepimagej: + model_keys: + # In principle the tag "SERVING" is used in almost every tf model + model_tag: tf.saved_model.tag_constants.SERVING + # Signature definition to call the model. Again "SERVING" is the most general + signature_definition: tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY + test_information: + input_size: [2048x2048] # Size of the input images + output_size: [1264x1264 ]# Size of all the outputs + device: cpu # Device used. In principle either cpu or GPU + memory_peak: 257.7 Mb # Maximum memory consumed by the model in the device + runtime: 78.8s # Time it took to run the model + pixel_size: [9.658E-4µmx9.658E-4µm] # Size of the pixels of the input + ``` +* `covers` _(optional List\[Union\[URL→URI | Path→String\]\])_ A list of cover images provided by either a relative path to the model folder, or a hyperlink starting with 'http[s]'. Please use an image smaller than 500KB and an aspect ratio width to height of 2:1. The supported image formats are: 'jpg', 'png', 'gif'. +* `download_url` _(optional Union\[URL→URI | Path→String\])_ optional url to download the resource from +* `git_repo` _(optional URL→URI)_ A url to the git repository, e.g. to Github or Gitlab.If the model is contained in a subfolder of a git repository, then a url to the exact folder(which contains the configuration yaml file) should be used. +* `icon` _(optional String)_ an icon for the resource +* `id` _(optional String)_ Unique id within a collection of resources. +* `links` _(optional List\[String\])_ links to other bioimage.io resources +* `maintainers` _(optional List\[Maintainer\])_ Maintainers of this resource. + 1. _(Maintainer)_ is a Dict with the following keys: + * `github_user` _(String)_ GitHub user name. + * `affiliation` _(String)_ Affiliation. + * `email` _(Email)_ E-Mail + * `name` _(Name→String)_ Full name. + * `orcid` _(String)_ [orcid](https://support.orcid.org/hc/en-us/sections/360001495313-What-is-ORCID) id in hyphenated groups of 4 digits, e.g. '0000-0001-2345-6789' (and [valid](https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier) as per ISO 7064 11,2.) +* `outputs` _(optional List\[OutputTensor\])_ Describes the output tensors from this model. + 1. _(OutputTensor)_ is a Dict with the following keys: + * `axes` _(Axes→String)_ Axes identifying characters from: bitczyx. Same length and order as the axes in `shape`. + + | character | description | + | --- | --- | + | b | batch (groups multiple samples) | + | i | instance/index/element | + | t | time | + | c | channel | + | z | spatial dimension z | + | y | spatial dimension y | + | x | spatial dimension x | + * `data_type` _(String)_ The data type of this tensor. For inputs, only `float32` is allowed and the consumer software needs to ensure that the correct data type is passed here. For outputs can be any of `float32, float64, (u)int8, (u)int16, (u)int32, (u)int64`. The data flow in bioimage.io models is explained [in this diagram.](https://docs.google.com/drawings/d/1FTw8-Rn6a6nXdkZ_SkMumtcjvur9mtIhRqLwnKqZNHM/edit). + * `name` _(String)_ Tensor name. No duplicates are allowed. + * `shape` _(Union\[ExplicitShape→List\[Integer\] | ImplicitOutputShape\])_ Specification of output tensor shape. + 1. _(ImplicitOutputShape)_ In reference to the shape of an input tensor, the shape of the output tensor is `shape = shape(input_tensor) * scale + 2 * offset`. ImplicitOutputShape is a Dict with the following keys: + * `offset` _(List\[Float\])_ Position of origin wrt to input. Multiple of 0.5. + * `reference_tensor` _(String)_ Name of the reference tensor. + * `scale` _(List\[Float\])_ 'output_pix/input_pix' for each dimension. + * `data_range` _(Tuple)_ Tuple `(minimum, maximum)` specifying the allowed range of the data in this tensor. If not specified, the full data range that can be expressed in `data_type` is allowed. + * `halo` _(List\[Integer\])_ The halo to crop from the output tensor (for example to crop away boundary effects or for tiling). The halo should be cropped from both sides, i.e. `shape_after_crop = shape - 2 * halo`. The `halo` is not cropped by the bioimage.io model, but is left to be cropped by the consumer software. Use `shape:offset` if the model output itself is cropped and input and output shapes not fixed. + * `postprocessing` _(List\[Postprocessing\])_ Description of how this output should be postprocessed. + 1. _(Postprocessing)_ is a Dict with the following keys: + * `name` _(String)_ Name of postprocessing. One of: binarize, clip, scale_linear, sigmoid, zero_mean_unit_variance, scale_range, scale_mean_variance. + * `kwargs` _(Kwargs→Dict\[String, Any\])_ Key word arguments as described in [postprocessing spec](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/postprocessing_spec_0_4.md). +* `packaged_by` _(optional List\[Author\])_ The persons that have packaged and uploaded this model. Only needs to be specified if different from `authors` in root or any entry in `weights`. + 1. _(Author)_ is a Dict with the following keys: + * `name` _(Name→String)_ Full name. + * `affiliation` _(String)_ Affiliation. + * `email` _(Email)_ E-Mail + * `github_user` _(String)_ GitHub user name. + * `orcid` _(String)_ [orcid](https://support.orcid.org/hc/en-us/sections/360001495313-What-is-ORCID) id in hyphenated groups of 4 digits, e.g. '0000-0001-2345-6789' (and [valid](https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier) as per ISO 7064 11,2.) +* `parent` _(optional ModelParent)_ The model from which this model is derived, e.g. by fine-tuning the weights. ModelParent is a Dict with the following keys: + * `id` _(optional BioImageIO_ID→String)_ ID as shown on resource card on bioimage.io + * `sha256` _(optional SHA256→String)_ Hash of the parent model RDF. Note: the hash is not validated + * `uri` _(optional Union\[URI→String | Path→String\])_ URL or local relative path of a model RDF +* `rdf_source` _(optional Union\[URL→URI | DOI→String\])_ url or doi to the source of the resource definition +* `run_mode` _(optional RunMode)_ Custom run mode for this model: for more complex prediction procedures like test time data augmentation that currently cannot be expressed in the specification. No standard run modes are defined yet. RunMode is a Dict with the following keys: + * `name` _(required String)_ The name of the `run_mode` + * `kwargs` _(optional Kwargs→Dict\[String, Any\])_ Key word arguments. +* `sample_inputs` _(optional List\[Union\[URI→String | Path→String\]\])_ List of URIs/local relative paths to sample inputs to illustrate possible inputs for the model, for example stored as png or tif images. The model is not tested with these sample files that serve to inform a human user about an example use case. +* `sample_outputs` _(optional List\[Union\[URI→String | Path→String\]\])_ List of URIs/local relative paths to sample outputs corresponding to the `sample_inputs`. +* `tags` _(optional List\[String\])_ A list of tags. +* `training_data` _(optional Union\[Dataset | LinkedDataset\])_ + 1. _(optional Dataset)_ in-place definition of [dataset RDF](https://github.com/bioimage-io/spec-bioimage-io/blob/gh-pages/dataset_spec_0_2.md) + 1. _(optional LinkedDataset)_ is a Dict with the following keys: + * `id` _(optional String)_ dataset id +* `version` _(optional Version→String)_ The version number of the model. The version number format must be a string in `MAJOR.MINOR.PATCH` format following the guidelines in Semantic Versioning 2.0.0 (see https://semver.org/), e.g. the initial version number should be `0.1.0`. + + ### Other contributions You are welcome to submit your **models**, **datasest**, **applicaitons** and Jupyter **notebooks** to BioImage.IO. @@ -554,4 +718,4 @@ Once a community partner is registered to contribute test summaries with the `te #### Updating test summaries The main [BioImage.IO CI](https://github.com/bioimage-io/collection-bioimage-io/blob/main/.github/workflows/auto_update_main.yaml) triggers the partner's CI for new or updated resources. -Additionally, the parter may decide at any time to rerun (some of) their tests if changes on their side (like a new version release of their software) requires additional tests or a reevaluation. \ No newline at end of file +Additionally, the parter may decide at any time to rerun (some of) their tests if changes on their side (like a new version release of their software) requires additional tests or a reevaluation.