
Integration with MINT

Deborah Khider edited this page Sep 27, 2021 · 13 revisions

Context

We need to register the outputs generated by the Data Transformation and Model Execution steps. Registration fits into the following workflow:

Data Download (Data/update) -> Data Registration (update registration) -> Data Transformation -> Data Registration -> Model Execution -> Data Registration

User Stories

  • The ensemble manager (which performs the execution) can call the data registration script, passing it a list of file paths and a dataset_id.
  • The files are published as resources on the Data Catalog.
$ python <the_script> --dataset-id <id> [file(s)] [geospatial] [temporal] [variables] [variable IDs]

The files are passed as local paths, not URLs (yet), so the script can read them directly.
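A minimal sketch of how the command line above could be parsed with argparse. Only `--dataset-id` and the file arguments come from this page; the flag names `--geospatial`, `--temporal`, `--variables` and `--variable-ids`, and their value formats, are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser resembling the registration command described above.

    Only --dataset-id and the file list come from the wiki page; every
    other flag name and value format here is hypothetical.
    """
    parser = argparse.ArgumentParser(
        description="Register local files as resources of a dataset."
    )
    parser.add_argument("--dataset-id", required=True,
                        help="Dataset id supplied by the caller (never generated here).")
    parser.add_argument("files", nargs="+", type=Path,
                        help="Local file paths (not URLs).")
    parser.add_argument("--geospatial",
                        help="Hypothetical: bounding box as 'xmin,ymin,xmax,ymax'.")
    parser.add_argument("--temporal",
                        help="Hypothetical: time range as 'start,end' (ISO 8601).")
    parser.add_argument("--variables", nargs="*", default=[],
                        help="Hypothetical: variable names.")
    parser.add_argument("--variable-ids", nargs="*", default=[],
                        help="Hypothetical: variable ids.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the arguments and check that every path is a readable local file."""
    args = build_parser().parse_args(argv)
    missing = [str(path) for path in args.files if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Not local files: {missing}")
    return args
```

Under these assumptions, a call could look like `$ python <the_script> --dataset-id <id> out1.nc out2.nc --temporal 2020-01-01,2020-12-31`.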

Scope

Define what will be done and what will not be done as part of this project.

  • Extraction of geospatial and temporal information + variables for NetCDF files

In this case, the geospatial, temporal, and variable parameters are read from the NetCDF file itself.
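The extraction could be sketched as below. This is not the project's implementation: it assumes the netCDF4 package is available, that the coordinate variables are named `lat`, `lon` and `time` (files differ), and the output keys (`xmin`, `start_time`, ...) are hypothetical rather than the Data Catalog's schema.

```python
def summarize_coverage(lats, lons, times):
    """Reduce coordinate values to a bounding box and a time range.

    `times` must hold datetime-like objects that support comparison
    and isoformat(). The key names are hypothetical.
    """
    return {
        "spatial": {
            "xmin": float(min(lons)), "ymin": float(min(lats)),
            "xmax": float(max(lons)), "ymax": float(max(lats)),
        },
        "temporal": {
            "start_time": min(times).isoformat(),
            "end_time": max(times).isoformat(),
        },
    }


def extract_from_netcdf(path, lat_name="lat", lon_name="lon", time_name="time"):
    """Read coverage and data variable names from a NetCDF file.

    The default coordinate names are an assumption about the file layout.
    """
    # Imported lazily so summarize_coverage works without these packages.
    import netCDF4
    import numpy as np

    with netCDF4.Dataset(path, "r") as ds:
        time_var = ds.variables[time_name]
        # Decode numeric time values using the variable's units attribute.
        times = netCDF4.num2date(
            time_var[:], units=time_var.units,
            calendar=getattr(time_var, "calendar", "standard"),
        )
        # np.ma.compressed drops masked (missing) coordinate values.
        lats = np.ma.compressed(ds.variables[lat_name][:]).tolist()
        lons = np.ma.compressed(ds.variables[lon_name][:]).tolist()
        metadata = summarize_coverage(lats, lons, list(np.ravel(times)))
        # Data variables: everything that is not a coordinate variable.
        metadata["variables"] = [
            name for name in ds.variables if name not in ds.dimensions
        ]
    return metadata
```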

  • Files in other formats will be registered without this extraction step.
  • Ensemble manager will need to pass the necessary information (geo/temporal) to the registration script.
  • Model catalog will need to pass information about the variables to the registration script.
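The split between extracted and caller-supplied metadata could look roughly like this. The function name, the suffix list, and the metadata keys are all hypothetical; `extractor` stands for whatever NetCDF extraction routine is available.

```python
from pathlib import Path

# Assumption: NetCDF files are recognised by their suffix.
NETCDF_SUFFIXES = {".nc", ".nc4"}
# Hypothetical metadata keys the caller must provide for other formats.
REQUIRED_KEYS = ("spatial", "temporal", "variables")


def resolve_metadata(path, passed=None, extractor=None):
    """Return the metadata to register for one file.

    NetCDF files are handed to `extractor`. For any other format the
    caller (ensemble manager / model catalog) must pass the values in.
    """
    if Path(path).suffix.lower() in NETCDF_SUFFIXES and extractor is not None:
        return extractor(path)
    passed = dict(passed or {})
    missing = [key for key in REQUIRED_KEYS if not passed.get(key)]
    if missing:
        raise ValueError(f"{path}: caller must supply {missing}")
    return passed
```

Failing early on missing values keeps incomplete records out of the catalog, although the script could equally accept partial metadata; that choice is not specified here.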

Requirements

  • Extraction of geospatial and temporal information for NetCDF files - Done
  • If the dataset doesn't exist, creation of the dataset and extraction of SVO for NetCDF files.
  • Register the variables in the data catalog if the variables do not already exist.
  • Registration of the resources.
  • For Topoflow, one output folder is a dataset, with each file represented as a resource.
  • The dataset id is a parameter. The data registration script should not generate it.
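A rough sketch of how these requirements fit together, using an in-memory stand-in for the Data Catalog. `InMemoryCatalog` and `register_folder` are hypothetical names and do not reflect the real Data Catalog API; the sketch only illustrates the create-if-missing logic and the one-folder-one-dataset rule.

```python
from pathlib import Path


class InMemoryCatalog:
    """Hypothetical stand-in for the Data Catalog (not its real API)."""

    def __init__(self):
        self.datasets = {}      # dataset_id -> list of resource records
        self.variables = set()  # names of variables already registered


def register_folder(catalog, dataset_id, folder, variable_names):
    """Register every file in `folder` as a resource of one dataset."""
    if not dataset_id:
        # The id is a parameter; this code must never generate it.
        raise ValueError("dataset_id must be supplied by the caller")

    # Create the dataset only if it does not exist yet.
    created = dataset_id not in catalog.datasets
    resources = catalog.datasets.setdefault(dataset_id, [])

    # Register only the variables the catalog does not already know.
    new_variables = [
        name for name in dict.fromkeys(variable_names)
        if name not in catalog.variables
    ]
    catalog.variables.update(new_variables)

    # One resource per file, skipping files registered earlier.
    known = {record["path"] for record in resources}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and str(path) not in known:
            resources.append({"name": path.name, "path": str(path)})

    return {
        "dataset_created": created,
        "new_variables": new_variables,
        "resource_count": len(resources),
    }
```

Calling the function twice on the same folder leaves the catalog unchanged the second time, which is convenient if the ensemble manager retries a registration.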