Add IO and Statistics User Guide (#11)
* add io user guide
* add statistics user guide
* fix reference typos
* fix earth2studio typos
* fix typos and references
* add statistic template
1 parent fb517cc, commit 6704868
Showing 7 changed files with 203 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
:mod:`{{module}}`.{{objname}} | ||
{{ underline }}============== | ||
|
||
.. currentmodule:: {{module}} | ||
|
||
.. autoclass:: {{ objname }} | ||
|
||
{% block methods %} | ||
.. automethod:: __call__ | ||
{% endblock %} | ||
|
||
.. _sphx_glr_backref_{{module}}.{{objname}}: | ||
|
||
.. minigallery:: {{module}}.{{objname}} | ||
:add-heading: | ||
:heading-level: ^ | ||
|
||
.. raw:: html | ||
|
||
<div class="clearer"></div> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
(output_handling_userguide)= | ||
|
||
# Output Handling | ||
|
||
While input data handling is primarily managed by the data sources in | ||
{mod}`earth2studio.data`, output handling is managed by the IO backends available | ||
in {mod}`earth2studio.io`. These backends are designed to let users customize the | ||
arrays and metadata within the exposed backend while still making it easy to | ||
design reusable workflows. | ||
|
||
The key addition to the `(x, coords)` data structure that moves through the rest of | ||
the `earth2studio` code is the notion of an `array_name`, which is needed for output | ||
store compatibility. Names distinguish between different arrays within the backend and | ||
are currently a requirement for storing `Datasets` in `xarray`, `zarr`, and `netcdf`. | ||
This means that the user must supply a name when adding an array to a store or when | ||
writing an array. A frequent pattern is to extract one dimension of an array, such as | ||
`"variable"`, and use its entries as individual arrays in the backend; see the examples below. | ||
|
||
## IO Backend Interface | ||
|
||
The full requirements for a standard IO backend are defined explicitly in | ||
`earth2studio/io/base.py`. | ||
|
||
```{literalinclude} ../../../earth2studio/io/base.py | ||
:lines: 24- | ||
:language: python | ||
``` | ||
|
||
:::{note} | ||
IO Backends do not need to inherit this protocol; it is simply used to define | ||
the required APIs. Some built-in IO backends may also offer additional functionality | ||
that is not universally supported (and hence not required). | ||
::: | ||
|
||
There are two important methods that must be supported: `add_array`, which | ||
adds an array and any attached coordinates to the underlying store, and `write`, | ||
which explicitly stores the passed data in the backend. The `write` command may | ||
induce synchronization if the input tensor resides on the GPU and the store does not. | ||
Most stores convert from PyTorch to NumPy in this process. The | ||
{mod}`earth2studio.io.kv` backend has the option of storing data on the GPU, which can be | ||
done asynchronously. | ||
|
||
Most data stores offer a number of additional utilities such as `__contains__`, | ||
`__getitem__`, `__len__`, and `__iter__`. For examples, see the implementation in | ||
{mod}`earth2studio.io.ZarrBackend`: | ||
|
||
```{literalinclude} ../../../earth2studio/io/zarr.py | ||
:lines: 53-81 | ||
:language: python | ||
``` | ||
|
||
Because of its `datetime` compatibility, we recommend using `ZarrBackend` as the default. | ||
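
As a rough illustration of the two required methods and the optional container utilities described in this section, here is a minimal in-memory sketch. `ToyBackend` is hypothetical and not part of `earth2studio`. The method names and argument order are assumed from the `io.add_array(total_coords, array_name)` and `io.write(x, coords, array_name)` calls used in this guide, and plain Python lists stand in for tensors and coordinate arrays.

```python
from collections import OrderedDict
from itertools import product


class ToyBackend:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self.coords = {}  # array_name -> OrderedDict of dimension -> values
        self.arrays = {}  # array_name -> {coordinate tuple: stored value}

    def __contains__(self, array_name):
        return array_name in self.arrays

    def __len__(self):
        return len(self.arrays)

    def __iter__(self):
        return iter(self.arrays)

    def __getitem__(self, array_name):
        return self.arrays[array_name]

    def add_array(self, coords, array_name):
        """Register the coordinates and allocate an empty array."""
        self.coords[array_name] = OrderedDict(coords)
        self.arrays[array_name] = {key: None for key in product(*coords.values())}

    def write(self, x, coords, array_name):
        """Store a flat, row-major list of values at the given coordinates."""
        keys = list(product(*coords.values()))
        if len(keys) != len(x):
            raise ValueError("x does not match the size implied by coords")
        for key, value in zip(keys, x):
            if key not in self.arrays[array_name]:
                raise KeyError(f"{key} was not initialized by add_array")
            self.arrays[array_name][key] = value


io = ToyBackend()
total_coords = OrderedDict(lead_time=[0, 6], lon=[0.0, 0.25])
io.add_array(total_coords, "fields")

# Write only the slab for lead_time=6; the lead_time=0 entries stay unset
io.write([1.5, 2.5], OrderedDict(lead_time=[6], lon=[0.0, 0.25]), "fields")
```

Keying the stored values by coordinate tuples keeps the sketch free of array libraries; a real backend would allocate an N-dimensional array in `add_array` and fill the matching slice in `write`.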
|
||
## Initializing a Store | ||
|
||
A common data pattern seen throughout our example workflows is to initialize the | ||
variables and dimensions of a backend using a complete `CoordSystem`. For example: | ||
|
||
```python | ||
from collections import OrderedDict | ||
|
||
# Build a complete CoordSystem (each `...` stands for a coordinate array) | ||
total_coords = OrderedDict( | ||
    { | ||
        'ensemble': ..., | ||
        'time': ..., | ||
        'lead_time': ..., | ||
        'variable': ..., | ||
        'lat': ..., | ||
        'lon': ..., | ||
    } | ||
) | ||
|
||
# Give an informative array name | ||
array_name = 'fields' | ||
|
||
# Initialize all dimensions in total_coords and the array 'fields' | ||
# (`io` is an already constructed IO backend) | ||
io.add_array(total_coords, array_name) | ||
``` | ||
|
||
Defining each coordinate and dimension can be tedious. Luckily, if we have | ||
a prognostic or diagnostic model, most of this information is already available. | ||
Here is a robust example of such a use case: | ||
|
||
```python | ||
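from collections import OrderedDict | ||
|
||
import numpy as np | ||
|
||
# Set up IO backend | ||
# Assume we already have `io`, `prognostic`, `time`, `nsteps`, and `array_name` | ||
# Copy prognostic model output coordinates, skipping "batch" and empty coordinates | ||
total_coords = OrderedDict( | ||
    { | ||
        k: v for k, v in prognostic.output_coords.items() | ||
        # Compare against the tuple (0,): `v.shape != 0` would always be True | ||
        if (k != "batch") and (v.shape != (0,)) | ||
    } | ||
) | ||
total_coords["time"] = time | ||
# One lead time per model step, including the initial state | ||
total_coords["lead_time"] = np.asarray( | ||
    [prognostic.output_coords["lead_time"] * i for i in range(nsteps + 1)] | ||
).flatten() | ||
# Reorder so the dimensions start with ("time", "lead_time", ...) | ||
total_coords.move_to_end("lead_time", last=False) | ||
total_coords.move_to_end("time", last=False) | ||
io.add_array(total_coords, array_name) | ||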
``` | ||
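
The two `move_to_end(..., last=False)` calls determine the dimension order of the stored array: each call moves its key to the front, so the key moved last ends up first. A standalone sketch with made-up coordinate values (plain lists in place of NumPy arrays) shows the resulting order:

```python
from collections import OrderedDict

# Stand-in for the filtered model output coordinates (values are hypothetical)
total_coords = OrderedDict(
    variable=["t2m", "u10m"], lat=[90.0, 89.75], lon=[0.0, 0.25]
)

# New keys are appended at the end of the OrderedDict
total_coords["time"] = ["2024-01-01T00:00"]
total_coords["lead_time"] = [0, 6, 12]

# Move "lead_time" to the front, then "time" in front of it
total_coords.move_to_end("lead_time", last=False)
total_coords.move_to_end("time", last=False)

print(list(total_coords))
# ['time', 'lead_time', 'variable', 'lat', 'lon']
```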
|
||
A common use-case is to extract a particular dimension (usually `variable`) as | ||
the array names. | ||
|
||
```python | ||
# A modification of the previous example: | ||
var_names = total_coords.pop("variable") | ||
io.add_array(total_coords, var_names) | ||
``` | ||
|
||
## Writing to the Store | ||
|
||
Once the data arrays have been initialized in the backend, writing to those arrays | ||
is a single line of code. | ||
|
||
```python | ||
x, coords = model(x, coords) | ||
io.write(x, coords, array_name) | ||
``` | ||
|
||
If, as above, the user is extracting a dimension of the tensor to use as array names, | ||
they can make use of {func}`earth2studio.utils.coords.extract_coords`: | ||
|
||
```python | ||
io.write(*extract_coords(x, coords, dim="variable")) | ||
``` |
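
To make the idea of extracting a dimension concrete, the sketch below splits a nested list along its leading `variable` dimension and returns three pieces in the `(x, coords, array_name)` order that `io.write` takes. `split_leading_dim` is a hypothetical stand-in written for this illustration. It is not the library's `extract_coords`, whose exact signature and return types should be checked in `earth2studio/utils/coords.py`.

```python
from collections import OrderedDict


def split_leading_dim(x, coords, dim="variable"):
    """Split x along its leading dimension into one array per coordinate value."""
    if next(iter(coords)) != dim:
        raise ValueError(f"'{dim}' must be the leading dimension in this sketch")
    names = list(coords[dim])
    # The remaining coordinates describe each individual array
    reduced = OrderedDict((k, v) for k, v in coords.items() if k != dim)
    return list(x), reduced, names


coords = OrderedDict(variable=["t2m", "u10m"], lon=[0.0, 0.25, 0.5])
x = [[280.0, 281.0, 282.0], [3.0, 4.0, 5.0]]  # shape (variable, lon)

arrays, reduced_coords, names = split_leading_dim(x, coords)
```

Each entry of `arrays` can then be written under the matching entry of `names`, using `reduced_coords` as its coordinate system.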
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
(statistics_model_userguide)= | ||
|
||
# Statistics | ||
|
||
Statistics are distinct from prognostic and diagnostic models in principle because | ||
we assume that statistics reduce existing coordinates so that the output tensors | ||
have a coordinate system that is a subset of the input coordinate system. This | ||
makes statistics less flexible than diagnostic models, but they have fewer API requirements. | ||
|
||
## Statistics Interface | ||
|
||
The statistics API only specifies a {func}`__call__` method, which matches similar | ||
methods across the package. | ||
|
||
```{literalinclude} ../../../earth2studio/statistics/base.py | ||
:lines: 24-43 | ||
:language: python | ||
``` | ||
|
||
The base API hints at, and the {mod}`earth2studio.statistics.moments` examples | ||
demonstrate, the use of a few properties that make statistic handling easier: | ||
`reduction_dimensions`, a list of dimensions that will be reduced over; | ||
`weights`, which must be broadcastable with `reduction_dimensions`; and `batch_update`, | ||
which is useful for applying statistics when data arrives in streams or batches. | ||
|
||
Where applicable, the specified `reduction_dimensions` set a requirement on the | ||
coordinates passed to the call method. | ||
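
A minimal sketch of a statistic that follows this pattern, assuming a `__call__(x, coords)` method that returns the reduced data together with the reduced coordinates. `ToyMean` is hypothetical, supports only a single leading reduction dimension, and uses nested lists instead of tensors.

```python
from collections import OrderedDict


class ToyMean:
    """Hypothetical mean over one leading dimension; not an earth2studio class."""

    def __init__(self, reduction_dimensions):
        if len(reduction_dimensions) != 1:
            raise ValueError("this sketch reduces over exactly one dimension")
        self.reduction_dimensions = list(reduction_dimensions)

    def __call__(self, x, coords):
        dim = self.reduction_dimensions[0]
        # The reduction dimension must be present (here: as the leading one)
        if next(iter(coords)) != dim:
            raise ValueError(f"expected '{dim}' as the leading coordinate")
        n = len(coords[dim])
        # Average the n rows element-wise
        y = [sum(column) / n for column in zip(*x)]
        # Output coordinates are the input coordinates minus the reduced dimension
        out_coords = OrderedDict((k, v) for k, v in coords.items() if k != dim)
        return y, out_coords


coords = OrderedDict(ensemble=[0, 1], lon=[0.0, 0.25])
x = [[1.0, 2.0], [3.0, 4.0]]  # shape (ensemble, lon)

y, out_coords = ToyMean(["ensemble"])(x, coords)
print(y, list(out_coords))
# [2.0, 3.0] ['lon']
```

The output coordinate system is a strict subset of the input one, which is the defining property of a statistic described at the top of this page.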
|
||
## Custom Statistics | ||
|
||
Integrating your own statistics is easy: just satisfy the interface above. We recommend | ||
that users look at the custom statistic example in the {ref}`extension_examples` examples. | ||
|
||
# Metrics | ||
|
||
Like statistics, metrics are reductions across existing dimensions. Unlike statistics, | ||
which are usually defined over a single input, we define metrics to take a pair of | ||
inputs. Otherwise, the API and requirements are similar to the statistics requirements. | ||
|
||
## Metrics Interface | ||
|
||
```{literalinclude} ../../../earth2studio/statistics/base.py | ||
:lines: 45- | ||
:language: python | ||
``` | ||
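
A minimal sketch of a metric, which takes a pair of inputs rather than one. The hypothetical `ToyRMSE` below assumes a `__call__(x, x_coords, y, y_coords)` signature (an assumption for illustration; the interface above is authoritative) and reduces two 1-D lists over their only dimension.

```python
import math
from collections import OrderedDict


class ToyRMSE:
    """Hypothetical root-mean-square error over a single dimension."""

    def __init__(self, reduction_dimensions):
        self.reduction_dimensions = list(reduction_dimensions)

    def __call__(self, x, x_coords, y, y_coords):
        # Both inputs must share the same coordinate system
        if x_coords != y_coords:
            raise ValueError("x and y must be defined on the same coordinates")
        squared = [(a - b) ** 2 for a, b in zip(x, y)]
        value = math.sqrt(sum(squared) / len(squared))
        # Every reduced dimension is removed from the output coordinates
        out_coords = OrderedDict(
            (k, v) for k, v in x_coords.items() if k not in self.reduction_dimensions
        )
        return value, out_coords


coords = OrderedDict(lon=[0.0, 0.25])
value, out_coords = ToyRMSE(["lon"])([1.0, 2.0], coords, [4.0, 6.0], coords)
```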
|
||
## Contributing Statistics and Metrics | ||
|
||
Want to add your own statistic or metric to the package? Great, we will be happy to | ||
work with you. At a minimum, we expect the statistic or metric to abide by the interface | ||
defined above. We may also work with you to ensure that `reduction_dimensions` are | ||
supported where applicable and, if possible, that weighting and batching are supported as well. |