
Backports for v0.10.2. (#2163)
* Make `PandasDataset` faster (#2148)

* Interrupting `mx.Trainer` stops training. (#2131)


Co-authored-by: Jasper <[email protected]>

* Docs: Simplify wide `DataFrame` example (#2150)

* Docs: fix links in models table (#2156)

* Ignore divide warnings in evaluation. (#2159)

* Add 'Background' section to docs. (#2129)

* Docs: Add info about version guarantees. (#2161)

Co-authored-by: Lorenzo Stella <[email protected]>
Co-authored-by: Hongqing-work <[email protected]>
3 people authored Jul 14, 2022
1 parent 460e03d commit 64fc923
Showing 10 changed files with 267 additions and 94 deletions.
Binary file added docs/_static/electricity-10w.png
Binary file added docs/_static/forecast-distributions.png
139 changes: 139 additions & 0 deletions docs/getting_started/background.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
```{admonition} **Draft**
This article may be extended or reworked in the future.
```

# Background

## What is Time-Series Forecasting?

Generally speaking, forecasting means making predictions about events in
the future. In time-series forecasting specifically, we want to predict the
future values of a given time-series.

For example, in electricity production it is very important that demand and
supply are in balance. Thus, producers anticipate consumer demand for
electricity and plan production capacity accordingly. In other words, producers
rely on accurate time-series forecasts of consumer demand for electricity to
generate just enough supply.

In forecasting, there is the implicit assumption that observable behaviours of
the past that impact time-series values continue into the future. To stay
with the electricity example: people generally consume less energy at night
than during the day, watch TV mostly during the evenings, and use air
conditioners when it's hot during summer.

```{figure} ../_static/electricity-10w.png
---
---
Ten weeks of data plotted over each other -- ``electricity`` dataset.
```

Naturally, it's impossible to forecast the unpredictable. For instance, in 2019
it was virtually impossible to account for the possibility of travel
restrictions due to the Covid pandemic when trying to forecast travel demand
for 2020.

Thus, forecasting operates on the caveat that the underlying factors that
generate the time-series values don't fundamentally change in the future. It is
a tool to predict the ordinary and not the surprising.

To look at this another way: models are actually trained to predict the past;
we merely apply them to forecast the future.


## Target And Features

We call the time-series that we want to predict the `target` time-series. The
past target values are the most important information a model can use to make
accurate predictions.

In addition, models can make use of features, additional values that have an
impact on the target value. We differentiate between "static" and "dynamic"
features.

A dynamic feature can be different for every time-point. For example, this
could be the price of a product, but also more general information such as
outside air temperature. Internally, we generate dynamic features for things
like the age of the time-series or what day of the week it is.

```{important}
Most models require dynamic features to be available in the future time-range
when making predictions.
```

In contrast, static features describe a time-series independently of time. If
we were to predict different products across different stores, we could use
static features to label each time-series with store and product
identifiers.
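As a sketch of how these pieces could fit together in a single data entry (the field names follow the convention GluonTS commonly uses, but the start date, the numbers, and the choice of features are invented for illustration):

```python
import numpy as np

prediction_length = 24   # hours we want to forecast
past_length = 168        # one week of hourly observations

# One data entry; all values here are random placeholders.
entry = {
    "start": "2022-01-01 00:00",
    # past target values -- the most important input to the model
    "target": np.random.normal(loc=50.0, scale=5.0, size=past_length),
    # a dynamic (real) feature, e.g. outside air temperature; note that it
    # also covers the future `prediction_length` steps, since most models
    # need dynamic features in the future time-range to make predictions
    "feat_dynamic_real": np.random.normal(
        size=(1, past_length + prediction_length)
    ),
    # a static categorical feature, e.g. a store identifier
    "feat_static_cat": [3],
}

assert entry["feat_dynamic_real"].shape[1] == len(entry["target"]) + prediction_length
```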

We further differentiate between categorical and continuous (real) features.
The idea is that for continuous features the number itself has meaning, for
example when using the price as a feature. A categorical feature, on the other
hand, doesn't have this property: stores `0`, `1`, and `2` are distinct
entities, and there is no notion of a "higher" store.
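A small sketch of the distinction (the prices and store ids are made up; one-hot encoding is just one common way to avoid imposing an order on categories):

```python
import numpy as np

# Continuous (real) feature: the number itself carries meaning, so a model
# can consume it directly (here: a made-up price per time-point).
price = np.array([9.99, 9.99, 7.49, 9.99])

# Categorical feature: store ids are labels, not magnitudes. One common way
# to encode them without implying an order is one-hot encoding:
store_id = np.array([0, 1, 2, 1])
one_hot = np.eye(3)[store_id]

# Each row now marks *which* store it is, with no store being "higher".
assert one_hot[1].tolist() == [0.0, 1.0, 0.0]
```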

<!-- TODO: Have some nice example exemplifying the above. -->

<!-- ```{admonition} Example -->

<!-- Imagine we are the owner of a cafe. -->

<!-- ``` -->


## Probabilistic Forecasting

One core idea in GluonTS is that we don't produce simple point values as
forecasts, but actually predict distributions.

An intuitive way to look at this is to imagine predicting a time-series 100
times, which returns 100 different sample paths that together form a
distribution. The difference is that we can directly emit these distributions
and then draw samples from them.

Distributions have the benefit of providing a range of likely values. Imagine
running a restaurant and wondering how many ingredients to buy; if we buy too
little we can't serve customer demand, but buying too many will produce waste.
Thus, when we forecast demand, it is valuable if a model can tell us that
demand will probably be around, say, 50 dishes, but is unlikely to exceed 60.
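To make this concrete, here is a small sketch of how quantiles could be read off a set of sample paths (the samples are drawn from a fixed normal distribution purely for illustration, not from an actual model):

```python
import numpy as np

# Pretend we drew 100 sample paths over a 24-step horizon from a model's
# predicted distribution; here they come from a fixed normal instead.
rng = np.random.default_rng(0)
samples = rng.normal(loc=50.0, scale=5.0, size=(100, 24))

# Quantiles summarize the distribution per time step: the restaurant could
# stock for the 90th percentile instead of the median.
p50 = np.quantile(samples, 0.5, axis=0)
p90 = np.quantile(samples, 0.9, axis=0)

# The median lies below the 90th percentile at every step.
assert (p50 <= p90).all()
```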

```{figure} ../_static/forecast-distributions.png
---
---
Predicting 24 hours, showing `p50`, `p90`, `p95`, `p98` prediction intervals.
```

```{note}
The predicted distributions are not authoritative: a predicted 90th percentile
doesn't mean that exactly 10% of actual values will be higher; it is the
model's guess of where this line lies.
```

## Local and Global Models

In GluonTS we use the concepts of local and global models.

A local model is fit to a single time-series and used to make predictions for
that time-series, whilst a global model is trained across many time-series and
a single global model is used to make predictions for all time-series of a
dataset.

Training a global model can take a lot of time: up to hours, sometimes even
days. Thus, it is not feasible to train the model as part of the prediction
request, and training happens as a separate "offline" step. In contrast,
fitting a local model is usually much faster and is done "online" as part of
the prediction.

In GluonTS, local models are directly available as predictors, whilst global
models are offered as estimators, which need to be trained first:
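As a minimal illustration of this split (the `MeanEstimator` and `MeanPredictor` classes here are invented toy examples, not the actual GluonTS API):

```python
class MeanPredictor:
    """Already fit: can make predictions without further training."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def predict(self, prediction_length: int) -> list:
        return [self.mean] * prediction_length


class MeanEstimator:
    """Untrained: `train` performs the (offline) learning step across all
    time-series and returns a predictor."""

    def train(self, dataset: list) -> MeanPredictor:
        values = [x for series in dataset for x in series]
        return MeanPredictor(sum(values) / len(values))


# The estimator is trained once, across every series in the dataset ...
predictor = MeanEstimator().train([[1.0, 2.0], [3.0]])
# ... and the resulting predictor is then used for all predictions.
assert predictor.predict(2) == [2.0, 2.0]
```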


<!-- TODO -->
<!-- ## Train Test Split -->
<!-- ## Measuring Accuracy -->
38 changes: 30 additions & 8 deletions docs/getting_started/install.md
@@ -5,27 +5,49 @@ GluonTS is available from PyPi via:

```sh
pip install gluonts
```

````{caution}
GluonTS uses [Semantic Versioning](https://semver.org) for managing versions.
Since the library is actively developed, we use `v0` as the major version.

We plan to release a new minor version at the end of each quarter. The
currently planned releases can be found on [GitHub](https://github.com/awslabs/gluon-ts/milestones).

**Version Guarantees**

Breaking changes are only introduced with a new minor release. Bug fixes and
minor improvements are provided for the current release and are published on
demand.

For production usage, we suggest restricting the version when installing
GluonTS:

```sh
pip install gluonts==0.10.*
```
````


## Optional and Extra Dependencies

```{important}
**GluonTS uses a minimal dependency model.**
This means that to use most models and features additional dependencies need to
be installed.
```

Python has the notion of [extras](https://peps.python.org/pep-0508/#extras)
-- dependencies that can be optionally installed to unlock certain features of
a package.

When installing a package, they are passed via ``[...]`` after the package
name (and version specifier):

```sh
pip install some-package==version[extra-1,extra-2]
```

We make extensive use of optional dependencies in GluonTS to keep the amount of
4 changes: 2 additions & 2 deletions docs/getting_started/models.md
@@ -31,14 +31,14 @@ NPTS | Local | Uni
[Salinas2020]: https://doi.org/10.1016/j.ijforecast.2019.07.001
[Rangapuram2018]: https://papers.nips.cc/paper/2018/hash/5cf68969fb67aa6082363a6d4e6468e2-Abstract.html
[Wang2019]: https://proceedings.mlr.press/v97/wang19k.html
[Turkmen2021]: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0259764
[Wen2017]: https://arxiv.org/abs/1711.11053
[Oreshkin2019]: https://openreview.net/forum?id=r1ecqn4YwB
[Hasson2021]: https://openreview.net/forum?id=VD3TMzyxKK
[Li2019]: https://papers.nips.cc/paper/2019/hash/6775a0635c302542da2c32aa19d86be0-Abstract.html
[Lim2021]: https://doi.org/10.1016/j.ijforecast.2021.03.012
[Vaswani2017]: https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[vanDenOord2016]: https://arxiv.org/abs/1609.03499
[Salinas2019]: https://proceedings.neurips.cc/paper/2019/hash/0b105cf1504c4e241fcc6d519ea962fb-Abstract.html
[Lai2018]: https://doi.org/10.1145/3209978.3210006
[Shchur2020]: https://arxiv.org/pdf/1909.12127
1 change: 1 addition & 0 deletions docs/index.rst
@@ -8,6 +8,7 @@
:hidden:

getting_started/install
getting_started/background
getting_started/concepts
getting_started/models

60 changes: 26 additions & 34 deletions docs/tutorials/data_manipulation/pandasdataframes.md.template
@@ -92,40 +92,7 @@ predictions = predictor.predict(ds)
```


## Use case 2 - Loading data with missing values

In case the `timestamp` column is not evenly spaced and monotonically increasing
we get an error when using `PandasDataset`. Here we show how to fill in the gaps
@@ -165,6 +132,31 @@ ds = PandasDataset(dfs_dict, target="target")
```


## Use case 3 - Loading data from a `wide` dataframe

Here, we are given data in the `wide` format, where time series are stacked side-by-side in a `DataFrame`.
We can simply turn this into a dictionary of `Series` objects with `dict`, and construct a `PandasDataset` with it:

```python
import pandas as pd

url_wide = (
"https://gist.githubusercontent.com/rsnirwan/c8c8654a98350fadd229b00167174ec4"
"/raw/a42101c7786d4bc7695228a0f2c8cea41340e18f/ts_wide.csv"
)
df_wide = pd.read_csv(url_wide, index_col=0, parse_dates=True)
print(df_wide.head())
```

```python
from gluonts.dataset.pandas import PandasDataset

ds = PandasDataset(dict(df_wide))
```

As shown in `Use case 1` we can now use `ds` to train an estimator.
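This works because calling `dict` on a `DataFrame` yields a mapping from column names to `Series` objects, one per time-series. A small offline sketch (the column names and values here are made up):

```python
import pandas as pd

# A tiny wide DataFrame standing in for `df_wide` above.
index = pd.date_range("2021-01-01", periods=4, freq="h")
df = pd.DataFrame(
    {"ts_A": [1.0, 2.0, 3.0, 4.0], "ts_B": [5.0, 6.0, 7.0, 8.0]},
    index=index,
)

# `dict` iterates over the columns, giving one `Series` per time-series.
d = dict(df)
assert set(d) == {"ts_A", "ts_B"}
assert list(d["ts_A"]) == [1.0, 2.0, 3.0, 4.0]
```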


## General use cases

Here, we explain in detail what data formats `PandasDataset` can work with,
3 changes: 2 additions & 1 deletion src/gluonts/dataset/pandas.py
@@ -305,4 +305,5 @@ def is_uniform(index: pd.PeriodIndex) -> bool:
>>> is_uniform(pd.DatetimeIndex(ts).to_period("2H"))
False
"""
other = pd.period_range(index[0], periods=len(index), freq=index.freq)
return (other == index).all()
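For illustration, the rewritten check can be exercised standalone (reimplemented here so the snippet is self-contained; it mirrors the function body above):

```python
import pandas as pd

def is_uniform(index: pd.PeriodIndex) -> bool:
    # Rebuild the range the index *should* span if it were evenly spaced,
    # then compare element-wise.
    other = pd.period_range(index[0], periods=len(index), freq=index.freq)
    return bool((other == index).all())

# Evenly spaced every 2 hours -> uniform.
even = pd.DatetimeIndex(
    ["2021-01-01 00:00", "2021-01-01 02:00", "2021-01-01 04:00"]
)
assert is_uniform(even.to_period("2h"))

# A 4-hour gap -> not uniform.
gappy = pd.DatetimeIndex(["2021-01-01 00:00", "2021-01-01 04:00"])
assert not is_uniform(gappy.to_period("2h"))
```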
2 changes: 1 addition & 1 deletion src/gluonts/evaluation/_base.py
@@ -214,7 +214,7 @@ def __call__(
zip(ts_iterator, fcst_iterator),
total=num_series,
desc="Running evaluation",
) as it, np.errstate(divide="ignore", invalid="ignore"):
if self.num_workers and not sys.platform == "win32":
mp_pool = multiprocessing.Pool(
initializer=None, processes=self.num_workers
