Reconciliation of forecasts in stretched crossvalidation #305

Closed
henningsway opened this issue Feb 3, 2021 · 7 comments

@henningsway

I've recently been working regularly with fable, and the package has been a joy to use. Thank you!

I currently would like to use tsibble::stretch_tsibble (is there an alternative? I'm wondering about the "questioning" lifecycle tag) to evaluate reconciled forecasts. It seems to be working for sliding windows, but for stretched tsibbles I run into an error.

Please find a reproducible example below:

library(tidyverse)
library(tsibble)
library(fable)
#> Loading required package: fabletools


tourism_hts <- tourism %>%
  aggregate_key(State / Region, Trips = sum(Trips))

# reconciliation with sliding window - works
fc_slided <- tourism_hts %>% 
  filter(State == "Tasmania") %>% 
  slide_tsibble(.step = 8, .size = 60) %>%
  model(ets = ETS(Trips)) %>%
  reconcile(ets_rec = min_trace(ets)) %>%
  forecast(h = 4)

# reconciliation with stretched window - doesn't work
fc_stretched <- tourism_hts %>%
  filter(State == "Tasmania") %>% 
  stretch_tsibble(.step = 8, .init = 60) %>%
  model(ets = ETS(Trips)) %>%
  reconcile(ets_rec = min_trace(ets)) %>%
  forecast(h = 4)
#> Error: Problem with `mutate()` input `ets_rec`.
#> x Error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Join columns must be present in data.
#> x Problem with `date`.
#> i Input `ets_rec` is `(function (object, ...) ...`.

Created on 2021-02-03 by the reprex package (v0.3.0)

@henningsway changed the title from "Reconciliation of forecasts in stretched crossvalidation raises error" to "Reconciliation of forecasts in stretched crossvalidation" on Feb 3, 2021
@mitchelloharawild (Member)

This specific error was fixed in 683e8a9; however, reconciling cross-validated forecasts is not yet possible.

This is because the key variable used to identify the cross-validation fold becomes part of the hierarchy. As there is no <aggregated> value for these folds (which is appropriate), this produces 'disjoint' hierarchies, where each branch (or fold) should be reconciled separately.

The relevant issue for this is here: #106
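
Until that lands, here is a minimal sketch of a per-fold workaround, assuming the tourism_hts reprex above (the fold origins are illustrative, chosen to mirror stretch_tsibble(.step = 8, .init = 60)). Each training window is built by filtering on the index directly, so no fold id ever enters the key structure, and each fold is modelled, reconciled and forecast on its own:

tasmania <- tourism_hts %>%
  filter(State == "Tasmania")

# Expanding-window fold ends: quarter 60 of the data (2012 Q4), then every
# 8 quarters, mirroring stretch_tsibble(.step = 8, .init = 60)
origins <- yearquarter("2012 Q4") + seq(0, 16, by = 8)

fc_folds <- purrr::map(origins, function(origin) {
  tasmania %>%
    filter(Quarter <= origin) %>%            # training window for this fold
    model(ets = ETS(Trips)) %>%
    reconcile(ets_rec = min_trace(ets)) %>%
    forecast(h = 4)
})

Each element of fc_folds is then an ordinary reconciled fable, so the usual accuracy() workflow applies per fold.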

@claudiolaas

Hi Mitchell, I am working with @henningsway on this and we thought of a workaround: iterate over the chunks that stretch_tsibble or slide_tsibble generate and do the model-reconcile-accuracy step on each chunk individually, then average the error metrics over all chunks to get an overall error metric.

However, not all error metrics come out correctly under this scheme. Some examples: ME, MAPE and CRPS average out to the correct overall value, but MASE and RMSSE do not. We suspect one would have to pool the residuals of the chunks and then calculate the overall error metrics, rather than calculating the error metrics for each chunk and then averaging.

Or in other words, how exactly do the forecasts of a stretched tsibble get combined to arrive at one overall accuracy measure?

@mitchelloharawild (Member)

Could you elaborate on why you think the MASE and RMSSE error metrics are not accurate? Perhaps there is a problem or confusion about the scaling of these accuracy measures.

When forecasting a stretched tsibble, you will get separate forecasts for each fold of the tsibble. From there, you can compute a set of accuracy() measures for the forecast errors using the test set. Typically these accuracy measures would be summarised into a single value (across the folds of the stretched tsibble) using a mean or median.
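
For a single series (leaving reconciliation aside), a minimal sketch of that workflow, assuming the tourism data used elsewhere in this thread; the by grouping passed to accuracy() is my assumption about how to get one row per fold, not something stated above:

adelaide <- tourism %>%
  filter(Region == "Adelaide", Purpose == "Business")

cv_fc <- adelaide %>%
  stretch_tsibble(.step = 8, .init = 60) %>%
  model(ets = ETS(Trips)) %>%
  forecast(h = 1)

cv_fc %>%
  accuracy(adelaide, by = c(".id", ".model")) %>%  # one accuracy row per fold
  group_by(.model) %>%
  summarise(across(c(ME, RMSE, MAE), mean))        # mean across folds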

@claudiolaas

> Typically these accuracy measures would be summarised into a single value (across the folds of the stretched tsibble) using a mean or median.

This is exactly what we tried, but it appears that with stretch or slide some measures combine differently; see the example below.

# just one time series
test_data <- tourism %>%
  filter(Region == "Adelaide",
         State == "South Australia",
         Purpose == "Business")

# create two non-overlapping chunks of 39 rows each
fc_slide <- test_data %>% 
  slide_tsibble(.step = 39, .size = 39) %>% 
  model(ets = ETS(Trips)) %>% 
  forecast(h = 1)  %>% 
  accuracy(test_data)

# first 39 rows
fc_1 <- test_data %>%
  filter(Quarter < yearquarter("2007 Q4")) %>% 
  model(ets = ETS(Trips)) %>%
  forecast(h = 1) %>% 
  accuracy(test_data)


# second 39 rows
fc_2 <- test_data %>%
  filter(Quarter >= yearquarter("2007 Q4"),
         Quarter <= yearquarter("2017 Q2")) %>% 
  model(ets = ETS(Trips)) %>%
  forecast(h = 1) %>% 
  accuracy(test_data)


(fc_1$ME + fc_2$ME)/2 == fc_slide$ME # --> TRUE

(fc_1$RMSSE + fc_2$RMSSE)/2 == fc_slide$RMSSE # --> FALSE

@mitchelloharawild (Member)

@robjhyndman I think I asked you about this before, but I couldn't find the answer. When computing scaled accuracy measures over folds of a cross-validated dataset, is it more appropriate to use the same scaling factor or a scaling factor specific to each fold?

@robjhyndman (Member)

I would use the same scaling factor computed over the whole data set. Otherwise it just adds another source of variability.
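
A hand-rolled illustration of why this matters (illustrative error values, not fabletools internals): MASE divides the mean absolute forecast error by a scaling factor Q, the in-sample mean absolute error of the seasonal naive method. With a single Q computed from the whole series, per-fold MASE values average back to the overall MASE; fold-specific scaling factors break that identity, which is what the fc_slide comparison above runs into.

y <- tourism %>%
  filter(Region == "Adelaide", Purpose == "Business") %>%
  pull(Trips)

m <- 4                                  # quarterly seasonal period
Q <- mean(abs(diff(y, lag = m)))        # whole-series seasonal naive in-sample MAE

e <- c(5.2, -3.1)                       # illustrative one-step errors, one per fold

# One common Q: the mean of the per-fold MASEs equals the overall MASE
all.equal(mean(abs(e) / Q), mean(abs(e)) / Q)    # TRUE

# Fold-specific scaling factors generally do not average back
Q1 <- mean(abs(diff(y[1:39],  lag = m)))
Q2 <- mean(abs(diff(y[40:78], lag = m)))
mean(abs(e) / c(Q1, Q2))                         # != mean(abs(e)) / Q in general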

@mitchelloharawild (Member)

Closing, as the scaling factor used in accuracy() is the more appropriate one, and reconciliation of cross-validated models will be tracked in #106.
