Reconciliation of forecasts in stretched cross-validation #305
I've recently been working regularly with fable and the package has been a joy to work with, thank you!

I currently would like to use tsibble::stretch_tsibble (is there an alternative? I'm wondering about the "questioning" lifecycle tag) to evaluate reconciled forecasts. It seems to be working for sliding windows, but for stretched tsibbles I run into an error. Please find a reproducible example below.

Created on 2021-02-03 by the reprex package (v0.3.0)
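A minimal sketch of the kind of pipeline involved (toy data and all names here are illustrative assumptions, not the original reprex): the hierarchy is built with aggregate_key(), the cross-validation folds with stretch_tsibble(), and the reconciled forecasts are the combination reported to fail.

```r
library(fable)
library(tsibble)
library(dplyr)

# Illustrative toy data: two regions of monthly values.
months <- yearmonth("2019 Jan") + 0:23
toy <- bind_rows(
  tibble(region = "A", month = months, value = rnorm(24, 100, 10)),
  tibble(region = "B", month = months, value = rnorm(24, 100, 10))
) %>%
  as_tsibble(index = month, key = region)

# Aggregate into a hierarchy, stretch into CV folds, then try to
# fit, reconcile, and forecast -- the step this issue is about.
toy %>%
  aggregate_key(region, value = sum(value)) %>%
  stretch_tsibble(.init = 12, .step = 6) %>%
  model(ets = ETS(value)) %>%
  reconcile(ets_adj = min_trace(ets)) %>%
  forecast(h = 6)
```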
Comments

This specific error was fixed in 683e8a9; however, reconciling cross-validated forecasts is not yet possible. This is because the key variable used to identify the cross-validation fold becomes part of the hierarchy. The relevant issue for this is #106.
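A small check of this behaviour (toy data, illustrative names): stretch_tsibble() stores the fold in an `.id` column and adds it to the tsibble's keys, so downstream aggregation treats it like any other key variable.

```r
library(tsibble)
library(dplyr)

# A single monthly series with no key of its own.
ts <- tibble(
  month = yearmonth("2020 Jan") + 0:11,
  value = rnorm(12)
) %>%
  as_tsibble(index = month)

# After stretching, the fold identifier has become a key variable.
ts %>%
  stretch_tsibble(.init = 6, .step = 3) %>%
  key_vars()
#> [1] ".id"
```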
Hi Mitchell, I am working with @henningsway on this and we thought of a workaround: iterate over the chunks that stretch_tsibble or slide_tsibble generate and run the model-reconcile-accuracy step on each chunk individually, then average the error metrics over all chunks to get the overall error metric.

However, not all error metrics came out accurately. Some examples: ME, MAPE and CRPS did average out to the correct overall value, but MASE and RMSSE did not. We suppose that one would have to pool the residuals across the chunks and then calculate the overall error metrics, instead of calculating the error metrics for each chunk and then averaging. In other words, how exactly do the forecasts of a stretched tsibble get combined to arrive at one overall accuracy measure?
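A minimal sketch of this per-fold workaround, assuming a tsibble `my_data` with index `month`, key `region`, and measured variable `value` (all hypothetical names, not from the thread):

```r
library(fable)
library(tsibble)
library(dplyr)
library(purrr)

# Aggregated version of the full data, used to score each fold's forecasts.
full_agg <- my_data %>% aggregate_key(region, value = sum(value))

# Create the cross-validation folds.
folds <- my_data %>% stretch_tsibble(.init = 24, .step = 12)

# Run model -> reconcile -> accuracy on each fold individually.
per_fold <- map_dfr(unique(folds$.id), function(id) {
  # Rebuild a plain tsibble for this fold, then aggregate it.
  train <- folds %>%
    filter(.id == id) %>%
    as_tibble() %>%
    select(-.id) %>%
    as_tsibble(index = month, key = region) %>%
    aggregate_key(region, value = sum(value))

  train %>%
    model(ets = ETS(value)) %>%
    reconcile(ets_adj = min_trace(ets)) %>%
    forecast(h = 12) %>%
    accuracy(full_agg) %>%
    mutate(.fold = id)
})

# Averaging per-fold metrics works for ME/MAPE-style measures, but MASE and
# RMSSE use a fold-specific scaling factor, so their simple average need not
# match the overall value -- the discrepancy described above.
per_fold %>%
  group_by(.model) %>%
  summarise(across(c(ME, RMSE, MAPE, MASE, RMSSE), mean), .groups = "drop")
```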
Could you elaborate on why you think the MASE and RMSSE error metrics are not accurate? Perhaps there is a problem or confusion about the scaling of these accuracy measures. When forecasting a stretched tsibble, you will get separate forecasts for each fold of the tsibble. From there, you can compute a set of accuracy measures for each fold.
This is exactly what we tried, but it appears that when using stretch or slide some values get averaged out differently; see the example below.
@robjhyndman I think I asked you about this before, but I couldn't find the answer. When computing scaled accuracy measures over folds of a cross-validated dataset, is it more appropriate to use the same scaling factor or a scaling factor specific to each fold?
I would use the same scaling factor computed over the whole data set. Otherwise it just adds another source of variability. |
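For reference, the textbook definitions (standard formulas, not quoted from this thread): with forecast errors e_t = y_t − ŷ_t over the evaluation points, seasonal period m, and a training series y_1, …, y_T, the scaled measures divide by a scaling factor computed from the training data:

```latex
Q = \frac{1}{T-m}\sum_{t=m+1}^{T}\lvert y_t - y_{t-m}\rvert,
\qquad
\mathrm{MASE} = \operatorname{mean}\bigl(\lvert e_t/Q\rvert\bigr),
\qquad
\mathrm{RMSSE} = \sqrt{\operatorname{mean}\bigl((e_t/Q)^2\bigr)}
```

With a single Q computed over the whole data set, per-fold values can be averaged consistently (weighting by the number of forecasts in each fold); with a fold-specific Q, each fold's values are divided by a different denominator, so a simple average no longer reproduces the overall measure.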
Closing as scaling factor used in …