-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
response(<mdl_ts>) is unable to backtransform from parameters existing in the data #301
Comments
Thanks @mitchelloharawild! 1 - You can't have this type of transformation along with a simple/no transformation in the same mdl_df fit <- df_train %>%
model(guerrero_arima = ARIMA(box_cox(Turnover, lambda = lambda_guerrero)),
arima = ARIMA(Turnover)
)
#> Error: A mable can only contain models with the same response variable(s). 2 - You can't evaluate accuracy on the test set and it doesn't give an error on the train-set accuracy (although the back-transform has not been applied) library(dplyr)
library(tsibble)
library(fabletools)
library(feasts)
library(fable)
library(tsibbledata)
df <- tsibbledata::aus_retail %>%
filter(State == "Australian Capital Territory")
TEST_H <- 6
df_train <- df %>% filter(Month <= max(df$Month)-6)
# extract the optimal lambda for box-cox using the guerrero method
lambda_g <- df_train %>% features(Turnover, features = guerrero)
df_train <- df_train %>% left_join(lambda_g, by = c("State", "Industry"))
# fit data with box cox and lambda from guererro
fit <- df_train %>%
model(guerrero_arima = ARIMA(box_cox(Turnover, lambda = lambda_guerrero))
)
fcast <- fit %>% forecast(h=TEST_H)
eval_train <- accuracy(fit, measures = list(mape=MAPE, rmse=RMSE, mase=MASE, rmsse=RMSSE))
eval_test <- fcast %>%
accuracy(
data = df, measures = list(mape=MAPE, rmse=RMSE, mase=MASE, rmsse=RMSSE)
)
#> Error: Problem with `mutate()` input `.actual`.
#> x object 'box_cox(Turnover, lambda = lambda_guerrero)' not found
#> ℹ Input `.actual` is `box_cox(Turnover, lambda = lambda_guerrero)`. I thought this was a promising work-around but it failed miserably. fcast <- fit %>%
forecast(h=TEST_H) %>%
left_join(lambda_g, by = c("State", "Industry")) %>%
rename(transformed_Turnover = "box_cox(Turnover, lambda = lambda_guerrero)") %>%
mutate(Turnover = inv_box_cox(transformed_Turnover, lambda = lambda_guerrero)) %>%
mutate(.mean = inv_box_cox(.mean, lambda = lambda_guerrero))
#> Error: Problem with `mutate()` input `Turnover`.
#> x `vec_proxy_compare.distribution()` not supported.
#> ℹ Input `Turnover` is `inv_box_cox(transformed_Turnover, lambda = lambda_guerrero)`. |
Thanks for the additional examples. As far as I am aware, these are all rooted in the |
To jump on the bandwagon regarding this powerful feature. I tried to create a very simple transformation that takes the index column as a second input in order to normalize the value column by days in the period with the same errors. It is unfortunate because such as transformation would have worked if x was of class 'ts'. Maybe there is a work around in the meantime but I haven't figured it out. period_days <- function(x, i){
f = guess_frequency(i)
if (f == 12) {
x <- x / lubridate::days_in_month(i)
}
else if (f == 4)
{
x <- x / days_in_quarter(i)
}
else if (f == 1)
{
x <- x / days_in_year(i)
}
else {
x
}
}
inv_period_days <- function(x, i){
f = guess_frequency(i)
if (f == 12) {
x <- x * lubridate::days_in_month(i)
}
else if (f == 4)
{
x <- x * days_in_quarter(i)
}
else if (f == 1)
{
x <- x * days_in_year(i)
}
else {
x
}
}
trans_period_days <- new_transformation(transformation = period_days, inverse = inv_period_days) |
Edit: this approach is actually not correct # Test case
test <- UKLungDeaths
train <- test %>% filter_index(~ '1978 Dec')
new <- new_data(train, 12)
train %>%
model(ets = ETS(value), trans = ETS(trans_period_days(value, first(index)))) %>%
forecast(new) %>%
autoplot(test) # Imitation of lubridate::days_in_month functionality
N_DAYS_IN_QUARTERS <- c(90, 91, 92, 92)
days_in_year <- function(x)
{
n_days <- rep(365, length(x))
n_days[lubridate::leap_year(x)] <- 366
n_days
}
days_in_quarter <- function(x)
{
quarter_x <- lubridate::quarter(x)
n_days <- N_DAYS_IN_QUARTERS[quarter_x]
n_days[quarter_x == 1 & lubridate::leap_year(x)] <- 91
n_days
}
# custom UDFs
period_days <- function(x, i){
i <- i + 1:length(x) - 1
if (is_yearmonth(i))
x <- x / lubridate::days_in_month(i)
if (is_yearquarter(i))
x <- x / days_in_quarter(i)
if (is.numeric(i))
x <- x / days_in_year(i)
as.vector(x)
}
inv_period_days <- function(x, i){
i <- i + 1:length(x) - 1
if (is_yearmonth(i))
x <- x * lubridate::days_in_month(i)
if (is_yearquarter(i))
x <- x * days_in_quarter(i)
if (is.numeric(i))
x <- x * days_in_year(i)
as.vector(x)
}
trans_period_days <- new_transformation(transformation = period_days, inverse = inv_period_days) |
Hi @rando-brando, I think you're having issues with fable determining which variable is the one you want to predict. In your case, |
Thank you for the fast response! I followed your suggestion and updated the function by removing the index incrementer and adding test <- UKLungDeaths <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)
train <- test %>% filter_index(~ '1978 Dec')
new <- new_data(train, 12)
train %>%
model(trans = ETS(trans_period_days_v2(resp(value), index))) %>%
forecast(new) %>%
autoplot(test)
#Error in `mutate()`:
#! Problem while computing `trans = (function (object, ...) ...`.
#Caused by error in `hessian.default()`:
#! Richardson method for hessian assumes a scalar valued function. For reference the updated UDF transformation is provided below along with a sample output comparison from mutate. #transformation v2
period_days_v2 <- function(x, i){
if (is_yearmonth(i))
x <- x / lubridate::days_in_month(i)
if (is_yearquarter(i))
x <- x / days_in_quarter(i)
if (is.numeric(i))
x <- x / days_in_year(i)
as.vector(x)
}
inv_period_days_v2 <- function(x, i){
if (is_yearmonth(i))
x <- x * lubridate::days_in_month(i)
if (is_yearquarter(i))
x <- x * days_in_quarter(i)
if (is.numeric(i))
x <- x * days_in_year(i)
as.vector(x)
}
trans_period_days_v2 <- new_transformation(period_days_v2, inv_period_days_v2)
# v1 vs v2 - intended result of value / days_in_month(index) appears correct
train %>%
group_by_key %>%
mutate(old = trans_period_days(value, first(index)),
new = trans_period_days_v2(value , index))
# A tsibble: 120 x 5 [1M]
# Key: key [2]
# Groups: key [2]
index key value old new
<mth> <chr> <dbl> <dbl> <dbl>
1 1974 Jan mdeaths 2134 68.8 68.8
2 1974 Feb mdeaths 1863 66.5 66.5
3 1974 Mar mdeaths 1877 60.5 60.5
4 1974 Apr mdeaths 1877 62.6 62.6
5 1974 May mdeaths 1492 48.1 48.1
6 1974 Jun mdeaths 1249 41.6 41.6
7 1974 Jul mdeaths 1280 41.3 41.3
8 1974 Aug mdeaths 1131 36.5 36.5
9 1974 Sep mdeaths 1209 40.3 40.3
10 1974 Oct mdeaths 1492 48.1 48.1 |
Interesting that you would encounter an error like that, it indicates that there is some issue computing the mean of the back-transformed forecast distribution. This is more of an implementation issue than anything, and is fixable. Could you try plotting the median instead and see if you get an error? This can be done with: Alternatively you could try writing the response as |
I should have been more clear. The error is occurring in train %>%
model(trans = ETS(days_in_period_v2(resp(value), index))) %>%
forecast(new, point_forecast = list(median))
#Error in `mutate()`:
#! Problem while computing `trans = (function (object, ...) ...`.
#Caused by error:
#! Assigned data `point_fc` must be compatible with existing data.
#✖ Existing data has 12 rows.
#✖ Assigned data has 144 rows.
#ℹ Only vectors of size 1 are recycled.
#Run `rlang::last_error()` to see where the error occurred. |
I'll try out your code and see if any changes are needed in the software to make this work. |
After doing a quick trace its clear that the approach you suggested with # first transformed forecast of 25.76600 is multiplied by the length of days_in_month(i) or (31, 28, 31, 30 ...)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
798.746 721.448 798.746 772.980 798.746 772.980 798.746 798.746 772.980 798.746 772.980 798.746
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
798.746 721.448 798.746 772.980 798.746 772.980 798.746 798.746 772.980 798.746 772.980 798.746
# <crashes here as it expects a return of length 1 not 12>
# simplified transforms
period_days_v3 <- function(x, i){
x / lubridate::days_in_month(i)
}
inv_period_days_v3 <- function(x, i){
x * lubridate::days_in_month(i)
}
trans_period_days_v3 <- new_transformation(period_days_v3, inv_period_days_v3) # expected result of transformed forecast without back-transform
train %>%
group_by_key() %>%
mutate(value = trans_period_days_v3(value, index)) %>%
model(trans = ETS(value)) %>%
forecast(new)
# A fable: 24 x 5 [1M]
# Key: key, .model [2]
key .model index value .mean
<chr> <chr> <mth> <dist> <dbl>
1 fdeaths trans 1979 Jan N(26, 8.6) 25.8
2 fdeaths trans 1979 Feb N(29, 11) 29.1
3 fdeaths trans 1979 Mar N(24, 7.8) 24.4
4 fdeaths trans 1979 Apr N(20, 5.2) 19.9
5 fdeaths trans 1979 May N(15, 3.2) 15.4
6 fdeaths trans 1979 Jun N(14, 2.5) 13.6
7 fdeaths trans 1979 Jul N(13, 2.3) 13.0
8 fdeaths trans 1979 Aug N(12, 1.8) 11.5
9 fdeaths trans 1979 Sep N(12, 2) 12.0
10 fdeaths trans 1979 Oct N(14, 2.8) 14.5 |
Should the model object also store these parameters?
From #103
Created on 2021-01-08 by the reprex package (v0.3.0)
The text was updated successfully, but these errors were encountered: