Improve kpss default lag for ndiffs #42

Open
mitchelloharawild opened this issue May 7, 2019 · 3 comments

Comments

@mitchelloharawild
Member

Which KPSS lag is most appropriate for ndiffs, the old default or the new one?
Which gives the best ARIMA performance on M3/M4?
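The issue doesn't say which rules "old" and "new" refer to. Purely as an assumption for illustration, here are two candidate lag-truncation rules: the "short" rule `trunc(4 * (n/100)^0.25)` (used by `tseries::kpss.test` with `lshort = TRUE` and by `urca::ur.kpss` with `lags = "short"`) and a square-root rule `trunc(3 * sqrt(n) / 13)`, which we believe recent forecast versions pass to `urca::ur.kpss` via `use.lag`. The helpers `lag_short` and `lag_sqrt` are hypothetical names.

```r
# Hypothetical helpers: two candidate KPSS lag-truncation rules.
# Which one is "old" and which is "new" is an assumption, not stated in the issue.
lag_short <- function(n) trunc(4 * (n / 100)^0.25)  # Schwert-style "short" rule
lag_sqrt  <- function(n) trunc(3 * sqrt(n) / 13)    # square-root rule

# Series lengths from a short yearly M3 series up to a long monthly one
n <- c(14, 20, 50, 144, 500)
data.frame(n = n, short_rule = lag_short(n), sqrt_rule = lag_sqrt(n))
```

For n = 144 (AirPassengers) this gives 4 lags under the short rule and 2 under the square-root rule, so the choice matters most for the short series that dominate M3.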

@mitchelloharawild mitchelloharawild added this to the v0.2.0 milestone May 23, 2019
@davidreilly007

davidreilly007 commented Oct 22, 2019

Hi Mitchell,

Following up on our Twitter discussion, I thought it important, before doing any M3 or M4 runs, to answer one question: must log Airline Passengers be (0,1,1)(0,1,1)? Because if so, you are constrained to ts short.

Running tsCV shows a slight edge for the (2,0,0) model. However, here is a relevant comment from a past discussion with Rob:

DR: Here is an interesting change on a classic data set as a result of the KPSS lag changes. If you model the log AirPassengers (Series G) data using auto.arima, the model (non-seasonal part) changes from the well-known 0,1,1 to 2,0,1 with drift. If you then do a 24-month holdout and compare the two, the latter model gives the better forecast!

RH: Interesting. I'm not sure it's good, though. Having only a seasonal difference means the trend is modelled as a drift term, which is not very adaptable. Two differences allow for local linear trends.

https://robjhyndman.com/hyndsight/show-me-the-evidence/#comment-3790363419
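To make Rob's point concrete, here is a sketch of the two model forms in backshift notation ($B y_t = y_{t-1}$). The notation is ours, not from the thread, and the drift is written as a linear term $\beta t$, which is one common way a regression-with-drift ARIMA encodes it. For the (2,0,0)(0,1,1) model with drift:

$$
y_t = \beta t + u_t, \qquad (1-\phi_1 B-\phi_2 B^2)(1-B^{12})\,u_t = (1+\Theta B^{12})\,\varepsilon_t
$$

Seasonally differencing gives $(1-B^{12})\,y_t = 12\beta + (1-B^{12})\,u_t$, where $(1-B^{12})\,u_t$ is stationary with mean zero, so year-on-year growth fluctuates around a fixed $12\beta$ and the trend slope never changes. For the airline model:

$$
(1-B)(1-B^{12})\,y_t = (1+\theta B)(1+\Theta B^{12})\,\varepsilon_t
$$

Here $w_t = (1-B^{12})\,y_t$ itself has a unit root, so the local growth rate can wander over time, which is the local-linear-trend flexibility described above.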

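```r
library(forecast)

# AirPassengers is already a monthly ts starting Jan 1949; model it on the log scale
log.AP <- log(AirPassengers)

# Model 0: ARIMA(2,0,0)(0,1,1)[12]; include.constant = TRUE adds drift since d + D = 1
fit0 <- function(x, h) {
  forecast(Arima(x, order = c(2, 0, 0),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = TRUE), h = h)
}
e0 <- tsCV(log.AP, fit0, h = 1)  # one-step rolling-origin forecast errors
rmse0 <- sqrt(mean(e0^2, na.rm = TRUE))
rmse0
#> [1] 0.03984144

# Model 1: the airline model ARIMA(0,1,1)(0,1,1)[12], no constant
fit1 <- function(x, h) {
  forecast(Arima(x, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = FALSE), h = h)
}
e1 <- tsCV(log.AP, fit1, h = 1)
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))
rmse1
#> [1] 0.03993203
```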

Edit: The model in the quoted discussion is (2,0,1), whereas the one here is (2,0,0); I think this is due to a change in auto.arima's stepwise search.

Edit 2: At h=24, the d=0 model is also better:

```r
fit0 <- function(x, h) {
  forecast(Arima(x, order = c(2, 0, 0),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = TRUE), h = h)
}
e0 <- tsCV(log.AP, fit0, h = 24)  # error matrix, one column per horizon 1-24
rmse0 <- sqrt(mean(e0^2, na.rm = TRUE))
rmse0
#> [1] 0.08031493

fit1 <- function(x, h) {
  forecast(Arima(x, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = FALSE), h = h)
}
e1 <- tsCV(log.AP, fit1, h = 24)
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))
rmse1
#> [1] 0.0831755
```
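For anyone without the forecast package, here is a minimal base-R sketch of the same rolling-origin idea behind tsCV, using `stats::arima` and `predict`. The helper name `rolling_rmse`, the 96-month minimum training window, and encoding drift as a linear time-index regressor are all assumptions made for illustration, so its numbers will not match the tsCV figures above.

```r
# Hypothetical helper: rolling-origin RMSE with base R's stats::arima.
# Assumptions (not from the thread): a 96-month minimum training window,
# a full refit at every origin, and "drift" as a linear time-index regressor.
rolling_rmse <- function(y, order, seasonal, drift, h = 1, min_train = 96) {
  n <- length(y)
  # errors[t, j] holds the error of the j-step forecast made at origin t
  errors <- matrix(NA_real_, nrow = n, ncol = h)
  for (t in min_train:(n - 1)) {
    train <- ts(y[seq_len(t)], start = start(y), frequency = frequency(y))
    xreg <- if (drift) seq_len(t) else NULL
    fit <- tryCatch(
      arima(train, order = order,
            seasonal = list(order = seasonal, period = frequency(y)),
            xreg = xreg),
      error = function(e) NULL)  # skip origins where estimation fails
    if (is.null(fit)) next
    steps <- min(h, n - t)       # only horizons that have observed actuals
    newxreg <- if (drift) t + seq_len(steps) else NULL
    pred <- predict(fit, n.ahead = steps, newxreg = newxreg)$pred
    errors[t, seq_len(steps)] <- y[t + seq_len(steps)] - as.numeric(pred)
  }
  sqrt(mean(errors^2, na.rm = TRUE))
}

log_ap <- log(AirPassengers)
rmse_200 <- rolling_rmse(log_ap, c(2, 0, 0), c(0, 1, 1), drift = TRUE,  h = 24)
rmse_011 <- rolling_rmse(log_ap, c(0, 1, 1), c(0, 1, 1), drift = FALSE, h = 24)
c(rmse_200 = rmse_200, rmse_011 = rmse_011)
```

Pooling all horizons into one RMSE mirrors the `mean(e^2)` calls above; computing `sqrt(colMeans(errors^2, na.rm = TRUE))` inside the helper instead would show where along the 24-month horizon the two models diverge.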

@davidreilly007

On the other hand, there is elegance in the simplicity of (0,1,1)(0,1,1), even if it is not the “best” model.

@davidreilly007

davidreilly007 commented Oct 23, 2019

So here’s an interesting result. I ran M3 with the default forecast 8.9 settings for auto.arima and got a MASE of 1.454.

Then I ran it with d=1 fixed and got 1.402!

That’s the best auto.arima M3 number I’ve seen.

I’m not for a moment suggesting that’s a solution, but clearly over-differencing is not as bad a problem as under-differencing!
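The thread doesn't show how the MASE figures were computed. As a minimal sketch, assuming the standard definition (out-of-sample MAE divided by the in-sample MAE of the naive method at lag m, with m = 1 for naive and m = the seasonal period for seasonal naive), a per-series MASE looks like this; an M3-wide figure such as 1.454 would then presumably be an average over series. The helper name `mase` is hypothetical. In forecast, the d=1 run would correspond to something like `auto.arima(x, d = 1)`.

```r
# Hypothetical helper: mean absolute scaled error for a single series.
# Scale = in-sample MAE of the lag-m naive forecast
# (m = 1: naive; m = frequency: seasonal naive).
mase <- function(train, actual, forecast, m = 1) {
  scale <- mean(abs(diff(as.numeric(train), lag = m)))
  mean(abs(as.numeric(actual) - as.numeric(forecast))) / scale
}

# Toy check: in-sample naive errors are all 1, forecast MAE is 0.5
mase(c(1, 2, 3, 4, 5), actual = c(6, 7), forecast = c(6, 8))
#> [1] 0.5
```

Because the scale is per series, a MASE below 1 means the method beat the in-sample naive benchmark on average, which makes figures comparable across M3's very different series.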

@mitchelloharawild mitchelloharawild removed this from the v0.2.0 milestone Mar 16, 2021
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants