Improve kpss default lag for ndiffs #42

mitchelloharawild · 2019-05-07T06:52:24Z

Which lag is most appropriate? Old or new?
Which gives best ARIMA performance on M3/M4?
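
One way to get a feel for the first question is to check how sensitive the KPSS statistic is to the truncation lag. The sketch below is an illustration only, not the package's implementation: it calls `urca::ur.kpss()` directly with a few hand-picked `use.lag` values on the seasonally differenced log `AirPassengers` series. The series, the lag values and the 5% level are all assumptions chosen for the example.

```r
library(urca)

# Seasonally differenced log AirPassengers (illustrative choice of series)
x <- diff(log(AirPassengers), lag = 12)

# Hand-picked truncation lags to compare (assumed values, not package defaults)
lags <- c(0, 2, 4, 8, 12)

kpss_by_lag <- t(sapply(lags, function(l) {
  test <- ur.kpss(x, type = "mu", use.lag = l)
  c(lag = l,
    statistic = unname(test@teststat),
    crit_5pct = unname(test@cval[1, "5pct"]))
}))
kpss_by_lag <- as.data.frame(kpss_by_lag)

# TRUE means the level-stationarity null is rejected at 5%,
# which a KPSS-based ndiffs procedure would read as "difference again"
kpss_by_lag$reject <- kpss_by_lag$statistic > kpss_by_lag$crit_5pct
print(kpss_by_lag)
```

If the `reject` column flips as the lag grows, the choice of default lag directly changes the selected number of differences for this series.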

davidreilly007 · 2019-10-22T10:57:13Z

Hi Mitchell,

Further to our Twitter discussion, I thought it would be important, before doing M3 or M4 runs, to answer this question: must Ln Airline Passengers be (0,1,1)(0,1,1)? Because if so, you are constrained to ts short.

Running tsCV shows a slight edge for the (2,0,0) model. However, here's a relevant comment from a past discussion with Rob:

DR: Here is an interesting change as a result of the kpss lag changes on a classical data set. If you model the log AirPassengers (Series G) data using auto.arima, the model (non-seasonal part) changes from the well known 0,1,1 to 2,0,1 with drift. If you then do a 24 month holdout and compare the two, the latter model is a better forecast!

RH: Interesting. I'm not sure it's good though. Only a seasonal difference will mean the trend is modelled as a drift term, which is not very adaptable. Two differences allows for local linear trends.

https://robjhyndman.com/hyndsight/show-me-the-evidence/#comment-3790363419
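
To make the point about drift versus local trends concrete, the two candidate models can be written out in backshift notation. This is a sketch using the usual sign conventions (MA terms entered with a plus sign); the symbols $\mu$, $\phi_i$, $\theta$ and $\Theta$ are notation introduced here, not taken from the thread.

```latex
% d = 0: ARIMA(2,0,0)(0,1,1)_{12} with drift
(1 - \phi_1 B - \phi_2 B^2)\bigl[(1 - B^{12})\,y_t - \mu\bigr]
  = (1 + \Theta B^{12})\,\varepsilon_t

% d = 1: ARIMA(0,1,1)(0,1,1)_{12}, no constant
(1 - B)(1 - B^{12})\,y_t
  = (1 + \theta B)(1 + \Theta B^{12})\,\varepsilon_t
```

In the first model, $\mu$ is a single fixed mean annual change, estimated once from the whole sample, so the long-run slope of the forecasts cannot adapt. In the second there is no fixed term: the extra $(1 - B)$ lets both level and slope be revised by each new shock, which is what allows a local linear trend.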

library(forecast)

# Log of the monthly AirPassengers series (Series G)
ap <- ts(AirPassengers, start = c(1949, 1), frequency = 12)
log.AP <- log(ap)

# d = 0: ARIMA(2,0,0)(0,1,1)[12] with drift
fit0 <- function(x, h) {
  forecast(Arima(x, order = c(2, 0, 0), seasonal = list(order = c(0, 1, 1), period = 12), include.constant = TRUE), h = h)
}
e0 <- tsCV(log.AP, fit0, h = 1)
rmse0 <- sqrt(mean(e0^2, na.rm = TRUE))
rmse0
# [1] 0.03984144

# d = 1: airline model ARIMA(0,1,1)(0,1,1)[12], no constant
fit1 <- function(x, h) {
  forecast(Arima(x, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12), include.constant = FALSE), h = h)
}
e1 <- tsCV(log.AP, fit1, h = 1)
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))
rmse1
# [1] 0.03993203

Edit: The model mentioned in the quoted discussion, (2,0,1), differs slightly from the (2,0,0) used here, I think due to the change in stepwise.

Edit 2: With h=24, d=0 is still better.

# Same two models, now with errors pooled over horizons 1 to 24
fit0 <- function(x, h) {
  forecast(Arima(x, order = c(2, 0, 0), seasonal = list(order = c(0, 1, 1), period = 12), include.constant = TRUE), h = h)
}
e0 <- tsCV(log.AP, fit0, h = 24)
rmse0 <- sqrt(mean(e0^2, na.rm = TRUE))
rmse0
# [1] 0.08031493

fit1 <- function(x, h) {
  forecast(Arima(x, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12), include.constant = FALSE), h = h)
}
e1 <- tsCV(log.AP, fit1, h = 24)
rmse1 <- sqrt(mean(e1^2, na.rm = TRUE))
rmse1
# [1] 0.0831755
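
The h=24 figures above pool squared errors over all 24 horizons into one number. A per-horizon breakdown may show where the two models actually diverge. The following is a minimal sketch, not the code that produced the numbers above: the names `fit_d0`, `fit_d1` and `rmse_by_h` are made up for the example, and `initial = 60` is an assumption added to skip the shortest training windows, so the values will not match those reported.

```r
library(forecast)

log.AP <- log(AirPassengers)

# Same two model specifications as in the comparison above
fit_d0 <- function(x, h) {
  forecast(Arima(x, order = c(2, 0, 0),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = TRUE), h = h)
}
fit_d1 <- function(x, h) {
  forecast(Arima(x, order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 include.constant = FALSE), h = h)
}

# With h > 1, tsCV() returns a matrix with one column per horizon.
# initial = 60 is an assumed setting, not used in the original runs.
e_d0 <- tsCV(log.AP, fit_d0, h = 24, initial = 60)
e_d1 <- tsCV(log.AP, fit_d1, h = 24, initial = 60)

# RMSE per forecast horizon for each model
rmse_by_h <- data.frame(
  h  = 1:24,
  d0 = sqrt(colMeans(e_d0^2, na.rm = TRUE)),
  d1 = sqrt(colMeans(e_d1^2, na.rm = TRUE))
)
print(rmse_by_h)
```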

davidreilly007 · 2019-10-22T14:04:54Z

On the other hand there is elegance in the simplicity of (0,1,1) (0,1,1), even if it is not the “best” model.

davidreilly007 · 2019-10-23T09:48:03Z

So here’s an interesting result. I ran M3 with default forecast 8.9 settings for auto.arima and got MASE 1.454.

Then I ran with d=1 and got 1.402!!

That’s the best auto.arima M3 number I’ve seen.

Not for a moment suggesting that’s a solution, but clearly the problem of over-differencing is not as bad as under-differencing!
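
For anyone wanting to try a comparison of this kind, here is a minimal sketch. It is not the script behind the numbers above: it assumes the `Mcomp` package as the source of the M3 data, uses only the first 20 series to keep the run short, and the helper `mase_for` is a hypothetical name introduced for the example.

```r
library(forecast)
library(Mcomp)

# Small illustrative subset; the full M3 set has 3003 series
series_list <- M3[1:20]

# Fit auto.arima on the training part and score MASE on the held-out part.
# Extra arguments (such as d = 1) are passed through to auto.arima().
mase_for <- function(series, ...) {
  fit <- auto.arima(series$x, ...)
  fc <- forecast(fit, h = series$h)
  accuracy(fc, series$xx)["Test set", "MASE"]
}

mase_default <- sapply(series_list, mase_for)        # d chosen by unit root test
mase_d1      <- sapply(series_list, mase_for, d = 1) # d forced to 1

print(c(default = mean(mase_default), d1 = mean(mase_d1)))
```

Replacing `M3[1:20]` with `M3` would run all series, at a much higher cost in time.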

mitchelloharawild added this to the v0.2.0 milestone May 23, 2019

mitchelloharawild removed this from the v0.2.0 milestone Mar 16, 2021
