-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path04_D_ClassicalDecomposition_fromScratch.qmd
716 lines (506 loc) · 24 KB
/
04_D_ClassicalDecomposition_fromScratch.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
---
title: "04_D_Classical_Decomposition_fromScratch"
author: "Juan Garbayo - TS Analysis"
format: html
editor: source
params:
print_sol: true
print_sol_int: true
toc: true
toc-location: left
toc-depth: 6
self-contained: true
---
```{r, include=FALSE, eval=FALSE}
TODOs:
1. Justificar mejor por qué se ajusta el seasonal component a los valores 0 y m.
Hacer esta justificación en un apartado al final de notebook, para los alumnos
que estén interesados en entenderlo, ya que no es necesario, lo que me intere
* Additive: se ajusta a 0 porque, al quitar la trend de manera aditiva te va a
quedar algo que se mueve por encima y por debajo de 0. El valor medio del detrended
no tiene por qué ser 0,
2. Dejar claro cuáles son los objetivos:
- Que practiquen implementar un algoritmo
- Que practiquen la comparación de resultados y que practiquen incluir comprobaciones
intermedias
- Que entiendad el significado de todos los términos de la descomposición, así
como el orden en el que se obtienen:
1. Trend
2. Detrended
3. Seasonal
4. Remainder
```
# References
NOTE: this material is not original, but rather an extension of the following reference, which is publicly available.
1. Hyndman, R.J., & Athanasopoulos, G., 2021. *Forecasting: principles and practice*. 3rd edition. [Link](https://otexts.com/fpp3/)
# Packages
```{r, error=FALSE, warning=FALSE, message = FALSE}
library(fpp3)
```
# Classical decomposition
It is the starting point for most other methods of time series decomposition. In addition, it is simple enough that it can be easily implemented from scratch, which is something we are going to do in class.
Essentially are 2 variants of classical decomposition:
* Classical decomposition of **aditive schemes**
* Classical decomposition of **multiplicative schemes**
The algorithm depends on **whether the length seasonal period is even or uneven.** Examples:
* If we have daily data we may consider, for example, a period of one week. That seasonal period would be of length 7 (7 days).
* NOTE: For daily data we could also consider other seasonal periods: one month, one quarter... even one year.
* If we have monthly data, we could consider a yearly seasonal period. That seasonal period would be of length 12 (12 months).
* If we have quarterly data, we could consider a yearly seasonal period. That seasonal period would be of length 4 (4 quarters).
* ...
Because the algorithm **approximates the trend with a centered moving average**, the even or uneven nature of the seasonal period is important. See the separate notebook on moving averages, which has an associated video to understand this:
* **Even seasonal period ($m$ even):** trend approximated by a 2 x $m$ moving average.
* **Uneven seasonal period $m$:** trend approximated by an $m$ moving average.
# Additive schemes - classical decomposition
## Step 1 - Estimate trend
$m$: length of the seasonal period.
* If $m$ is an even number, compute the trend-cycle component $T_t$ using 2 x $m$-MA.
* If $m$ is an odd number compute the trend-cycle component $T_t$ using an $m$-MA.
**Output**: the **trend component $T_t$**
## Step 2 - Detrended series
Calculate the de-trended series: $D_t = y_t - T_t$
**Output**: the **detrended series $D_t$**
## Step 3 - Compute seasonal component
**Main assumption to compute the seasonal component**: the seasonal component remains constant over time. This is one of the main weaknesses of the classical decomposition.
### **Step 3.1.**
Average the de-trended values for each component of the season.
* Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data
* **Output**: a vector of length $m$, with $m$ being the length of the seasonal compoent.
* Let us call this vector **$S_{unadj}$** for *unadjusted seasonal component*
<br>
### **Step 3.2.**
Adjust $S_{unadj}$, the output of step 3.1 to ensure that the sum of its **values adds to 0**.
If we call this adjusted seasonal component $S$, in classical decomposition it will be a vector of length $m$, just like **$S_{unadj}$**. We want the components of that vector to add up to 0:
$$
\text{Condition we want to attain: }\sum_{i=1}^{m} S_i= 0
$$
We want this because, if this is the case, the seasonal component does not contribute to the average of the time series over one period:
$$
\frac{\sum_{i=1}^my_i}{m} = \frac{\sum_{i=1}^m(T_i + S_i + R_i)}{m} \underset{\underset{\sum_{i=1}^mS_i = 0}{\uparrow}}{=} \frac{\sum_{i=1}^m(T_i + R_i)}{m}
$$
The vector resulting from step 3.1, $S_{unadj \ i}$ does not satisfy this condition. We are going to correct each of the components of $S_{unadj \ i}$ by the same amount $x$ in order to fulfill the condition above:
$$
S_i = S_{Unadj \ i} + x
$$
To determine x:
#### Step 3.2.1
Sum all the components of the vector $S_{unadj}$. Lets call this output $a$. This result is the total deviation from 0 we want to correct.
$$
\sum_{i=1}^{m}{S_{Unadj \ i}} = a
$$
#### Step 3.2.2
Obtain $x$ from the equation of the condition we want to attain:
$$
\sum_{i=1}^{m} S_i = 0 \rightarrow \sum_{i=1}^{m} (S_{Unadj \ i} + x) = a + \sum_{i=1}^{m}x = a + m \cdot x \underset{\begin{array}{c} \uparrow \\ \text{condition} \\ \text{imposed} \end{array}}{=} 0 \rightarrow \text{ Solving for x } \rightarrow x = -\frac{a}{m}
$$
By adding this amount to each $S_{unadj}$, we will obtain the seasonal component $S$
* **Output**: the adjusted seasonal component $S$.
## Step 4 - Compute the remainder component
$R_t = D_t - S_t = y_t - T_t - S_t$
## Example: computation using the function `classical_decomposition()`:
```{r}
# Filter the series and select relevant columns
us_retail_employment <-
us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <- us_retail_employment %>%
# Fit the classical decomposition model specifying an additive scheme
model(
clas_deic = classical_decomposition(Employed, type = "additive")
) %>%
# Extract components from the fitted model
components()
select(classical_dec, Month, Employed, trend, seasonal, random)
```
```{r}
classical_dec %>%
autoplot()
```
## Example: manual computation implementing the described algorithm.
### Step 1: compute the trend using moving averages
```{r}
# Compute moving averages to estimate the trend
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- us_retail_employment %>%
mutate(
`12-MA` = slider::slide_dbl(Employed, mean,
.before = 5, .after = 6, .complete = TRUE),
`2x12-MA` = slider::slide_dbl(`12-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
# Plot the computed trend
manual_decomposition %>%
autoplot(Employed, colour = "gray") +
geom_line(aes(y = `2x12-MA`), colour = "#D55E00") +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")
```
### Step 2 - detrend the time series
```{r}
# Compute the detrended component:
manual_decomposition <-
manual_decomposition %>%
mutate(
detrended = Employed - `2x12-MA`,
)
manual_decomposition %>% autoplot(detrended)
```
```{r}
```
This is the resulting detrended component, which contains the seasonal + remainder component: $D_t = y_t - T_t = S_t + R_t$
### Step 3: compute the seasonal component
#### Step 3.1. Compute $S_{unadj}$ averaging the de-trended values for each element of the season:
We will store the result in a separate dataframe called `df_seasonal`. This dataframe will contain 2 columns:
1. An identifier for the number of the month. This identifier will be based solely on the month, disregarding the year. More generally, we need to compute an identifier for each element of the seasonal component.
* We will use this identifier to group the data and compute the mean value
2. `s_unadj`: we compute it averaging the detrended series grouping based on the previously created identifier. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data.
We will start by creating the mentioned identifier for each month (disregarding the year) on our dataframe:
```{r}
manual_decomposition <-
manual_decomposition %>%
# Compute an identifier for each month, regardless
# of the year. This will define the subsequent groups
mutate(
n_month = month(Month)
)
manual_decomposition
```
Now we will use this newly created identifier to compute the averages of the detrended time series for each month regardless of the year. Expressed in more general terms, we are computing the averages of the detrended time series for each element of the seasonal component.
```{r}
df_seasonal <-
manual_decomposition %>%
# group by the created identifier
index_by(n_month) %>%
# compute the average seasonal component for every month
summarise(
s_unadj = mean(detrended, na.rm = TRUE)
)
df_seasonal
#Check the aspect of the obtained unadjusted seasonal component
df_seasonal %>%
autoplot(s_unadj) +
scale_x_continuous(breaks = seq(1, 12),
minor_breaks = seq(1, 12)) +
geom_point(color = "red")
```
```{r}
```
This is the resulting unadjusted seasonal component $S_{unadj}$:
* The first point is the unadjusted value of the seasonal component for January.
* The second point is the unadjusted value of the seasonal component for February.
* The third point is the unadjusted value of the seasonal component for March.
* ...
#### Step 3.2 Adjust the seasonal component
Let us check if the values of the component add up to 0 as we wanted to impose:
```{r}
# Seasonal period
m = 12
#The sum does not add up to 0
a = sum(df_seasonal$s_unadj)
a
```
We can see that the sum is not 0. Let us compute the correction $x$ using the equation above:
```{r}
# Correction x (see formula above)
x = - a / m
```
Now we apply that correction:
```{r}
df_seasonal <-
df_seasonal %>%
# Correction so that the sum of the seasoan components is 0
mutate(
seasonal_c = s_unadj + x
) %>%
# Drop s_unadj since we do not need it anymore.
select(n_month, seasonal_c)
# Check that the sum is indeed 0
sum(df_seasonal$seasonal_c)
```
As you can see, the seasonal component is now 0. Great. Let us bring the seasonal component back into our original dataset. For this we will use a `left join` operation. Remember that we created previously a column called `n_month` that now will come in handy when doing this join:
```{r}
# Bring seasonal component into the dataset (I chose to use a "join" statement, very convenient)
manual_decomposition <-
left_join(manual_decomposition, df_seasonal, by = "n_month")
manual_decomposition
```
### Step 4: compute the remainder component removing the seasonal component from the detrended series
Now that we have the seasonal component, we obtain the remainder component by subtracting it from the de-trended series.
```{r}
# Compute remainder component:
manual_decomposition <-
manual_decomposition %>%
mutate(
remainder = detrended - seasonal_c
)
select(manual_decomposition, Month, `2x12-MA`, seasonal_c, remainder)
```
### Test result:
We have all the components. Using the function `all.equal()` we can check that our computation matches exactly the output of the function `classical_decomposition`
```{r}
all.equal(classical_dec$trend, manual_decomposition$`2x12-MA`)
all.equal(classical_dec$seasonal, manual_decomposition$seasonal_c)
all.equal(classical_dec$random, manual_decomposition$remainder)
```
### Extra: generate a plot of the components we just created
```{r}
# As an additional ggplot visualization exercise, below the code to visualize
# the decomposition we have computed manually
# Define levels for the factor variable to ensure proper ordering
# in the facet grid figure.
factor_levels = c("Employed", "trend", "seasonal_c", "remainder")
manual_decomposition_long <-
manual_decomposition %>%
# Rename so that the labels are consistent
rename(trend = `2x12-MA`) %>%
# Select the columns of interest
select(Month, Employed, trend, seasonal_c, remainder) %>%
# Generate long format based on a single variable so that we can
# facet_grid following this variable
pivot_longer(cols = Employed:remainder, names_to = "component") %>%
# Convert to factor to ensure proper ordering in the figure
mutate(
component = factor(component, levels=factor_levels)
)
# Now generate the graph
manual_decomposition_long %>%
ggplot(aes(x = Month, y = value)) +
geom_line() +
# scales = "free_y" so that each component has an adapted y-axis range
facet_grid(component ~ ., scales = "free_y")
```
We can see that it looks just like the decomposition returned by the fable library, which is logical, since all the components are the same
# Multiplicative scheme - classical decomposition
In this case, the algorithm is just the same. In fact the estimation of the trend is identical. However, now subtractions are replaced by divisions.
## Step 1 - Estimate trend
$m$: length of the seasonal period.
* If $m$ is an even number, compute the trend-cycle component $T_t$ using 2 x $m$-MA.
* If $m$ is an odd number compute the trend-cycle component $T_t$ using an $m$-MA.
## Step 2 - Detrended series
Calculate the de-trended series: $y_t / \hat{T_t}$
## Step 3 - Compute seasonal component
**Main assumption to compute the seasonal component**: the seasonal component remains constant over time. This is one of the main weaknesses of the classical decomposition.
### **Step 3.1.**
Average the de-trended values for each component of the season.
* Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data
* **Output**: a vector of length $m$, with $m$ being the length of the seasonal compoent.
* Let us call this vector **$S_{unadj}$** for *unadjusted seasonal component*
<br>
### **Step 3.2.**
Adjust each $S_{unadj\text{ i}}$ (output of step 3.1) by a specific amount to obtain the adjusted seasonal component $S_i$, forcing it to add up to $m$:
$$
\text{Condition to attain: } \sum_{i=1}^{m}S_i = \sum_{i=1}^{m}(S_{unadj\text{ i}} + x) = m
$$
This condition is interesting because, thanks to it, the sum of the time series over a seasonal period becomes a weighted average of $T_iR_i$ in which $S_i/m$ is the percent weight of each $T_iR_i$:
$$
\frac{\sum_{i=1}^m(T_i S_i R_i)}{m} = \underbrace{\sum_{i=1}^m\frac{S_i}{m}(T_i R_i) }_{\text{weighted avg.} \\ \text{of } R_iT_i \text{ since } \sum_{i=1}^mS_i/m = 1}
$$
To attain this condition we will follow the steps below:
#### Step 3.2.1
Sum all the components of the vector $Sunadj_i$. Lets call this output $a$. In general, $a$ will be different from $m$, although it can be very close.
$$
\sum_{i=1}^mS_{unadj\text{ i}} = a
$$
#### Step 3.2.2
Compute the necessary correction $x$ to be added to each $S_{unadj\text{ i}}$ so that the resulting $S$ adds to $m$.
If we assume that we are going to correct each $S_{unadj\text{ i}}$ by the same amount $x$, the resulting correction would be:
$$
\begin{align*}
\text{Condition we want to attain}: && \sum_{i=1}^{m}S_i = \sum_{i=1}^{m}(S_{unadj\text{ i}} + x) \underset{\begin{array}{c} \uparrow \\ \text{condition} \\ \text{imposed} \end{array}}{=} m \\
\text{Developing the sums: } && \sum_{i=1}^m{S_i} = \sum_{i=1}^mS_{unadj\text{ i}} + \sum_{i=1}^mx = a + m \cdot x \underset{\begin{array}{c} \uparrow \\ \text{condition} \\ \text{imposed} \end{array}}{=} m \\
\text{Hence, solving for x: } && x = \frac{m - a}{m} = 1-\frac{a}{m}
\end{align*}
$$
If we made this correction, we would already fulfill the condition we wanted to attain.
$$
\sum_{i=1}^{m}S_i = \sum_{i=1}^{m}(S_{unadj\text{ i}} + x) = a + m \cdot x = a + m \cdot (1-\frac{a}{m}) = a + m - a = m
$$
However, **we are going to make a different correction for each $S_{unadj\text{ i}}$ instead of the same correction for each of them**. Specifically, the correction $x_i$ for each $S_{unadj\text{ i}}$ results from scaling the previous constant correction $x$ by a factor proportional to $S_{unadj\text{ i}}$ as shown below:
$$
x_i = m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot x = m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot (1-\frac{a}{m})
$$
We may check that this correction also fulfills the desired condition:
```{r, eval=FALSE, include=FALSE}
# TODO: COMPROBAR SI ESTA CORRECTION ES CORRECTA
df_seasonal <- df_seasonal %>%
mutate(
seasonal_c = s_unadj / mean(s_unadj, na.rm = TRUE) # Normalize seasonal component
) %>%
select(n_month, seasonal_c)
#
```
$$
\begin{align*}
\sum_{i=1}^{m}S_i = \sum_{i=1}^{m}(S_{unadj\text{ i}} + x_i) = \sum_{i=1}^{m}(S_{unadj\text{ i}}) + \sum_{i=1}^{m}(m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot (1-\frac{a}{m})) = \\ a + \sum_{i=1}^{m}(m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot (1-\frac{a}{m}))
\end{align*}
$$
Considering that $a$, $m$ and $(1-\frac{a}{m})$ are constants, we may take them out of the sum. Continuing where we left off:
$$
\begin{align*}
\sum_{i=1}^{m}S_i = a + \sum_{i=1}^{m}(m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot (1-\frac{a}{m})) = a + \frac{m}{a}(1-\frac{a}{m})\cdot \sum_{i=1}^{m}S_{unadj\text{ i}} = a + \frac{m}{a}(1-\frac{a}{m})\cdot a = \\ a + \frac{m}{a} \cdot a - \frac{m}{a} \cdot \frac{a}{m} \cdot a = a + m - a = m
\end{align*}
$$
In conclusion, the correction $x_i$ to be added to each $S_{unadj\text{ i}}$ to obtain the corrected $S_i$ is:
$$
x_i = m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot x = m \cdot \frac{S_{unadj\text{ i}}}{a} \cdot (1-\frac{a}{m})
$$
And the adjusted seasonal component $S$ will be computed out of $S_{unadj}$ as follows
$$
S_i = S_{unadj\text{ i}} + x_i
$$
* **Output**: the adjusted seasonal component $S$. **Check at this point that indeed $\sum_{i=1}^{m}S_i=m$**
## Step 4 - Compute the remainder component
$R_t = \frac{y_t}{T_tS_t} = \frac{D_t}{S_t}$
# Exercise 1
Consider the last five years of the Gas data from `aus_production`
```{r}
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas
```
a. Use `classical_decomposition` with `type=multiplicative` to calculate the trend-cycle and seasonal components
```{r, include = params$print_sol}
# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <-
gas %>%
model(
c_mult = classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
classical_dec
```
b. Compute and plot the seasonally adjusted data.
```{r, include=params$print_sol}
classical_dec %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas production (Petajoules)",
title = "Australian gas production")
```
c. Compute manually the classical decomposition of the series (as we did in class, but keep in mind that this scheme is multiplicative and that now the seasonal period is 4 instead of 12).
```{r, include=params$print_sol}
### 1. Compute the trend with a 2x4 MA
# Compute moving averages
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- gas %>%
mutate(
`4-MA` = slider::slide_dbl(Gas, mean,
.before = 1, .after = 2, .complete = TRUE),
`2x4-MA` = slider::slide_dbl(`4-MA`, mean,
.before = 1, .after = 0, .complete = TRUE)
)
# Plot the computed trend
manual_decomposition %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y = `2x4-MA`), colour = "#D55E00") +
labs(y = "Gas production (Petajoules)",
title = "Australian gas production")
# Check
all.equal(classical_dec$trend, manual_decomposition$`2x4-MA`)
```
```{r, include=params$print_sol}
### 2. Compute the de-trended component:
# Compute the detrended component:
manual_decomposition <- manual_decomposition %>%
mutate(detrended = Gas / `2x4-MA`,
n_quarter = quarter(Quarter)) # Add an identifier for the month (used later for a join statement)
manual_decomposition %>% autoplot(detrended)
```
```{r, include=params$print_sol}
### 3. Compute the unadjustedseasonal component grouping and
# averaging the detrended component
res <- manual_decomposition %>%
index_by(n_quarter) %>%
#Compute seasonal component
summarise(
S_unadj = mean(detrended, na.rm = TRUE)
)
res
```
```{r, include=params$print_sol}
### 4. Correct the seasonal components so that they add to m = 4
m = 4
a = sum(res$S_unadj)
x_i = res$S_unadj * (m/a) * (1 - a/m) # Vector of corrections
res <- res %>%
# Correction so that the sum of the adjusted seasonal adds up to 4
mutate(seasonal_c = S_unadj + x_i)
# Check that the corrected seasonal component adds up to 4
sum(res$seasonal_c)
```
```{r, include=params$print_sol}
#Check the aspect of the obtained seasonal component
res %>%
autoplot(seasonal_c) +
geom_point(color="red")
```
```{r, include=params$print_sol}
# Bring seasonal component into the dataset
# (I chose to use a "join" statement, very convenient)
manual_decomposition <- left_join(manual_decomposition, res, by = "n_quarter")
manual_decomposition
```
```{r, include=params$print_sol}
# Compute remainder component:
manual_decomposition <-
manual_decomposition %>%
mutate(remainder = detrended / seasonal_c)
select(manual_decomposition, Quarter, Gas, `2x4-MA`, seasonal_c, remainder)
```
```{r, include = params$print_sol}
# Comparison of manual computations with computations from the library:
all.equal(classical_dec$trend, manual_decomposition$`2x4-MA`)
all.equal(classical_dec$seasonal, manual_decomposition$seasonal_c)
all.equal(classical_dec$random, manual_decomposition$remainder)
```
d. Change one observations to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
```{r, include=params$print_sol}
gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2007Q4"), Gas + 300, Gas)) %>%
model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")
```
```{r, include=params$print_sol, eval=FALSE}
* The outlier is more or less in the middle of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is in the middle
of the series, this information is in the region where the moving average can
be computed and the information of the outlier is captured by the trend component.
* The detrended component also inherits the information of the outlier,
since it is computed as $y_t/T_t$.
* The seasonal component is computed averaging this detrended component and
therefore is also affected by the outlier. It will not accurately approximate
the seasonal component.
```
e. Does it make any difference if the outlier is near the end rather than in the middle of the series?
```{r, include=params$print_sol, eval=FALSE}
As a result the seasonally adjusted data still has seasonality, because we have
not adequately captured the seasonal component.
```
e. Does it make any difference if the outlier is near the end rather than in the middle of the series?
```{r, include=params$print_sol, eval=FALSE}
gas %>%
mutate(Gas = if_else(Quarter == yearquarter("2010Q2"), Gas + 300, Gas)) %>%
model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust) +
labs(title = "Seasonally adjusted data", y = "Petajoules")
```
```{r, include=params$print_sol, eval=FALSE}
* The outlier is located at the last point of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is at the very
end of the series, this information is in the region where the 2x4 moving
average cannot be computed and the information of the outlier is not captured
by the trend component.
* The detrended component is defined in exactly the same region as the trend
approximation, since it is computed as $y_t/T_t$. Therefore the detrended
component does not capture the information about the outlier.
* The seasonal component is computed averaging this detrended component.
Therefore it does not inherit information from the outlier. The seasonal
component is not affected by the presence of this outlier.
```