library(pacman)
p_load(tidyverse, fpp3)
Stat. 674 Final
This is a take-home Final Exam. You may ask questions of the Professor or search Google. This is exam is to be completed independently.
Do the following:
- From Section 7.10 Exercise 4
- From Section 9.11 Exercise 7a, 7b
- From Section 12.6 Exercise 3, try NNAR on us_employment from Chapter 9 Exercise 10
Section 7.10 Exercise 4
The data set
souvenirs
concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
- Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
Answer:
Write your answer here.
|> autoplot(Sales) souvenirs
Features of the data:
- Seasonal data – similar scaled pattern repeats every year
- A spike every March (except for year 1987) is the influence of the surfing festival
- The size of the pattern increases proportionally to the level of sales
- Explain why it is necessary to take logarithms of these data before fitting a model.
Answer:
Write your answer here.
# Taking logarithms of the data
|> autoplot(log(Sales)) souvenirs
After taking logs, the trend looks more linear and the seasonal variation is roughly constant.
- Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
Answer:
Write your answer here.
<- souvenirs |>
fit mutate(festival = month(Month) == 3 & year(Month) != 1987) |>
model(reg = TSLM(log(Sales) ~ trend() + season() + festival))
|>
souvenirs autoplot(Sales, col = "gray") +
geom_line(data = augment(fit), aes(y = .fitted), col = "blue")
- Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
Answer:
Write your answer here.
|> gg_tsresiduals() fit
The residuals are serially correlated. This is both visible from the time plot but also from the ACF. The residuals reveal nonlinearity in the trend.
augment(fit) |>
ggplot(aes(x = .fitted, y = .innov)) +
geom_point() +
scale_x_log10()
The plot of residuals against fitted values looks fine - no notable patterns emerge. We take logarithms of fitted values because we took logs in the model.
- Do boxplots of the residuals for each month. Does this reveal any problems with the model?
Answer:
Write your answer here.
augment(fit) |>
mutate(month = month(Month, label = TRUE)) |>
ggplot(aes(x = month, y = .innov)) +
geom_boxplot()
The boxplots show differences in variation across the months revealing some potential heteroscedasticity.
- What do the values of the coefficients tell you about each variable?
Answer:
Write your answer here.
tidy(fit) |> mutate(pceffect = (exp(estimate) - 1) * 100)
# A tibble: 14 × 7
.model term estimate std.error statistic p.value pceffect
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 reg (Intercept) 7.62 0.0742 103. 4.67e-78 203688.
2 reg trend() 0.0220 0.000827 26.6 2.32e-38 2.23
3 reg season()year2 0.251 0.0957 2.63 1.06e- 2 28.6
4 reg season()year3 0.266 0.193 1.38 1.73e- 1 30.5
5 reg season()year4 0.384 0.0957 4.01 1.48e- 4 46.8
6 reg season()year5 0.409 0.0957 4.28 5.88e- 5 50.6
7 reg season()year6 0.449 0.0958 4.69 1.33e- 5 56.6
8 reg season()year7 0.610 0.0958 6.37 1.71e- 8 84.1
9 reg season()year8 0.588 0.0959 6.13 4.53e- 8 80.0
10 reg season()year9 0.669 0.0959 6.98 1.36e- 9 95.3
11 reg season()year10 0.747 0.0960 7.79 4.48e-11 111.
12 reg season()year11 1.21 0.0960 12.6 1.29e-19 234.
13 reg season()year12 1.96 0.0961 20.4 3.39e-31 612.
14 reg festivalTRUE 0.502 0.196 2.55 1.29e- 2 65.1
(Intercept)
is not interpretable.trend
coefficient shows that with every month sales increases on average by 2.2%.season2
coefficient shows that February sales are greater than January on average by 28.6%, after allowing for the trend.- …
season12
coefficient shows that December sales are greater than January on average by 611.5%, after allowing for the trend.festivalTRUE
coefficient shows that for months that include the surfing festival, sales increases on average by 65.1% compared to months without the festival, after allowing for the trend and seasonality.
- What does the Ljung-Box test tell you about your model?
Answer:
Write your answer here.
augment(fit) %>%
features(.innov, ljung_box, lag = 24)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 reg 112. 2.15e-13
The serial correlation in the residuals is significant.
- Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
<- new_data(souvenirs, n = 36) |>
future_souvenirs mutate(festival = month(Month) == 3)
|>
fit forecast(new_data = future_souvenirs) |>
autoplot(souvenirs)
- How could you improve these predictions by modifying the model?
Answer:
Write your answer here.
Section 9.11 Exercise 7a, 7b
Consider
aus_airpassengers
, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
- Use
ARIMA()
to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
Answer:
Write your answer here.
%>% autoplot(Passengers) aus_airpassengers
<- aus_airpassengers %>%
fit model(arima = ARIMA(Passengers))
report(fit)
Series: Passengers
Model: ARIMA(0,2,1)
Coefficients:
ma1
-0.8963
s.e. 0.0594
sigma^2 estimated as 4.308: log likelihood=-97.02
AIC=198.04 AICc=198.32 BIC=201.65
%>% gg_tsresiduals() fit
|> forecast(h = 10) |> autoplot(aus_airpassengers) fit
- Write the model in terms of the backshift operator.
Answer:
Write your answer here.
[1] -0.8962564
year
4.307764
\[(1-B)^2y_t = (1+\theta B)\varepsilon_t\] where \(\varepsilon_t \sim\text{N}(0,\sigma^2)\), \(\theta = -0.90\) and \(\sigma^2 = 4.31\).
Section 12.6 Exercise 3
Experiment with using NNETAR() on your retail data and other data we have considered in previous chapters.
Use the data set souvenirs
concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. See the first problem, Section 7.10 Exercise 4
Answer:
Write your answer here.