Stat. 674 Final

Author

Prof. Eric A. Suess

Published

March 2, 2025

This is a take-home Final Exam. You may ask questions of the Professor or search Google. This is exam is to be completed independently.

Do the following:

library(pacman)
p_load(tidyverse, fpp3)

Section 7.10 Exercise 4

The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

  1. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

Answer:

Write your answer here.

souvenirs |> autoplot(Sales)

Features of the data:

  • Seasonal data – similar scaled pattern repeats every year
  • A spike every March (except for year 1987) is the influence of the surfing festival
  • The size of the pattern increases proportionally to the level of sales
  1. Explain why it is necessary to take logarithms of these data before fitting a model.

Answer:

Write your answer here.

# Taking logarithms of the data
souvenirs |> autoplot(log(Sales))

After taking logs, the trend looks more linear and the seasonal variation is roughly constant.

  1. Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

Answer:

Write your answer here.

fit <- souvenirs |> 
  mutate(festival = month(Month) == 3 & year(Month) != 1987) |> 
  model(reg = TSLM(log(Sales) ~ trend() + season() + festival))
souvenirs |> 
  autoplot(Sales, col = "gray") +
  geom_line(data = augment(fit), aes(y = .fitted), col = "blue")

  1. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

Answer:

Write your answer here.

fit |> gg_tsresiduals()

The residuals are serially correlated. This is both visible from the time plot but also from the ACF. The residuals reveal nonlinearity in the trend.

augment(fit) |> 
  ggplot(aes(x = .fitted, y = .innov)) +
  geom_point() +
  scale_x_log10()

The plot of residuals against fitted values looks fine - no notable patterns emerge. We take logarithms of fitted values because we took logs in the model.

  1. Do boxplots of the residuals for each month. Does this reveal any problems with the model?

Answer:

Write your answer here.

augment(fit) |> 
  mutate(month = month(Month, label = TRUE)) |> 
  ggplot(aes(x = month, y = .innov)) +
  geom_boxplot()

The boxplots show differences in variation across the months revealing some potential heteroscedasticity.

  1. What do the values of the coefficients tell you about each variable?

Answer:

Write your answer here.

tidy(fit) |>  mutate(pceffect = (exp(estimate) - 1) * 100)
# A tibble: 14 × 7
   .model term           estimate std.error statistic  p.value  pceffect
   <chr>  <chr>             <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
 1 reg    (Intercept)      7.62    0.0742      103.   4.67e-78 203688.  
 2 reg    trend()          0.0220  0.000827     26.6  2.32e-38      2.23
 3 reg    season()year2    0.251   0.0957        2.63 1.06e- 2     28.6 
 4 reg    season()year3    0.266   0.193         1.38 1.73e- 1     30.5 
 5 reg    season()year4    0.384   0.0957        4.01 1.48e- 4     46.8 
 6 reg    season()year5    0.409   0.0957        4.28 5.88e- 5     50.6 
 7 reg    season()year6    0.449   0.0958        4.69 1.33e- 5     56.6 
 8 reg    season()year7    0.610   0.0958        6.37 1.71e- 8     84.1 
 9 reg    season()year8    0.588   0.0959        6.13 4.53e- 8     80.0 
10 reg    season()year9    0.669   0.0959        6.98 1.36e- 9     95.3 
11 reg    season()year10   0.747   0.0960        7.79 4.48e-11    111.  
12 reg    season()year11   1.21    0.0960       12.6  1.29e-19    234.  
13 reg    season()year12   1.96    0.0961       20.4  3.39e-31    612.  
14 reg    festivalTRUE     0.502   0.196         2.55 1.29e- 2     65.1 
  • (Intercept) is not interpretable.
  • trend coefficient shows that with every month sales increases on average by 2.2%.
  • season2 coefficient shows that February sales are greater than January on average by 28.6%, after allowing for the trend.
  • season12 coefficient shows that December sales are greater than January on average by 611.5%, after allowing for the trend.
  • festivalTRUE coefficient shows that for months that include the surfing festival, sales increases on average by 65.1% compared to months without the festival, after allowing for the trend and seasonality.
  1. What does the Ljung-Box test tell you about your model?

Answer:

Write your answer here.

augment(fit) %>%
  features(.innov, ljung_box, lag = 24)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 reg       112.  2.15e-13

The serial correlation in the residuals is significant.

  1. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
future_souvenirs <- new_data(souvenirs, n = 36) |> 
  mutate(festival = month(Month) == 3)
fit |> 
  forecast(new_data = future_souvenirs) |> 
  autoplot(souvenirs)

  1. How could you improve these predictions by modifying the model?

Answer:

Write your answer here.

Section 9.11 Exercise 7a, 7b

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  1. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

Answer:

Write your answer here.

aus_airpassengers %>% autoplot(Passengers)

fit <- aus_airpassengers %>%
  model(arima = ARIMA(Passengers))
report(fit)
Series: Passengers 
Model: ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.8963
s.e.   0.0594

sigma^2 estimated as 4.308:  log likelihood=-97.02
AIC=198.04   AICc=198.32   BIC=201.65
fit %>% gg_tsresiduals()

fit |>  forecast(h = 10) |>  autoplot(aus_airpassengers)

  1. Write the model in terms of the backshift operator.

Answer:

Write your answer here.

[1] -0.8962564
    year 
4.307764 

\[(1-B)^2y_t = (1+\theta B)\varepsilon_t\] where \(\varepsilon_t \sim\text{N}(0,\sigma^2)\), \(\theta = -0.90\) and \(\sigma^2 = 4.31\).

Section 12.6 Exercise 3

Experiment with using NNETAR() on your retail data and other data we have considered in previous chapters.

Use the data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. See the first problem, Section 7.10 Exercise 4

Answer:

Write your answer here.