Seasonal Plots

Author

Eric A. Suess

Published

January 25, 2023

Today we are going to look at several of the time series datasets (tsibbles) used in the fpp3 book that show seasonal patterns.

A seasonal pattern is one that repeats over and over again at regular, fixed intervals, such as every quarter or every year.
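To make "repeats at regular intervals" concrete, here is a minimal base-R sketch that does not need fpp3. It builds a quarterly `ts` object with `frequency = 4` in which the same four values repeat every year. `cycle()` then reports each observation's position within the seasonal period. The values are made up for illustration.

```r
# Made-up quarterly values that repeat every year (seasonal period = 4 quarters)
pattern <- c(10, 4, 6, 12)
y <- ts(rep(pattern, times = 5), start = c(1998, 1), frequency = 4)

# Position of each observation within the seasonal period (1 = Q1, ..., 4 = Q4)
cycle(y)

# Every Q1 has the same value, every Q2 has the same value, and so on
tapply(y, cycle(y), unique)
```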

library(fpp3)
── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
✔ tibble      3.1.7     ✔ tsibble     1.1.1
✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
✔ tidyr       1.2.0     ✔ feasts      0.2.2
✔ lubridate   1.8.0     ✔ fable       0.3.1
✔ ggplot2     3.3.6     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()

In the book, the authors used the following code to create the a10 tsibble.

Most people consider the code here to be "sort of bad" code. Why?

Answer: In R, assignment to a new object is usually done at the start of a pipeline, not at the end. It is easy to miss an assignment that comes at the end.

PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6) -> a10

Beware of this style in the book: it can make the code you are looking for hard to find. Better practice is to put the assignment first:

a10 <- PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6)    # 1e6 = 1000000
a10
# A tsibble: 204 x 3 [1M]
      Month  TotalC  Cost
      <mth>   <dbl> <dbl>
 1 1991 Jul 3526591  3.53
 2 1991 Aug 3180891  3.18
 3 1991 Sep 3252221  3.25
 4 1991 Oct 3611003  3.61
 5 1991 Nov 3565869  3.57
 6 1991 Dec 4306371  4.31
 7 1992 Jan 5088335  5.09
 8 1992 Feb 2814520  2.81
 9 1992 Mar 2985811  2.99
10 1992 Apr 3204780  3.20
# … with 194 more rows
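Both assignment styles create exactly the same object; the difference is only in how easy the name is to spot. A tiny base-R check, using the first three TotalC values from the output above (the object names are hypothetical):

```r
# First three TotalC values from the a10 output above
costs <- c(3526591, 3180891, 3252221)

# Leading assignment: the object's name is the first thing you read
cost_millions <- costs / 1e6

# Trailing (right) assignment: the name only appears at the very end
costs / 1e6 -> cost_millions_right

identical(cost_millions, cost_millions_right)  # TRUE
```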
a10 %>% autoplot()
Plot variable not specified, automatically selected `.vars = TotalC`

write.csv(a10, "a10.csv")
a10 %>% gg_season()  # no variable given, so the first measured variable (TotalC) is used
Plot variable not specified, automatically selected `y = TotalC`

a10 %>%
  gg_season(Cost, labels = "both") +
  labs(y = "$ million",
       title = "Seasonal plot: antidiabetic drug sales")

a10 %>% gg_subseries()  # no variable given, so the first measured variable (TotalC) is used
Plot variable not specified, automatically selected `y = TotalC`

a10 %>%
  gg_subseries(Cost) +
  labs(y = "$ million",
       title = "Seasonal subseries plot: antidiabetic drug sales")
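In the fpp3 book, the horizontal lines in a subseries plot mark the mean for each month. Below is a minimal base-R sketch of that calculation. It uses R's built-in monthly `AirPassengers` series as a stand-in for a10, an assumption made only so the example runs without fpp3. The observations are grouped by month with `cycle()`, and each group is averaged.

```r
# Built-in monthly series (1949 to 1960), used here as a stand-in for a10
y <- AirPassengers

# Position of each observation within the year: 1 = Jan, ..., 12 = Dec
month <- cycle(y)

# One mean per month: the level of each horizontal line in a subseries plot
month_means <- tapply(y, month, mean)
names(month_means) <- month.abb
round(month_means, 1)
```

Because every month has the same number of years, the average of the 12 monthly means equals the overall mean of the series.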

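Notice that there is an upward trend in the data, so the year is an important predictor of TotalC.

Also notice the effect of the trend on the ACF: when data have a trend, the autocorrelations at small lags tend to be large and positive, and they decrease slowly as the lag increases.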

a10 %>% gg_tsdisplay(TotalC)
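To isolate the effect of a trend on the ACF, here is a minimal base-R sketch on a made-up monthly series (not a10). The series is a linear upward trend plus a 12-month sine cycle. `stats::acf()` is applied first to the series itself and then to its first differences, which remove the linear trend. At lag 6, half a season, the trend keeps the autocorrelation positive. Once the trend is removed, the seasonal cycle makes it negative.

```r
# Made-up monthly series: linear upward trend plus a 12-month seasonal cycle
idx <- 1:120
y <- ts(0.5 * idx + 10 * sin(2 * pi * idx / 12), frequency = 12)

# Sample ACF up to lag 24; element k + 1 holds the lag-k autocorrelation
r_trend <- acf(y, lag.max = 24, plot = FALSE)$acf

# First differences remove the linear trend, leaving the seasonal cycle
r_diff <- acf(diff(y), lag.max = 24, plot = FALSE)$acf

# Lag-6 autocorrelation with and without the trend
c(trended = r_trend[7], differenced = r_diff[7])
```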

Tourism

Filter the data to look only at trips whose purpose was Holiday, then total the trips within each State.

holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

holidays
# A tsibble: 640 x 3 [1Q]
# Key:       State [8]
   State Quarter Trips
   <chr>   <qtr> <dbl>
 1 ACT   1998 Q1  196.
 2 ACT   1998 Q2  127.
 3 ACT   1998 Q3  111.
 4 ACT   1998 Q4  170.
 5 ACT   1999 Q1  108.
 6 ACT   1999 Q2  125.
 7 ACT   1999 Q3  178.
 8 ACT   1999 Q4  218.
 9 ACT   2000 Q1  158.
10 ACT   2000 Q2  155.
# … with 630 more rows
holidays %>% autoplot(Trips) +
  labs(y = "thousands of trips",
       title = "Australian domestic holiday nights")

Compare two of the time series.

Victoria is very well-behaved quarterly data, with a clear and regular seasonal pattern.

holidays %>% filter(State == "Victoria") %>% 
  gg_tsdisplay()
Plot variable not specified, automatically selected `y = Trips`

holidays %>% filter(State == "Victoria") %>% write.csv("victoria.csv")

holidays %>% filter(State == "Victoria") %>% ACF()
Response variable not specified, automatically selected `var = Trips`
# A tsibble: 19 x 3 [1Q]
# Key:       State [1]
   State      lag      acf
   <chr>    <lag>    <dbl>
 1 Victoria    1Q  0.00755
 2 Victoria    2Q -0.452  
 3 Victoria    3Q  0.0374 
 4 Victoria    4Q  0.828  
 5 Victoria    5Q -0.0305 
 6 Victoria    6Q -0.463  
 7 Victoria    7Q  0.0289 
 8 Victoria    8Q  0.730  
 9 Victoria    9Q -0.0735 
10 Victoria   10Q -0.442  
11 Victoria   11Q -0.00197
12 Victoria   12Q  0.660  
13 Victoria   13Q -0.0687 
14 Victoria   14Q -0.422  
15 Victoria   15Q -0.0160 
16 Victoria   16Q  0.594  
17 Victoria   17Q -0.0975 
18 Victoria   18Q -0.426  
19 Victoria   19Q -0.0346 
holidays %>% filter(State == "Victoria") %>% ACF() %>% autoplot()
Response variable not specified, automatically selected `var = Trips`
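The ACF values above are sample autocorrelations. For lag k they are r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2. For Victoria, r_k is large and positive at lags 4, 8, 12 and 16 (same quarter, different year) and negative at lags 2, 6, 10 and so on (opposite quarters). Below is a minimal base-R sketch of that formula, checked against `stats::acf()`. It uses a made-up quarterly series, not the Victoria data, and `acf_at_lag()` is a hypothetical helper.

```r
# Hypothetical helper: sample autocorrelation at lag k
# r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
acf_at_lag <- function(y, k) {
  d <- y - mean(y)
  n <- length(y)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# Made-up quarterly series with a perfectly repeating pattern
y <- rep(c(10, 4, 6, 12), times = 20)

manual  <- sapply(1:8, function(k) acf_at_lag(y, k))
builtin <- as.vector(acf(y, lag.max = 8, plot = FALSE)$acf)[-1]  # drop lag 0

round(manual, 3)
all.equal(manual, builtin)
```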

ACT is not so well behaved.

holidays %>% filter(State == "ACT") %>% 
  gg_tsdisplay()
Plot variable not specified, automatically selected `y = Trips`

Scatterplots

Now sum over all of the different Purposes within each State.

visitors <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

visitors 
# A tsibble: 640 x 3 [1Q]
# Key:       State [8]
   State Quarter Trips
   <chr>   <qtr> <dbl>
 1 ACT   1998 Q1  551.
 2 ACT   1998 Q2  416.
 3 ACT   1998 Q3  436.
 4 ACT   1998 Q4  450.
 5 ACT   1999 Q1  379.
 6 ACT   1999 Q2  558.
 7 ACT   1999 Q3  449.
 8 ACT   1999 Q4  595.
 9 ACT   2000 Q1  600.
10 ACT   2000 Q2  557.
# … with 630 more rows
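On a tsibble, `summarise()` computes its summaries at each time point (Quarter), so grouping by State and summing Trips adds up all the Purposes for every State and Quarter. The sketch below shows the same aggregation in base R with `aggregate()`. The small data frame is made up: its column names mimic tourism, but the values are invented.

```r
# Made-up rows mimicking tourism's layout: one row per State, Quarter and Purpose
trips <- data.frame(
  State   = c("ACT", "ACT", "ACT", "ACT"),
  Quarter = c("1998 Q1", "1998 Q1", "1998 Q2", "1998 Q2"),
  Purpose = c("Holiday", "Business", "Holiday", "Business"),
  Trips   = c(200, 350, 130, 290)
)

# Sum Trips over Purpose within every State and Quarter combination
visitors_toy <- aggregate(Trips ~ State + Quarter, data = trips, FUN = sum)
visitors_toy
```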

Consider how the time series for the different States are correlated with each other.

visitors %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State), scales = "free_y") +
  labs(y = "Number of visitor nights each quarter (millions)")

The ggpairs() function expects each time series to be in its own column, so pivot_wider() is used first. Note that this changes a tidy (long) tsibble into a wide, non-tidy tsibble.

visitors %>%
  pivot_wider(values_from=Trips, names_from=State) %>%
  GGally::ggpairs(columns = 2:9)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
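The long-to-wide step that `pivot_wider()` performs can be sketched in base R with `reshape(direction = "wide")`. In the example below, each State's values become their own column, with one row per Quarter. The data frame is made up for illustration and is not taken from tourism.

```r
# Made-up long (tidy) data: one row per Quarter and State
long <- data.frame(
  Quarter = c("1998 Q1", "1998 Q2", "1998 Q1", "1998 Q2"),
  State   = c("ACT", "ACT", "Victoria", "Victoria"),
  Trips   = c(500, 400, 1500, 1300)
)

# Wide form: one row per Quarter, one Trips column per State
wide <- reshape(long, idvar = "Quarter", timevar = "State", direction = "wide")
names(wide)
```

Unlike `pivot_wider()`, base `reshape()` prefixes the new columns with the value name (`Trips.ACT`, `Trips.Victoria`).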

Now consider the autocorrelation within a single time series.

Can you see the autocorrelation every 4 quarters in the Victoria time series?

visitors %>%
  pivot_wider(values_from=Trips, names_from=State) %>% 
  select("Victoria") %>% 
  gg_lag(geom = "point")
Plot variable not specified, automatically selected `y = Victoria`
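A lag plot for lag k is a scatterplot of y_t against y_{t-k}. Below is a minimal base-R sketch of the pairs behind such a plot. It uses `stats::embed()` on a made-up quarterly series, not the Victoria data. When the pattern repeats every four quarters, the lag-4 pairs fall exactly on a line (correlation 1), and the lag-2 pairs fall on a line with negative slope.

```r
# Made-up quarterly series whose pattern repeats every 4 quarters
y <- rep(c(10, 4, 6, 12), times = 20)

# embed(y, 5): column 1 is y_t, column j + 1 is y_{t-j} for j = 1, ..., 4
m <- embed(y, 5)

# Correlation of the (y_t, y_{t-j}) pairs that a lag-j plot would show
lag_cor <- sapply(1:4, function(j) cor(m[, 1], m[, j + 1]))
round(lag_cor, 3)
```

Base R also provides `stats::lag.plot()` for drawing these scatterplots directly.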

visitors %>%
  pivot_wider(values_from=Trips, names_from=State) %>% 
  select("Victoria") %>% 
  ACF() %>% 
  autoplot()
Response variable not specified, automatically selected `var = Victoria`

Can you see that the seasonal pattern in the ACT time series is much less clear?

visitors %>%
  pivot_wider(values_from=Trips, names_from=State) %>% 
  select("ACT") %>% 
  gg_lag(geom = "point")
Plot variable not specified, automatically selected `y = ACT`

visitors %>%
  pivot_wider(values_from=Trips, names_from=State) %>% 
  select("ACT") %>% 
  ACF() %>% 
  autoplot()
Response variable not specified, automatically selected `var = ACT`