---
title: "Seasonal Plots"
author: "Eric A. Suess"
date: "2/1/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Today we are going to take a look at a number of the time series datasets, tsibbles, used in the [fpp3](https://otexts.com/fpp3/) book that show seasonal patterns.

A seasonal pattern is one that is exhibited over and over again at regular intervals.

```{r}
library(fpp3)
```

In the book, the author used the following code to create the *a10* tsibble. Most people consider the code here to be "sort of bad" code. Why?

Answer: The assignment to a new object in R is usually done at the start of a pipeline of code, not at the end. It is easy to miss the assignment at the end.

```{r}
PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6) -> a10
```

Beware of this in the book. It may be hard to find the code you are looking for because of this.

Better practice.

```{r}
a10 <- PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) %>%
  mutate(Cost = TotalC / 1e6) # 1e6 = 1000000

a10
```

```{r}
a10 %>% autoplot()
```

```{r}
write.csv(a10, "a10.csv")
```

```{r}
a10 %>% gg_season() # plots the first measured variable, TotalC
```

```{r}
a10 %>% gg_season(Cost, labels = "both") +
  labs(y = "$ million", title = "Seasonal plot: antidiabetic drug sales")
```

```{r}
a10 %>% gg_subseries() # plots the first measured variable, TotalC
```

```{r}
a10 %>% gg_subseries(Cost) +
  labs(y = "$ million", title = "Seasonal subseries plot: antidiabetic drug sales")
```

Notice that there is an upward trend in the data. So the year is an important predictor of TotalC. Also notice the effect of the trend on the ACF.

```{r}
a10 %>% gg_tsdisplay(TotalC)
```

# Tourism

Filter the data to only look at trips where the purpose was a Holiday.

```{r}
holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

holidays
```

```{r}
holidays %>% autoplot(Trips) +
  labs(y = "thousands of trips", title = "Australian domestic holiday nights")
```

Compare two of the time series. Victoria is very well-behaved quarterly data.

```{r}
holidays %>%
  filter(State == "Victoria") %>%
  gg_tsdisplay()
```

```{r}
holidays %>%
  filter(State == "Victoria") %>%
  write.csv("victoria.csv")

holidays %>%
  filter(State == "Victoria") %>%
  ACF()

holidays %>%
  filter(State == "Victoria") %>%
  ACF() %>%
  autoplot()
```

ACT is not so well behaved.

```{r}
holidays %>%
  filter(State == "ACT") %>%
  gg_tsdisplay()
```

# Scatterplots

Now, sum over all of the different Purposes within each State.

```{r}
visitors <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

visitors
```

Consider the cross-correlation between the time series from different States.

```{r}
visitors %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State), scales = "free_y") +
  labs(y = "Overnight trips each quarter (thousands)")
```

The ggpairs() function expects each time series to be in its own column, so the data are reshaped with pivot_wider() first. Note that this changes a tidy (long) tsibble into a non-tidy (wide) tsibble.

```{r, warning=FALSE}
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  GGally::ggpairs(columns = 2:9)
```

Now consider the autocorrelation within a single time series. Can you see the autocorrelation every 4 Quarters in the Victoria time series?

```{r}
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  select("Victoria") %>%
  gg_lag(geom = "point")
```

```{r}
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  select("Victoria") %>%
  ACF() %>%
  autoplot()
```

Can you see that the seasonal pattern is not as clear in the ACT time series?

```{r}
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  select("ACT") %>%
  gg_lag(geom = "point")
```

```{r}
visitors %>%
  pivot_wider(values_from = Trips, names_from = State) %>%
  select("ACT") %>%
  ACF() %>%
  autoplot()
```
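
To see why quarterly seasonality shows up as a spike at lag 4 in the ACF, here is a small base R sketch using simulated data. Nothing here comes from the tourism data: the values in `quarter_effect` are made up, and `seasonal_series` and `noise_series` are hypothetical names used only for illustration. One series repeats a quarterly pattern plus a little noise, the other is pure noise.

```{r}
# Simulated quarterly data (hypothetical values, not from the tourism tsibble)
set.seed(1)
n_years <- 20
quarter_effect <- c(10, -5, -8, 3) # made-up effect for Q1 to Q4

# Repeat the quarterly pattern each year and add a small amount of noise
seasonal_series <- rep(quarter_effect, times = n_years) +
  rnorm(4 * n_years, sd = 1)

# A series with no seasonal structure at all
noise_series <- rnorm(4 * n_years, sd = 1)

# Sample autocorrelations up to lag 8, without drawing the plots
acf_seasonal <- acf(seasonal_series, lag.max = 8, plot = FALSE)
acf_noise <- acf(noise_series, lag.max = 8, plot = FALSE)

# The acf component starts at lag 0, so lag 4 is the 5th element
acf_seasonal$acf[5]
acf_noise$acf[5]
```

The lag 4 autocorrelation should be large for the seasonal series and close to zero for the noise series. This is roughly the contrast to look for when comparing the Victoria and ACT plots above.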