Statistics computed on time series data are called time series features.
library(pacman)
p_load(tidyverse, fpp3)
We will examine the time series in the tourism dataset.
data(tourism)
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # … with 24,310 more rows
tourism %>% features(Trips, quantile)
## # A tibble: 304 x 8
## Region State Purpose `0%` `25%` `50%` `75%` `100%`
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adelaide South Australia Busine… 68.7 134. 153. 177. 242.
## 2 Adelaide South Australia Holiday 108. 135. 154. 172. 224.
## 3 Adelaide South Australia Other 25.9 43.9 53.8 62.5 107.
## 4 Adelaide South Australia Visiti… 137. 179. 206. 229. 270.
## 5 Adelaide Hills South Australia Busine… 0 0 1.26 3.92 28.6
## 6 Adelaide Hills South Australia Holiday 0 5.77 8.52 14.1 35.8
## 7 Adelaide Hills South Australia Other 0 0 0.908 2.09 8.95
## 8 Adelaide Hills South Australia Visiti… 0.778 8.91 12.2 16.8 81.1
## 9 Alice Springs Northern Territo… Busine… 1.01 9.13 13.3 18.5 34.1
## 10 Alice Springs Northern Territo… Holiday 2.81 16.9 31.5 44.8 76.5
## # … with 294 more rows
Note that a named list of functions can be passed to features() to compute several statistics at once; the list names become the column names.
tourism %>% features(Trips, list(mean = mean, median = median, sd = sd, min = min, max = max))
## # A tibble: 304 x 8
## Region State Purpose mean median sd min max
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adelaide South Australia Busine… 156. 153. 35.6 68.7 242.
## 2 Adelaide South Australia Holiday 157. 154. 27.1 108. 224.
## 3 Adelaide South Australia Other 56.6 53.8 17.3 25.9 107.
## 4 Adelaide South Australia Visiti… 205. 206. 32.5 137. 270.
## 5 Adelaide Hills South Australia Busine… 2.66 1.26 4.30 0 28.6
## 6 Adelaide Hills South Australia Holiday 10.5 8.52 6.37 0 35.8
## 7 Adelaide Hills South Australia Other 1.40 0.908 1.65 0 8.95
## 8 Adelaide Hills South Australia Visiti… 14.2 12.2 10.7 0.778 81.1
## 9 Alice Springs Northern Territory Busine… 14.6 13.3 7.20 1.01 34.1
## 10 Alice Springs Northern Territory Holiday 31.9 31.5 18.1 2.81 76.5
## # … with 294 more rows
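Conceptually, features() applies each function in the list to the values of each series and returns one row per series. The following is a minimal base R sketch of that idea, not how fabletools implements it; the toy data frame and the names toy, stats, by_key and summary_table are made up for illustration.

```r
# Toy data: two short series identified by a key column (values are made up).
toy <- data.frame(
  key   = rep(c("a", "b"), each = 4),
  value = c(1, 2, 3, 4, 10, 20, 30, 50)
)

# A named list of summary functions, like the one passed to features() above.
stats <- list(mean = mean, median = median, sd = sd, min = min, max = max)

# Split the values by key, then apply every function to every series.
# The result has one row per key and one column per statistic.
by_key <- split(toy$value, toy$key)
summary_table <- t(sapply(by_key, function(x) sapply(stats, function(f) f(x))))
summary_table
```

As in the features() output, the column names come from the names of the list.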
Let's look at the first year of the data (the four quarters of 1998).
tourism %>%
  filter(Quarter <= yearquarter("1998 Q4")) %>%
  features(Trips, quantile)
## # A tibble: 304 x 8
## Region State Purpose `0%` `25%` `50%` `75%` `100%`
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adelaide South Australia Busine… 110. 123. 131. 143. 166.
## 2 Adelaide South Australia Holiday 130. 150. 169. 193. 224.
## 3 Adelaide South Australia Other 33.8 37.3 39.0 44.2 58.4
## 4 Adelaide South Australia Visiti… 170. 178. 207. 235. 242.
## 5 Adelaide Hills South Australia Busine… 0 0 0 1.85 7.42
## 6 Adelaide Hills South Australia Holiday 6.81 11.4 13.3 14.8 18.1
## 7 Adelaide Hills South Australia Other 0 0.581 1.66 3.08 4.64
## 8 Adelaide Hills South Australia Visiti… 2.99 3.44 5.68 7.90 8.35
## 9 Alice Springs Northern Territory Busine… 3.36 3.82 5.76 11.1 21.8
## 10 Alice Springs Northern Territory Holiday 8.15 22.5 30.9 45.1 76.5
## # … with 294 more rows
Autocorrelations, the correlations between a series and lagged copies of itself, are also time series features.
tourism %>% ACF()
## Response variable not specified, automatically selected `var = Trips`
## # A tsibble: 5,776 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Region State Purpose lag acf
## <chr> <chr> <chr> <lag> <dbl>
## 1 Adelaide South Australia Business 1Q 0.0333
## 2 Adelaide South Australia Business 2Q 0.0590
## 3 Adelaide South Australia Business 3Q 0.0536
## 4 Adelaide South Australia Business 4Q 0.201
## 5 Adelaide South Australia Business 5Q 0.0645
## 6 Adelaide South Australia Business 6Q 0.104
## 7 Adelaide South Australia Business 7Q -0.0556
## 8 Adelaide South Australia Business 8Q 0.227
## 9 Adelaide South Australia Business 9Q 0.0128
## 10 Adelaide South Australia Business 10Q -0.114
## # … with 5,766 more rows
tourism %>%
  filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  ACF()
## Response variable not specified, automatically selected `var = Trips`
## # A tsibble: 19 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Region State Purpose lag acf
## <chr> <chr> <chr> <lag> <dbl>
## 1 Adelaide South Australia Holiday 1Q 0.0456
## 2 Adelaide South Australia Holiday 2Q -0.143
## 3 Adelaide South Australia Holiday 3Q 0.0976
## 4 Adelaide South Australia Holiday 4Q 0.351
## 5 Adelaide South Australia Holiday 5Q 0.0642
## 6 Adelaide South Australia Holiday 6Q -0.0791
## 7 Adelaide South Australia Holiday 7Q 0.0228
## 8 Adelaide South Australia Holiday 8Q 0.364
## 9 Adelaide South Australia Holiday 9Q -0.0226
## 10 Adelaide South Australia Holiday 10Q -0.270
## 11 Adelaide South Australia Holiday 11Q -0.00270
## 12 Adelaide South Australia Holiday 12Q 0.298
## 13 Adelaide South Australia Holiday 13Q 0.0262
## 14 Adelaide South Australia Holiday 14Q -0.329
## 15 Adelaide South Australia Holiday 15Q -0.0412
## 16 Adelaide South Australia Holiday 16Q 0.161
## 17 Adelaide South Australia Holiday 17Q -0.0736
## 18 Adelaide South Australia Holiday 18Q -0.227
## 19 Adelaide South Australia Holiday 19Q -0.0970
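To make the acf column concrete, here is a minimal base R sketch of the sample autocorrelation at lag k, checked against stats::acf(). The helper acf_by_hand is a hypothetical name introduced for illustration, and the short input vector is simply the first eight Trips values printed at the top of this section (rounded as displayed), so the results will not match the ACF() output for the full series.

```r
# Sample autocorrelation at lag k:
#   r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
acf_by_hand <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)                       # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# Illustrative input only: Adelaide / Business, 1998 Q1 to 1999 Q4, rounded values.
y <- c(135, 110, 166, 127, 137, 200, 169, 134)

r_hand  <- sapply(1:4, function(k) acf_by_hand(y, k))
r_stats <- as.numeric(acf(y, lag.max = 4, plot = FALSE)$acf)[-1]  # drop lag 0

# Compare the hand computation with the base R implementation.
all.equal(r_hand, r_stats)
```

With quarterly data, a large positive value at lags 4, 8 and 12 (as in the Adelaide holiday series above) points to yearly seasonality.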
tourism %>%
  filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  ACF() %>%
  autoplot()
## Response variable not specified, automatically selected `var = Trips`
tourism %>%
  filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = Trips`
The feat_stl function computes features based on an STL decomposition, including the strength of the trend and of the seasonality.
tourism %>%
  features(Trips, feat_stl)
## # A tibble: 304 x 12
## Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Adela… Sout… Busine… 0.451 0.380 3
## 2 Adela… Sout… Holiday 0.541 0.601 1
## 3 Adela… Sout… Other 0.743 0.189 2
## 4 Adela… Sout… Visiti… 0.433 0.446 1
## 5 Adela… Sout… Busine… 0.453 0.140 3
## 6 Adela… Sout… Holiday 0.512 0.244 2
## 7 Adela… Sout… Other 0.584 0.374 2
## 8 Adela… Sout… Visiti… 0.481 0.228 0
## 9 Alice… Nort… Busine… 0.526 0.224 0
## 10 Alice… Nort… Holiday 0.377 0.827 3
## # … with 294 more rows, and 6 more variables: seasonal_trough_year <dbl>,
## # spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
## # stl_e_acf10 <dbl>
tourism %>%
  filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  features(Trips, feat_stl)
## # A tibble: 1 x 12
## Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Adela… Sout… Holiday 0.541 0.601 1
## # … with 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## # linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
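The trend_strength and seasonal_strength_year columns compare the variance of the STL remainder with the variance of the remainder plus the trend (or seasonal) component. The following base R sketch shows that calculation using stats::stl() on the built-in AirPassengers series. The helper stl_strength is a hypothetical name, and feat_stl() may use different STL settings, so treat this as an illustration of the formula rather than a reproduction of the feasts output.

```r
# Strength of trend and seasonality from an STL decomposition y = T + S + R:
#   F_T = max(0, 1 - Var(R) / Var(T + R))
#   F_S = max(0, 1 - Var(R) / Var(S + R))
stl_strength <- function(y, s.window = "periodic") {
  parts     <- stl(y, s.window = s.window)$time.series
  remainder <- parts[, "remainder"]
  trend     <- parts[, "trend"]
  seasonal  <- parts[, "seasonal"]
  c(
    trend_strength    = max(0, 1 - var(remainder) / var(trend + remainder)),
    seasonal_strength = max(0, 1 - var(remainder) / var(seasonal + remainder))
  )
}

# Built-in monthly series, used only as an illustration.
strengths <- stl_strength(log(AirPassengers))
strengths
```

Both measures lie between 0 and 1; values near 1 mean the remainder is small relative to that component.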
We can then use these features in plots to identify which types of series are heavily trended and which are most seasonal.
tourism %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))
Find the series with the maximum seasonal strength and plot it.
tourism %>%
  features(Trips, feat_stl) %>%
  filter(seasonal_strength_year == max(seasonal_strength_year)) %>%
  left_join(tourism, by = c("State", "Region", "Purpose")) %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))
We can also compute every feature provided by the feasts package in one call using feature_set.
tourism_features <- tourism %>%
  features(Trips, feature_set(pkgs = "feasts"))
## Warning: `n_flat_spots()` is deprecated as of feasts 0.1.5.
## Please use `longest_flat_spot()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
tourism_features
## # A tibble: 304 x 51
## Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Adela… Sout… Busine… 0.451 0.380 3
## 2 Adela… Sout… Holiday 0.541 0.601 1
## 3 Adela… Sout… Other 0.743 0.189 2
## 4 Adela… Sout… Visiti… 0.433 0.446 1
## 5 Adela… Sout… Busine… 0.453 0.140 3
## 6 Adela… Sout… Holiday 0.512 0.244 2
## 7 Adela… Sout… Other 0.584 0.374 2
## 8 Adela… Sout… Visiti… 0.481 0.228 0
## 9 Alice… Nort… Busine… 0.526 0.224 0
## 10 Alice… Nort… Holiday 0.377 0.827 3
## # … with 294 more rows, and 45 more variables: seasonal_trough_year <dbl>,
## # spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
## # stl_e_acf10 <dbl>, acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>,
## # diff1_acf10 <dbl>, diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>,
## # pacf5 <dbl>, diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
## # zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
## # zero_end_prop <dbl>, lambda_guerrero <dbl>, kpss_stat <dbl>,
## # kpss_pvalue <dbl>, pp_stat <dbl>, pp_pvalue <dbl>, ndiffs <int>,
## # nsdiffs <int>, bp_stat <dbl>, bp_pvalue <dbl>, lb_stat <dbl>,
## # lb_pvalue <dbl>, var_tiled_var <dbl>, var_tiled_mean <dbl>,
## # shift_level_max <dbl>, shift_level_index <dbl>, shift_var_max <dbl>,
## # shift_var_index <dbl>, shift_kl_max <dbl>, shift_kl_index <dbl>,
## # spectral_entropy <dbl>, n_crossing_points <int>, longest_flat_spot <int>,
## # coef_hurst <dbl>, stat_arch_lm <dbl>
For the homework we will look at the PBS data.
PBS %>% features(Cost, list(mean = mean, sd = sd))
## # A tibble: 336 x 6
## Concession Type ATC1 ATC2 mean sd
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Concessional Co-payments A A01 67673. 14763.
## 2 Concessional Co-payments A A02 16455044. 7498596.
## 3 Concessional Co-payments A A03 476221. 370696.
## 4 Concessional Co-payments A A04 463392. 154020.
## 5 Concessional Co-payments A A05 147604. 74190.
## 6 Concessional Co-payments A A06 417889. 163040.
## 7 Concessional Co-payments A A07 917795. 338081.
## 8 Concessional Co-payments A A09 343881. 89815.
## 9 Concessional Co-payments A A10 5680010. 3180294.
## 10 Concessional Co-payments A A11 651863. 387439.
## # … with 326 more rows