Time Series Features

Numerical summaries computed from a time series, such as its mean, its quantiles, or its autocorrelations, are called time series features.

library(pacman)
p_load(tidyverse, fpp3)

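We will examine the time series in the tourism dataset, which holds quarterly trip counts with one series for each of the 304 combinations of Region, State and Purpose.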

data(tourism)
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # … with 24,310 more rows
tourism %>% features(Trips, quantile)
## # A tibble: 304 x 8
##    Region         State             Purpose    `0%`  `25%`   `50%`  `75%` `100%`
##    <chr>          <chr>             <chr>     <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
##  1 Adelaide       South Australia   Busine…  68.7   134.   153.    177.   242.  
##  2 Adelaide       South Australia   Holiday 108.    135.   154.    172.   224.  
##  3 Adelaide       South Australia   Other    25.9    43.9   53.8    62.5  107.  
##  4 Adelaide       South Australia   Visiti… 137.    179.   206.    229.   270.  
##  5 Adelaide Hills South Australia   Busine…   0       0      1.26    3.92  28.6 
##  6 Adelaide Hills South Australia   Holiday   0       5.77   8.52   14.1   35.8 
##  7 Adelaide Hills South Australia   Other     0       0      0.908   2.09   8.95
##  8 Adelaide Hills South Australia   Visiti…   0.778   8.91  12.2    16.8   81.1 
##  9 Alice Springs  Northern Territo… Busine…   1.01    9.13  13.3    18.5   34.1 
## 10 Alice Springs  Northern Territo… Holiday   2.81   16.9   31.5    44.8   76.5 
## # … with 294 more rows
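
To make explicit what features() is doing here, the following base R sketch computes the same kind of per-series summary on a small invented data frame. The data frame toy and its columns key and value are made up for illustration and are not part of the tourism data. The sketch only mimics the idea of applying quantile() once per series; it is not how fabletools implements features().

```r
# Toy data: two hypothetical series, "a" and "b" (values invented for illustration)
toy <- data.frame(
  key   = rep(c("a", "b"), each = 4),
  value = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Apply quantile() separately to each series, then stack the results
# so that there is one row per series and one column per quantile.
per_series <- do.call(rbind, tapply(toy$value, toy$key, quantile))
per_series
```

The result has one row per series and five quantile columns, which is the same shape as the tibble above once the key columns are set aside.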

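Note that a named list of functions can be passed to features() to compute several statistics at once; the names in the list become the column names of the result.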

tourism %>% features(Trips, list(mean = mean, median = median, sd = sd, min = min, max = max))
## # A tibble: 304 x 8
##    Region         State              Purpose   mean  median    sd     min    max
##    <chr>          <chr>              <chr>    <dbl>   <dbl> <dbl>   <dbl>  <dbl>
##  1 Adelaide       South Australia    Busine… 156.   153.    35.6   68.7   242.  
##  2 Adelaide       South Australia    Holiday 157.   154.    27.1  108.    224.  
##  3 Adelaide       South Australia    Other    56.6   53.8   17.3   25.9   107.  
##  4 Adelaide       South Australia    Visiti… 205.   206.    32.5  137.    270.  
##  5 Adelaide Hills South Australia    Busine…   2.66   1.26   4.30   0      28.6 
##  6 Adelaide Hills South Australia    Holiday  10.5    8.52   6.37   0      35.8 
##  7 Adelaide Hills South Australia    Other     1.40   0.908  1.65   0       8.95
##  8 Adelaide Hills South Australia    Visiti…  14.2   12.2   10.7    0.778  81.1 
##  9 Alice Springs  Northern Territory Busine…  14.6   13.3    7.20   1.01   34.1 
## 10 Alice Springs  Northern Territory Holiday  31.9   31.5   18.1    2.81   76.5 
## # … with 294 more rows
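
The functions in the list need not be built-in ones. As a hedged sketch, the helper coef_var below (a hypothetical name, not part of fpp3) computes the coefficient of variation of a numeric vector. Any function of this shape, taking the values of one series and returning a single number, should be usable as an entry in the list passed to features().

```r
# Hypothetical helper (not part of fpp3): coefficient of variation,
# i.e. the standard deviation relative to the mean.
coef_var <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Quick check on an invented vector
coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))
```

With fpp3 loaded, it could then sit alongside the other statistics, for example as list(mean = mean, sd = sd, cv = coef_var).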

Let's look at only the first year of the data (the four quarters of 1998).

tourism %>% filter(Quarter <= yearquarter("1998 Q4")) %>% 
  features(Trips, quantile)
## # A tibble: 304 x 8
##    Region         State              Purpose   `0%`   `25%`  `50%`  `75%` `100%`
##    <chr>          <chr>              <chr>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
##  1 Adelaide       South Australia    Busine… 110.   123.    131.   143.   166.  
##  2 Adelaide       South Australia    Holiday 130.   150.    169.   193.   224.  
##  3 Adelaide       South Australia    Other    33.8   37.3    39.0   44.2   58.4 
##  4 Adelaide       South Australia    Visiti… 170.   178.    207.   235.   242.  
##  5 Adelaide Hills South Australia    Busine…   0      0       0      1.85   7.42
##  6 Adelaide Hills South Australia    Holiday   6.81  11.4    13.3   14.8   18.1 
##  7 Adelaide Hills South Australia    Other     0      0.581   1.66   3.08   4.64
##  8 Adelaide Hills South Australia    Visiti…   2.99   3.44    5.68   7.90   8.35
##  9 Alice Springs  Northern Territory Busine…   3.36   3.82    5.76  11.1   21.8 
## 10 Alice Springs  Northern Territory Holiday   8.15  22.5    30.9   45.1   76.5 
## # … with 294 more rows

ACF features

The autocorrelations of a series at different lags are also time series features.

tourism %>% ACF()
## Response variable not specified, automatically selected `var = Trips`
## # A tsibble: 5,776 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Region   State           Purpose    lag     acf
##    <chr>    <chr>           <chr>    <lag>   <dbl>
##  1 Adelaide South Australia Business    1Q  0.0333
##  2 Adelaide South Australia Business    2Q  0.0590
##  3 Adelaide South Australia Business    3Q  0.0536
##  4 Adelaide South Australia Business    4Q  0.201 
##  5 Adelaide South Australia Business    5Q  0.0645
##  6 Adelaide South Australia Business    6Q  0.104 
##  7 Adelaide South Australia Business    7Q -0.0556
##  8 Adelaide South Australia Business    8Q  0.227 
##  9 Adelaide South Australia Business    9Q  0.0128
## 10 Adelaide South Australia Business   10Q -0.114 
## # … with 5,766 more rows
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  ACF() 
## Response variable not specified, automatically selected `var = Trips`
## # A tsibble: 19 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##    Region   State           Purpose   lag      acf
##    <chr>    <chr>           <chr>   <lag>    <dbl>
##  1 Adelaide South Australia Holiday    1Q  0.0456 
##  2 Adelaide South Australia Holiday    2Q -0.143  
##  3 Adelaide South Australia Holiday    3Q  0.0976 
##  4 Adelaide South Australia Holiday    4Q  0.351  
##  5 Adelaide South Australia Holiday    5Q  0.0642 
##  6 Adelaide South Australia Holiday    6Q -0.0791 
##  7 Adelaide South Australia Holiday    7Q  0.0228 
##  8 Adelaide South Australia Holiday    8Q  0.364  
##  9 Adelaide South Australia Holiday    9Q -0.0226 
## 10 Adelaide South Australia Holiday   10Q -0.270  
## 11 Adelaide South Australia Holiday   11Q -0.00270
## 12 Adelaide South Australia Holiday   12Q  0.298  
## 13 Adelaide South Australia Holiday   13Q  0.0262 
## 14 Adelaide South Australia Holiday   14Q -0.329  
## 15 Adelaide South Australia Holiday   15Q -0.0412 
## 16 Adelaide South Australia Holiday   16Q  0.161  
## 17 Adelaide South Australia Holiday   17Q -0.0736 
## 18 Adelaide South Australia Holiday   18Q -0.227  
## 19 Adelaide South Australia Holiday   19Q -0.0970
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  ACF() %>% 
  autoplot()
## Response variable not specified, automatically selected `var = Trips`

tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  autoplot() 
## Plot variable not specified, automatically selected `.vars = Trips`
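
In the Adelaide Holiday output above, the largest autocorrelations occur at lags 4, 8 and 12 (0.351, 0.364 and 0.298), which is what a yearly seasonal pattern looks like in quarterly data. To show where such numbers come from, here is a minimal base R sketch of the usual sample autocorrelation at lag k: the sum of products of mean-centred values k steps apart, divided by the sum of squared mean-centred values. The vector x and the function acf_manual are invented for illustration, and the comparison is against base R's acf(), on the assumption that ACF() from feasts follows the same definition.

```r
# Invented example series
x <- c(5, 7, 6, 9, 8, 11, 10, 13)

# Sample autocorrelation at lag k, written out by hand
acf_manual <- function(x, k) {
  n    <- length(x)
  xbar <- mean(x)
  num  <- sum((x[(k + 1):n] - xbar) * (x[1:(n - k)] - xbar))
  den  <- sum((x - xbar)^2)
  num / den
}

r_manual  <- sapply(1:3, function(k) acf_manual(x, k))

# Base R equivalent; the first element is lag 0, so drop it
r_builtin <- acf(x, lag.max = 3, plot = FALSE)$acf[-1]

all.equal(r_manual, r_builtin)
```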

Compute the features based on an STL decomposition, such as the strength of the trend and of the seasonality.

tourism %>%
  features(Trips, feat_stl)
## # A tibble: 304 x 12
##    Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
##    <chr>  <chr> <chr>            <dbl>            <dbl>            <dbl>
##  1 Adela… Sout… Busine…          0.451            0.380                3
##  2 Adela… Sout… Holiday          0.541            0.601                1
##  3 Adela… Sout… Other            0.743            0.189                2
##  4 Adela… Sout… Visiti…          0.433            0.446                1
##  5 Adela… Sout… Busine…          0.453            0.140                3
##  6 Adela… Sout… Holiday          0.512            0.244                2
##  7 Adela… Sout… Other            0.584            0.374                2
##  8 Adela… Sout… Visiti…          0.481            0.228                0
##  9 Alice… Nort… Busine…          0.526            0.224                0
## 10 Alice… Nort… Holiday          0.377            0.827                3
## # … with 294 more rows, and 6 more variables: seasonal_trough_year <dbl>,
## #   spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
## #   stl_e_acf10 <dbl>
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  features(Trips, feat_stl) 
## # A tibble: 1 x 12
##   Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
##   <chr>  <chr> <chr>            <dbl>            <dbl>            <dbl>
## 1 Adela… Sout… Holiday          0.541            0.601                1
## # … with 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## #   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
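
The two strength columns compare the variance of the remainder with the variance of a component plus the remainder. Following the definition given in the fpp3 textbook, the strength of trend is max(0, 1 - Var(R) / Var(T + R)) and the strength of seasonality is max(0, 1 - Var(R) / Var(S + R)), where T, S and R are the trend, seasonal and remainder components. The sketch below applies these formulas using base R's stl() on the built-in AirPassengers series. Both the dataset and the STL settings are assumptions chosen for a self-contained example, so the numbers are not comparable to the feat_stl output above.

```r
# Built-in monthly series, used only because it ships with R
y <- log(AirPassengers)

# STL decomposition with base R; these settings are an assumption and
# are not necessarily the ones feat_stl() uses internally.
fit   <- stl(y, s.window = "periodic")
parts <- fit$time.series            # columns: seasonal, trend, remainder

remainder <- parts[, "remainder"]
trend     <- parts[, "trend"]
seasonal  <- parts[, "seasonal"]

# Strength = 1 - Var(remainder) / Var(component + remainder), floored at 0
trend_strength    <- max(0, 1 - var(remainder) / var(trend + remainder))
seasonal_strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))

c(trend = trend_strength, seasonal = seasonal_strength)
```

Values close to 1 mean the component dominates the remainder, and values close to 0 mean it is barely distinguishable from noise.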

Use the features to identify a time series

We can then plot these features to see which types of series are most strongly trended and which are most seasonal.

tourism %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))

Find the series with the maximum yearly seasonal strength and plot it.

tourism %>%
  features(Trips, feat_stl) %>%
  filter(seasonal_strength_year == max(seasonal_strength_year)) %>%
  left_join(tourism, by = c("State", "Region", "Purpose")) %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))

Full feature set

tourism_features <- tourism %>%
  features(Trips, feature_set(pkgs = "feasts"))
## Warning: `n_flat_spots()` is deprecated as of feasts 0.1.5.
## Please use `longest_flat_spot()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
tourism_features
## # A tibble: 304 x 51
##    Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
##    <chr>  <chr> <chr>            <dbl>            <dbl>            <dbl>
##  1 Adela… Sout… Busine…          0.451            0.380                3
##  2 Adela… Sout… Holiday          0.541            0.601                1
##  3 Adela… Sout… Other            0.743            0.189                2
##  4 Adela… Sout… Visiti…          0.433            0.446                1
##  5 Adela… Sout… Busine…          0.453            0.140                3
##  6 Adela… Sout… Holiday          0.512            0.244                2
##  7 Adela… Sout… Other            0.584            0.374                2
##  8 Adela… Sout… Visiti…          0.481            0.228                0
##  9 Alice… Nort… Busine…          0.526            0.224                0
## 10 Alice… Nort… Holiday          0.377            0.827                3
## # … with 294 more rows, and 45 more variables: seasonal_trough_year <dbl>,
## #   spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
## #   stl_e_acf10 <dbl>, acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>,
## #   diff1_acf10 <dbl>, diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>,
## #   pacf5 <dbl>, diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
## #   zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
## #   zero_end_prop <dbl>, lambda_guerrero <dbl>, kpss_stat <dbl>,
## #   kpss_pvalue <dbl>, pp_stat <dbl>, pp_pvalue <dbl>, ndiffs <int>,
## #   nsdiffs <int>, bp_stat <dbl>, bp_pvalue <dbl>, lb_stat <dbl>,
## #   lb_pvalue <dbl>, var_tiled_var <dbl>, var_tiled_mean <dbl>,
## #   shift_level_max <dbl>, shift_level_index <dbl>, shift_var_max <dbl>,
## #   shift_var_index <dbl>, shift_kl_max <dbl>, shift_kl_index <dbl>,
## #   spectral_entropy <dbl>, n_crossing_points <int>, longest_flat_spot <int>,
## #   coef_hurst <dbl>, stat_arch_lm <dbl>
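
Several of the columns in this table are simple summaries of the ACF discussed earlier. As described in the fpp3 textbook, acf1 is the first autocorrelation coefficient and acf10 is the sum of squares of the first ten autocorrelation coefficients. The sketch below reproduces these two summaries with base R's acf() on the built-in lynx series. Both the use of acf() as a stand-in for feasts and the choice of dataset are assumptions made to keep the example self-contained.

```r
# First ten autocorrelations of a built-in series (lag 0 dropped)
r <- acf(lynx, lag.max = 10, plot = FALSE)$acf[-1]

# acf1: first autocorrelation coefficient
acf1 <- r[1]

# acf10: sum of squares of the first ten autocorrelation coefficients
acf10 <- sum(r^2)

c(acf1 = acf1, acf10 = acf10)
```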

For the homework, we will look at the PBS data.

PBS %>% features(Cost, list(mean = mean, sd = sd))
## # A tibble: 336 x 6
##    Concession   Type        ATC1  ATC2       mean       sd
##    <chr>        <chr>       <chr> <chr>     <dbl>    <dbl>
##  1 Concessional Co-payments A     A01      67673.   14763.
##  2 Concessional Co-payments A     A02   16455044. 7498596.
##  3 Concessional Co-payments A     A03     476221.  370696.
##  4 Concessional Co-payments A     A04     463392.  154020.
##  5 Concessional Co-payments A     A05     147604.   74190.
##  6 Concessional Co-payments A     A06     417889.  163040.
##  7 Concessional Co-payments A     A07     917795.  338081.
##  8 Concessional Co-payments A     A09     343881.   89815.
##  9 Concessional Co-payments A     A10    5680010. 3180294.
## 10 Concessional Co-payments A     A11     651863.  387439.
## # … with 326 more rows