Features

Author

Prof. Eric A. Suess

Published

February 1, 2023

Time Series Features

Statistics computed on time series data are called time series features.

library(pacman)
p_load(tidyverse, fpp3)

We will examine the time series data in the tourism dataset.

data(tourism)
tourism
# A tsibble: 24,320 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Quarter Region   State           Purpose  Trips
     <qtr> <chr>    <chr>           <chr>    <dbl>
 1 1998 Q1 Adelaide South Australia Business  135.
 2 1998 Q2 Adelaide South Australia Business  110.
 3 1998 Q3 Adelaide South Australia Business  166.
 4 1998 Q4 Adelaide South Australia Business  127.
 5 1999 Q1 Adelaide South Australia Business  137.
 6 1999 Q2 Adelaide South Australia Business  200.
 7 1999 Q3 Adelaide South Australia Business  169.
 8 1999 Q4 Adelaide South Australia Business  134.
 9 2000 Q1 Adelaide South Australia Business  154.
10 2000 Q2 Adelaide South Australia Business  169.
# … with 24,310 more rows
tourism %>% features(Trips, quantile)
# A tibble: 304 × 8
   Region         State             Purpose    `0%`  `25%`   `50%`  `75%` `100%`
   <chr>          <chr>             <chr>     <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
 1 Adelaide       South Australia   Busine…  68.7   134.   153.    177.   242.  
 2 Adelaide       South Australia   Holiday 108.    135.   154.    172.   224.  
 3 Adelaide       South Australia   Other    25.9    43.9   53.8    62.5  107.  
 4 Adelaide       South Australia   Visiti… 137.    179.   206.    229.   270.  
 5 Adelaide Hills South Australia   Busine…   0       0      1.26    3.92  28.6 
 6 Adelaide Hills South Australia   Holiday   0       5.77   8.52   14.1   35.8 
 7 Adelaide Hills South Australia   Other     0       0      0.908   2.09   8.95
 8 Adelaide Hills South Australia   Visiti…   0.778   8.91  12.2    16.8   81.1 
 9 Alice Springs  Northern Territo… Busine…   1.01    9.13  13.3    18.5   34.1 
10 Alice Springs  Northern Territo… Holiday   2.81   16.9   31.5    44.8   76.5 
# … with 294 more rows

Note that a named list of functions can be passed to compute several statistics at once. The list names become the column names of the result.

tourism %>% features(Trips, list(mean = mean, median = median, sd = sd, min = min, max = max))
# A tibble: 304 × 8
   Region         State              Purpose   mean  median    sd     min    max
   <chr>          <chr>              <chr>    <dbl>   <dbl> <dbl>   <dbl>  <dbl>
 1 Adelaide       South Australia    Busine… 156.   153.    35.6   68.7   242.  
 2 Adelaide       South Australia    Holiday 157.   154.    27.1  108.    224.  
 3 Adelaide       South Australia    Other    56.6   53.8   17.3   25.9   107.  
 4 Adelaide       South Australia    Visiti… 205.   206.    32.5  137.    270.  
 5 Adelaide Hills South Australia    Busine…   2.66   1.26   4.30   0      28.6 
 6 Adelaide Hills South Australia    Holiday  10.5    8.52   6.37   0      35.8 
 7 Adelaide Hills South Australia    Other     1.40   0.908  1.65   0       8.95
 8 Adelaide Hills South Australia    Visiti…  14.2   12.2   10.7    0.778  81.1 
 9 Alice Springs  Northern Territory Busine…  14.6   13.3    7.20   1.01   34.1 
10 Alice Springs  Northern Territory Holiday  31.9   31.5   18.1    2.81   76.5 
# … with 294 more rows
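
The list is not limited to built-in summaries: any function that takes a numeric vector and returns a single number should work in the same way. As a minimal sketch, the coefficient of variation below is a hypothetical helper (`cv` is our own function, not part of fpp3), computed alongside the mean for each series.

```r
library(fpp3)

# Hypothetical helper, not part of fpp3: coefficient of variation of a series
cv <- function(x) sd(x) / mean(x)

# One row per Region/State/Purpose combination, one column per named feature
tourism_cv <- tourism %>%
  features(Trips, list(mean = mean, cv = cv))

tourism_cv
```

A large `cv` flags a series whose quarterly variation is large relative to its average level, which is common for the small regions with many near-zero quarters.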

Let's look at the first year of the data.

tourism %>% filter(Quarter <= yearquarter("1998 Q4")) %>% 
  features(Trips, quantile)
# A tibble: 304 × 8
   Region         State              Purpose   `0%`   `25%`  `50%`  `75%` `100%`
   <chr>          <chr>              <chr>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
 1 Adelaide       South Australia    Busine… 110.   123.    131.   143.   166.  
 2 Adelaide       South Australia    Holiday 130.   150.    169.   193.   224.  
 3 Adelaide       South Australia    Other    33.8   37.3    39.0   44.2   58.4 
 4 Adelaide       South Australia    Visiti… 170.   178.    207.   235.   242.  
 5 Adelaide Hills South Australia    Busine…   0      0       0      1.85   7.42
 6 Adelaide Hills South Australia    Holiday   6.81  11.4    13.3   14.8   18.1 
 7 Adelaide Hills South Australia    Other     0      0.581   1.66   3.08   4.64
 8 Adelaide Hills South Australia    Visiti…   2.99   3.44    5.68   7.90   8.35
 9 Alice Springs  Northern Territory Busine…   3.36   3.82    5.76  11.1   21.8 
10 Alice Springs  Northern Territory Holiday   8.15  22.5    30.9   45.1   76.5 
# … with 294 more rows

ACF features

The autocorrelations, that is, the correlations between a series and its own lagged values, are also time series features.

tourism %>% ACF()
Response variable not specified, automatically selected `var = Trips`
# A tsibble: 5,776 x 5 [1Q]
# Key:       Region, State, Purpose [304]
   Region   State           Purpose    lag     acf
   <chr>    <chr>           <chr>    <lag>   <dbl>
 1 Adelaide South Australia Business    1Q  0.0333
 2 Adelaide South Australia Business    2Q  0.0590
 3 Adelaide South Australia Business    3Q  0.0536
 4 Adelaide South Australia Business    4Q  0.201 
 5 Adelaide South Australia Business    5Q  0.0645
 6 Adelaide South Australia Business    6Q  0.104 
 7 Adelaide South Australia Business    7Q -0.0556
 8 Adelaide South Australia Business    8Q  0.227 
 9 Adelaide South Australia Business    9Q  0.0128
10 Adelaide South Australia Business   10Q -0.114 
# … with 5,766 more rows
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  ACF() 
Response variable not specified, automatically selected `var = Trips`
# A tsibble: 19 x 5 [1Q]
# Key:       Region, State, Purpose [1]
   Region   State           Purpose   lag      acf
   <chr>    <chr>           <chr>   <lag>    <dbl>
 1 Adelaide South Australia Holiday    1Q  0.0456 
 2 Adelaide South Australia Holiday    2Q -0.143  
 3 Adelaide South Australia Holiday    3Q  0.0976 
 4 Adelaide South Australia Holiday    4Q  0.351  
 5 Adelaide South Australia Holiday    5Q  0.0642 
 6 Adelaide South Australia Holiday    6Q -0.0791 
 7 Adelaide South Australia Holiday    7Q  0.0228 
 8 Adelaide South Australia Holiday    8Q  0.364  
 9 Adelaide South Australia Holiday    9Q -0.0226 
10 Adelaide South Australia Holiday   10Q -0.270  
11 Adelaide South Australia Holiday   11Q -0.00270
12 Adelaide South Australia Holiday   12Q  0.298  
13 Adelaide South Australia Holiday   13Q  0.0262 
14 Adelaide South Australia Holiday   14Q -0.329  
15 Adelaide South Australia Holiday   15Q -0.0412 
16 Adelaide South Australia Holiday   16Q  0.161  
17 Adelaide South Australia Holiday   17Q -0.0736 
18 Adelaide South Australia Holiday   18Q -0.227  
19 Adelaide South Australia Holiday   19Q -0.0970 
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  ACF() %>% 
  autoplot()
Response variable not specified, automatically selected `var = Trips`

tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  autoplot() 
Plot variable not specified, automatically selected `.vars = Trips`
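
To make the autocorrelation values above less of a black box, here is a minimal sketch that computes the sample autocorrelation at lag k from its usual definition and compares it with base R's `acf()`. It uses the built-in `USAccDeaths` series purely for illustration, and `acf_by_hand` is a hypothetical helper name, not a function from fpp3.

```r
# Sample autocorrelation at lag k, using the usual definition:
# r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
acf_by_hand <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)                      # deviations from the overall mean
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

y <- as.numeric(USAccDeaths)  # built-in monthly series, illustration only

manual  <- sapply(1:4, function(k) acf_by_hand(y, k))
builtin <- acf(y, lag.max = 4, plot = FALSE)$acf[2:5]  # element 1 is lag 0

# The two sets of values should agree up to numerical tolerance
all.equal(manual, builtin)
```

Note that the same overall mean and the same denominator are used at every lag, which is why the values are not exactly the ordinary correlation between the series and its shifted copy.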

Compute the STL-based features with feat_stl, including the trend strength and the seasonal strength.

tourism %>%
  features(Trips, feat_stl)
# A tibble: 304 × 12
   Region         State Purpose trend_strength seasonal_streng… seasonal_peak_y…
   <chr>          <chr> <chr>            <dbl>            <dbl>            <dbl>
 1 Adelaide       Sout… Busine…          0.464            0.407                3
 2 Adelaide       Sout… Holiday          0.554            0.619                1
 3 Adelaide       Sout… Other            0.746            0.202                2
 4 Adelaide       Sout… Visiti…          0.435            0.452                1
 5 Adelaide Hills Sout… Busine…          0.464            0.179                3
 6 Adelaide Hills Sout… Holiday          0.528            0.296                2
 7 Adelaide Hills Sout… Other            0.593            0.404                2
 8 Adelaide Hills Sout… Visiti…          0.488            0.254                0
 9 Alice Springs  Nort… Busine…          0.534            0.251                0
10 Alice Springs  Nort… Holiday          0.381            0.832                3
# … with 294 more rows, and 6 more variables: seasonal_trough_year <dbl>,
#   spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
#   stl_e_acf10 <dbl>
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>% 
  features(Trips, feat_stl) 
# A tibble: 1 × 12
  Region   State        Purpose trend_strength seasonal_streng… seasonal_peak_y…
  <chr>    <chr>        <chr>            <dbl>            <dbl>            <dbl>
1 Adelaide South Austr… Holiday          0.554            0.619                1
# … with 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
#   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
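
The strength measures come from an STL decomposition of the series into trend, seasonal, and remainder components. The sketch below illustrates the idea with base R's `stl()` on the built-in `AirPassengers` series, using the definitions strength of trend = max(0, 1 - Var(R) / Var(T + R)) and strength of seasonality = max(0, 1 - Var(R) / Var(S + R)). This is an illustration under stated assumptions: `feat_stl` uses its own STL settings, so this code is not expected to reproduce the numbers in the table above.

```r
# Built-in monthly series, logged so the seasonal swings are roughly constant
y <- log(AirPassengers)

# STL decomposition; the result has columns "seasonal", "trend", "remainder"
parts <- stl(y, s.window = "periodic")$time.series
seasonal  <- as.numeric(parts[, "seasonal"])
trend     <- as.numeric(parts[, "trend"])
remainder <- as.numeric(parts[, "remainder"])

# Strength = 1 minus the share of variance left in the remainder, floored at 0
trend_strength    <- max(0, 1 - var(remainder) / var(trend + remainder))
seasonal_strength <- max(0, 1 - var(remainder) / var(seasonal + remainder))

c(trend = trend_strength, seasonal = seasonal_strength)
```

Values near 1 mean the remainder explains little of the variation once the trend (or seasonal) component is accounted for, and values near 0 mean the component adds almost nothing beyond noise.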

Use the features to identify a time series

We can then plot these features to identify which types of series are heavily trended and which are most seasonal.

tourism %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))

Find the series with the maximum seasonal strength and plot it.

tourism %>%
  features(Trips, feat_stl) %>%
  filter(seasonal_strength_year == max(seasonal_strength_year)) %>%
  left_join(tourism, by = c("State", "Region", "Purpose")) %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))

Full feature set

tourism_features <- tourism %>%
  features(Trips, feature_set(pkgs = "feasts"))
Warning: `n_flat_spots()` was deprecated in feasts 0.1.5.
Please use `longest_flat_spot()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
tourism_features
# A tibble: 304 × 51
   Region         State Purpose trend_strength seasonal_streng… seasonal_peak_y…
   <chr>          <chr> <chr>            <dbl>            <dbl>            <dbl>
 1 Adelaide       Sout… Busine…          0.464            0.407                3
 2 Adelaide       Sout… Holiday          0.554            0.619                1
 3 Adelaide       Sout… Other            0.746            0.202                2
 4 Adelaide       Sout… Visiti…          0.435            0.452                1
 5 Adelaide Hills Sout… Busine…          0.464            0.179                3
 6 Adelaide Hills Sout… Holiday          0.528            0.296                2
 7 Adelaide Hills Sout… Other            0.593            0.404                2
 8 Adelaide Hills Sout… Visiti…          0.488            0.254                0
 9 Alice Springs  Nort… Busine…          0.534            0.251                0
10 Alice Springs  Nort… Holiday          0.381            0.832                3
# … with 294 more rows, and 45 more variables: seasonal_trough_year <dbl>,
#   spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
#   stl_e_acf10 <dbl>, acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>,
#   diff1_acf10 <dbl>, diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>,
#   pacf5 <dbl>, diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
#   zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
#   zero_end_prop <dbl>, lambda_guerrero <dbl>, kpss_stat <dbl>, …

For the homework we will look at the PBS data.

PBS %>% features(Cost, list(mean = mean, sd = sd))
# A tibble: 336 × 6
   Concession   Type        ATC1  ATC2       mean       sd
   <chr>        <chr>       <chr> <chr>     <dbl>    <dbl>
 1 Concessional Co-payments A     A01      67673.   14763.
 2 Concessional Co-payments A     A02   16455044. 7498596.
 3 Concessional Co-payments A     A03     476221.  370696.
 4 Concessional Co-payments A     A04     463392.  154020.
 5 Concessional Co-payments A     A05     147604.   74190.
 6 Concessional Co-payments A     A06     417889.  163040.
 7 Concessional Co-payments A     A07     917795.  338081.
 8 Concessional Co-payments A     A09     343881.   89815.
 9 Concessional Co-payments A     A10    5680010. 3180294.
10 Concessional Co-payments A     A11     651863.  387439.
# … with 326 more rows