library(pacman)
p_load(tidyverse, fpp3)
Features
Time Series Features
Statistics computed on time series data are called time series features.
We will examine the time series in the tourism dataset.
data(tourism)
tourism
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
6 1999 Q2 Adelaide South Australia Business 200.
7 1999 Q3 Adelaide South Australia Business 169.
8 1999 Q4 Adelaide South Australia Business 134.
9 2000 Q1 Adelaide South Australia Business 154.
10 2000 Q2 Adelaide South Australia Business 169.
# … with 24,310 more rows
tourism %>% features(Trips, quantile)
# A tibble: 304 × 8
Region State Purpose `0%` `25%` `50%` `75%` `100%`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelaide South Australia Busine… 68.7 134. 153. 177. 242.
2 Adelaide South Australia Holiday 108. 135. 154. 172. 224.
3 Adelaide South Australia Other 25.9 43.9 53.8 62.5 107.
4 Adelaide South Australia Visiti… 137. 179. 206. 229. 270.
5 Adelaide Hills South Australia Busine… 0 0 1.26 3.92 28.6
6 Adelaide Hills South Australia Holiday 0 5.77 8.52 14.1 35.8
7 Adelaide Hills South Australia Other 0 0 0.908 2.09 8.95
8 Adelaide Hills South Australia Visiti… 0.778 8.91 12.2 16.8 81.1
9 Alice Springs Northern Territo… Busine… 1.01 9.13 13.3 18.5 34.1
10 Alice Springs Northern Territo… Holiday 2.81 16.9 31.5 44.8 76.5
# … with 294 more rows
Note that a named list of functions can be passed to compute several statistics at once.
tourism %>% features(Trips, list(mean = mean, median = median, sd = sd, min = min, max = max))
# A tibble: 304 × 8
Region State Purpose mean median sd min max
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelaide South Australia Busine… 156. 153. 35.6 68.7 242.
2 Adelaide South Australia Holiday 157. 154. 27.1 108. 224.
3 Adelaide South Australia Other 56.6 53.8 17.3 25.9 107.
4 Adelaide South Australia Visiti… 205. 206. 32.5 137. 270.
5 Adelaide Hills South Australia Busine… 2.66 1.26 4.30 0 28.6
6 Adelaide Hills South Australia Holiday 10.5 8.52 6.37 0 35.8
7 Adelaide Hills South Australia Other 1.40 0.908 1.65 0 8.95
8 Adelaide Hills South Australia Visiti… 14.2 12.2 10.7 0.778 81.1
9 Alice Springs Northern Territory Busine… 14.6 13.3 7.20 1.01 34.1
10 Alice Springs Northern Territory Holiday 31.9 31.5 18.1 2.81 76.5
# … with 294 more rows
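The list passed to features() is not limited to built-in summaries. The sketch below shows how a user-written function could be supplied in the same way. The helper coef_var() is a hypothetical name introduced here for illustration and is not part of feasts; it is assumed that the fpp3 package is installed.

```r
library(fpp3)

# Hypothetical helper (not part of feasts): coefficient of variation,
# i.e. the standard deviation divided by the mean.
coef_var <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Pass the helper alongside a built-in summary, one result row per series.
tourism_cv <- tourism %>%
  features(Trips, list(mean = mean, cv = coef_var))

tourism_cv
```

Any function that takes a numeric vector and returns a single number should work in this position, in the same way as mean or sd.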
Let's look at the first year of the data.
tourism %>% filter(Quarter <= yearquarter("1998 Q4")) %>%
  features(Trips, quantile)
# A tibble: 304 × 8
Region State Purpose `0%` `25%` `50%` `75%` `100%`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelaide South Australia Busine… 110. 123. 131. 143. 166.
2 Adelaide South Australia Holiday 130. 150. 169. 193. 224.
3 Adelaide South Australia Other 33.8 37.3 39.0 44.2 58.4
4 Adelaide South Australia Visiti… 170. 178. 207. 235. 242.
5 Adelaide Hills South Australia Busine… 0 0 0 1.85 7.42
6 Adelaide Hills South Australia Holiday 6.81 11.4 13.3 14.8 18.1
7 Adelaide Hills South Australia Other 0 0.581 1.66 3.08 4.64
8 Adelaide Hills South Australia Visiti… 2.99 3.44 5.68 7.90 8.35
9 Alice Springs Northern Territory Busine… 3.36 3.82 5.76 11.1 21.8
10 Alice Springs Northern Territory Holiday 8.15 22.5 30.9 45.1 76.5
# … with 294 more rows
ACF features
Autocorrelations, the correlations between a series and its own lagged values, are also time series features.
tourism %>% ACF()
Response variable not specified, automatically selected `var = Trips`
# A tsibble: 5,776 x 5 [1Q]
# Key: Region, State, Purpose [304]
Region State Purpose lag acf
<chr> <chr> <chr> <lag> <dbl>
1 Adelaide South Australia Business 1Q 0.0333
2 Adelaide South Australia Business 2Q 0.0590
3 Adelaide South Australia Business 3Q 0.0536
4 Adelaide South Australia Business 4Q 0.201
5 Adelaide South Australia Business 5Q 0.0645
6 Adelaide South Australia Business 6Q 0.104
7 Adelaide South Australia Business 7Q -0.0556
8 Adelaide South Australia Business 8Q 0.227
9 Adelaide South Australia Business 9Q 0.0128
10 Adelaide South Australia Business 10Q -0.114
# … with 5,766 more rows
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  ACF()
Response variable not specified, automatically selected `var = Trips`
# A tsibble: 19 x 5 [1Q]
# Key: Region, State, Purpose [1]
Region State Purpose lag acf
<chr> <chr> <chr> <lag> <dbl>
1 Adelaide South Australia Holiday 1Q 0.0456
2 Adelaide South Australia Holiday 2Q -0.143
3 Adelaide South Australia Holiday 3Q 0.0976
4 Adelaide South Australia Holiday 4Q 0.351
5 Adelaide South Australia Holiday 5Q 0.0642
6 Adelaide South Australia Holiday 6Q -0.0791
7 Adelaide South Australia Holiday 7Q 0.0228
8 Adelaide South Australia Holiday 8Q 0.364
9 Adelaide South Australia Holiday 9Q -0.0226
10 Adelaide South Australia Holiday 10Q -0.270
11 Adelaide South Australia Holiday 11Q -0.00270
12 Adelaide South Australia Holiday 12Q 0.298
13 Adelaide South Australia Holiday 13Q 0.0262
14 Adelaide South Australia Holiday 14Q -0.329
15 Adelaide South Australia Holiday 15Q -0.0412
16 Adelaide South Australia Holiday 16Q 0.161
17 Adelaide South Australia Holiday 17Q -0.0736
18 Adelaide South Australia Holiday 18Q -0.227
19 Adelaide South Australia Holiday 19Q -0.0970
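To make the acf column more concrete, here is a minimal base R sketch of the usual sample autocorrelation estimator, r_k = sum((y_t - ybar)(y_{t-k} - ybar)) / sum((y_t - ybar)^2). The helper manual_acf() and the short series y are made up for this illustration. The comparison is against stats::acf(), which ACF() is assumed to rely on internally.

```r
# Hypothetical helper: sample autocorrelation at lags 1..max_lag.
manual_acf <- function(y, max_lag) {
  n <- length(y)
  d <- y - mean(y)        # deviations from the overall mean
  denom <- sum(d^2)       # lag-0 sum of squares
  sapply(seq_len(max_lag), function(k) {
    # products of deviations that are k steps apart
    sum(d[(k + 1):n] * d[1:(n - k)]) / denom
  })
}

# Made-up series, used only to check the helper against stats::acf()
y <- c(5, 7, 6, 9, 8, 11, 10, 13, 12, 15)
manual <- manual_acf(y, max_lag = 4)
builtin <- as.numeric(acf(y, lag.max = 4, plot = FALSE)$acf)[-1]  # drop lag 0

all.equal(manual, builtin)
```

The numerator uses only the n - k overlapping pairs while the denominator uses all n observations, so the estimates shrink towards zero at long lags.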
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  ACF() %>%
  autoplot()
Response variable not specified, automatically selected `var = Trips`
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  autoplot()
Plot variable not specified, automatically selected `.vars = Trips`
Compute the STL decomposition features, such as trend strength and seasonal strength, for every series.
tourism %>%
  features(Trips, feat_stl)
# A tibble: 304 × 12
Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Adelaide Sout… Busine… 0.464 0.407 3
2 Adelaide Sout… Holiday 0.554 0.619 1
3 Adelaide Sout… Other 0.746 0.202 2
4 Adelaide Sout… Visiti… 0.435 0.452 1
5 Adelaide Hills Sout… Busine… 0.464 0.179 3
6 Adelaide Hills Sout… Holiday 0.528 0.296 2
7 Adelaide Hills Sout… Other 0.593 0.404 2
8 Adelaide Hills Sout… Visiti… 0.488 0.254 0
9 Alice Springs Nort… Busine… 0.534 0.251 0
10 Alice Springs Nort… Holiday 0.381 0.832 3
# … with 294 more rows, and 6 more variables: seasonal_trough_year <dbl>,
# spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
# stl_e_acf10 <dbl>
tourism %>% filter(Region == "Adelaide", State == "South Australia", Purpose == "Holiday") %>%
  features(Trips, feat_stl)
# A tibble: 1 × 12
Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Adelaide South Austr… Holiday 0.554 0.619 1
# … with 6 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
# linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
Use the features to identify a time series
We can then use these features in plots to identify which types of series are heavily trended and which are most seasonal.
tourism %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year, col = Purpose)) +
  geom_point() +
  facet_wrap(vars(State))
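The same features can also be ranked directly, which complements the scatterplot when the goal is to name specific series. The following is a sketch, assuming fpp3 is installed. The object name stl_feats is chosen here for illustration.

```r
library(fpp3)

# Compute the STL features once and reuse them.
stl_feats <- tourism %>%
  features(Trips, feat_stl)

# Series with the strongest trend
stl_feats %>%
  slice_max(trend_strength, n = 5) %>%
  select(Region, State, Purpose, trend_strength)

# Series with the strongest yearly seasonality
stl_feats %>%
  slice_max(seasonal_strength_year, n = 5) %>%
  select(Region, State, Purpose, seasonal_strength_year)
```

Storing the features in an object avoids recomputing the STL decomposition for all 304 series on every query.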
Find the series with the maximum seasonal strength and plot it.
tourism %>%
  features(Trips, feat_stl) %>%
  filter(seasonal_strength_year == max(seasonal_strength_year)) %>%
  left_join(tourism, by = c("State", "Region", "Purpose")) %>%
  ggplot(aes(x = Quarter, y = Trips)) +
  geom_line() +
  facet_grid(vars(State, Region, Purpose))
Full feature set
tourism_features <- tourism %>%
  features(Trips, feature_set(pkgs = "feasts"))
Warning: `n_flat_spots()` was deprecated in feasts 0.1.5.
Please use `longest_flat_spot()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
tourism_features
# A tibble: 304 × 51
Region State Purpose trend_strength seasonal_streng… seasonal_peak_y…
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Adelaide Sout… Busine… 0.464 0.407 3
2 Adelaide Sout… Holiday 0.554 0.619 1
3 Adelaide Sout… Other 0.746 0.202 2
4 Adelaide Sout… Visiti… 0.435 0.452 1
5 Adelaide Hills Sout… Busine… 0.464 0.179 3
6 Adelaide Hills Sout… Holiday 0.528 0.296 2
7 Adelaide Hills Sout… Other 0.593 0.404 2
8 Adelaide Hills Sout… Visiti… 0.488 0.254 0
9 Alice Springs Nort… Busine… 0.534 0.251 0
10 Alice Springs Nort… Holiday 0.381 0.832 3
# … with 294 more rows, and 45 more variables: seasonal_trough_year <dbl>,
# spikiness <dbl>, linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>,
# stl_e_acf10 <dbl>, acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>,
# diff1_acf10 <dbl>, diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>,
# pacf5 <dbl>, diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
# zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
# zero_end_prop <dbl>, lambda_guerrero <dbl>, kpss_stat <dbl>, …
For the homework we will look at the PBS data.
PBS %>% features(Cost, list(mean = mean, sd = sd))
# A tibble: 336 × 6
Concession Type ATC1 ATC2 mean sd
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Concessional Co-payments A A01 67673. 14763.
2 Concessional Co-payments A A02 16455044. 7498596.
3 Concessional Co-payments A A03 476221. 370696.
4 Concessional Co-payments A A04 463392. 154020.
5 Concessional Co-payments A A05 147604. 74190.
6 Concessional Co-payments A A06 417889. 163040.
7 Concessional Co-payments A A07 917795. 338081.
8 Concessional Co-payments A A09 343881. 89815.
9 Concessional Co-payments A A10 5680010. 3180294.
10 Concessional Co-payments A A11 651863. 387439.
# … with 326 more rows