Time Series data and autoplot()

Author

Prof. Eric A. Suess

Published

January 23, 2023

Today we are going to take a look at a number of the time series datasets, stored as tsibbles, that are used in the fpp3 book.

The first dataset to look at is the beer data.

library(fpp3)
── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
✔ tibble      3.1.7     ✔ tsibble     1.1.1
✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
✔ tidyr       1.2.0     ✔ feasts      0.2.2
✔ lubridate   1.8.0     ✔ fable       0.3.1
✔ ggplot2     3.3.6     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()

See the documentation for the tsibbledata package and find the description of the aus_retail dataset.

aus_retail
# A tsibble: 64,532 x 5 [1M]
# Key:       State, Industry [152]
   State                        Industry           `Series ID`    Month Turnover
   <chr>                        <chr>              <chr>          <mth>    <dbl>
 1 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Apr      4.4
 2 Australian Capital Territory Cafes, restaurant… A3349849A   1982 May      3.4
 3 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jun      3.6
 4 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Jul      4  
 5 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Aug      3.6
 6 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Sep      4.2
 7 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Oct      4.8
 8 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Nov      5.4
 9 Australian Capital Territory Cafes, restaurant… A3349849A   1982 Dec      6.9
10 Australian Capital Territory Cafes, restaurant… A3349849A   1983 Jan      3.8
# … with 64,522 more rows
class(aus_retail)
[1] "tbl_ts"     "tbl_df"     "tbl"        "data.frame"

Note that a packaged dataset does not appear in the R Environment until it is loaded explicitly, which makes it hard to see what the dataset contains. To load a dataset into the Environment, here aus_production, we use the data() R function.

data(aus_production)

head(aus_production)
# A tsibble: 6 x 7 [1Q]
  Quarter  Beer Tobacco Bricks Cement Electricity   Gas
    <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
1 1956 Q1   284    5225    189    465        3923     5
2 1956 Q2   213    5178    204    532        4436     6
3 1956 Q3   227    5297    208    561        4806     7
4 1956 Q4   308    5681    197    570        4418     6
5 1957 Q1   262    5577    187    529        4339     5
6 1957 Q2   228    5651    214    604        4811     7

Open the dataset. Does it look like there is more than one time series contained in the dataset?
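One way to check this from the console, rather than by opening the data viewer, is to ask the tsibble for its structure. The following is a minimal sketch using helper functions from the tsibble package (index_var(), measured_vars(), key_vars(), n_keys()); the object name series_names is only illustrative.

```r
library(fpp3)

# Name of the time index column.
index_var(aus_production)

# Columns that hold measurements; in a wide tsibble like this one,
# each measured column is a separate time series.
series_names <- measured_vars(aus_production)
series_names

# Key columns identify multiple series stacked in long form.
# aus_production has no key, whereas aus_retail is keyed by State and Industry.
key_vars(aus_production)
key_vars(aus_retail)

# Number of distinct series identified by the key.
n_keys(aus_retail)
```

A tsibble can therefore hold several series in two ways: as several measured columns (aus_production) or as several key combinations (aus_retail).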

When the dataset is piped into autoplot() without naming a variable, the first time series column (here Beer) is plotted.

aus_production %>% autoplot()
Plot variable not specified, automatically selected `.vars = Beer`

aus_production %>% autoplot(Beer)

Why does this look different from the picture in Section 4?

aus_prod_2000 <- aus_production %>% select(Quarter, Beer) %>% 
  filter(Quarter >= yearquarter("2000 Q1"))

aus_prod_2000
# A tsibble: 42 x 2 [1Q]
   Quarter  Beer
     <qtr> <dbl>
 1 2000 Q1   421
 2 2000 Q2   402
 3 2000 Q3   414
 4 2000 Q4   500
 5 2001 Q1   451
 6 2001 Q2   380
 7 2001 Q3   416
 8 2001 Q4   492
 9 2002 Q1   428
10 2002 Q2   408
# … with 32 more rows

Now try plotting.

aus_prod_2000 %>% autoplot()
Plot variable not specified, automatically selected `.vars = Beer`

Try this for each of the time series.

aus_prod_2000 <- aus_production %>% select(Quarter, Bricks) %>% 
  filter(Quarter >= yearquarter("2000 Q1"))

aus_prod_2000 %>% autoplot()
Plot variable not specified, automatically selected `.vars = Bricks`
Warning: Removed 20 row(s) containing missing values (geom_path).
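The warning above suggests that Bricks has missing values in this window. As a rough sketch, the missing values in every column can be counted at once, and all of the series can be drawn in one figure instead of repeating the select-and-plot step by hand. The object names recent, na_counts and p, and the column names Series and Value, are illustrative choices, not part of the original notes.

```r
library(fpp3)

recent <- aus_production %>%
  filter(Quarter >= yearquarter("2000 Q1"))

# Count missing values in each measured column.
# as_tibble() drops the tsibble structure so summarise() returns a single row.
na_counts <- recent %>%
  as_tibble() %>%
  summarise(across(-Quarter, ~ sum(is.na(.x))))
na_counts

# Stack the columns into long form so that the series name becomes a key,
# then draw each series in its own panel with its own y-axis scale.
p <- recent %>%
  pivot_longer(-Quarter, names_to = "Series", values_to = "Value") %>%
  autoplot(Value) +
  facet_wrap(vars(Series), scales = "free_y") +
  theme(legend.position = "none")
p
```

Free y-axis scales are used because the series are measured in different units, so a shared axis would flatten the smaller ones.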

In Section 1.7

For Homework 1 you need to work with the following datasets from the tsibbledata R package.

gafa_stock
# A tsibble: 5,032 x 8 [!]
# Key:       Symbol [4]
   Symbol Date        Open  High   Low Close Adj_Close    Volume
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
 7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
 8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
 9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
# … with 5,022 more rows
PBS
# A tsibble: 67,596 x 9 [1M]
# Key:       Concession, Type, ATC1, ATC2 [336]
      Month Concession   Type      ATC1  ATC1_desc ATC2  ATC2_desc Scripts  Cost
      <mth> <chr>        <chr>     <chr> <chr>     <chr> <chr>       <dbl> <dbl>
 1 1991 Jul Concessional Co-payme… A     Alimenta… A01   STOMATOL…   18228 67877
 2 1991 Aug Concessional Co-payme… A     Alimenta… A01   STOMATOL…   15327 57011
 3 1991 Sep Concessional Co-payme… A     Alimenta… A01   STOMATOL…   14775 55020
 4 1991 Oct Concessional Co-payme… A     Alimenta… A01   STOMATOL…   15380 57222
 5 1991 Nov Concessional Co-payme… A     Alimenta… A01   STOMATOL…   14371 52120
 6 1991 Dec Concessional Co-payme… A     Alimenta… A01   STOMATOL…   15028 54299
 7 1992 Jan Concessional Co-payme… A     Alimenta… A01   STOMATOL…   11040 39753
 8 1992 Feb Concessional Co-payme… A     Alimenta… A01   STOMATOL…   15165 54405
 9 1992 Mar Concessional Co-payme… A     Alimenta… A01   STOMATOL…   16898 61108
10 1992 Apr Concessional Co-payme… A     Alimenta… A01   STOMATOL…   18141 65356
# … with 67,586 more rows
vic_elec
# A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
   Time                Demand Temperature Date       Holiday
   <dttm>               <dbl>       <dbl> <date>     <lgl>  
 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE   
 7 2012-01-01 03:00:00  3694.        20.1 2012-01-01 TRUE   
 8 2012-01-01 03:30:00  3562.        19.6 2012-01-01 TRUE   
 9 2012-01-01 04:00:00  3433.        19.1 2012-01-01 TRUE   
10 2012-01-01 04:30:00  3359.        19.0 2012-01-01 TRUE   
# … with 52,598 more rows
pelt
# A tsibble: 91 x 3 [1Y]
    Year  Hare  Lynx
   <dbl> <dbl> <dbl>
 1  1845 19580 30090
 2  1846 19600 45150
 3  1847 19610 49150
 4  1848 11990 39520
 5  1849 28040 21230
 6  1850 58000  8420
 7  1851 74600  5560
 8  1852 75090  5080
 9  1853 88480 10170
10  1854 61280 19600
# … with 81 more rows
pelt %>% autoplot()
Plot variable not specified, automatically selected `.vars = Hare`
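Since autoplot() picked only Hare here, a small sketch of how both columns might be shown together follows. It assumes the vars() form of the autoplot() argument, which draws one panel per variable; the name p_both is illustrative.

```r
library(fpp3)

# Plot both measured variables, one panel each, against the shared Year index.
p_both <- pelt %>% autoplot(vars(Hare, Lynx))
p_both
```

Seeing the two panels one above the other makes it easier to compare the timing of the peaks in the two series.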

There is a clear pattern in the pelt time series data. We will measure the autocorrelation in time series data using the ACF() function.

ACF(pelt)
Response variable not specified, automatically selected `var = Hare`
# A tsibble: 19 x 2 [1Y]
     lag     acf
   <lag>   <dbl>
 1    1Y  0.658 
 2    2Y  0.214 
 3    3Y -0.155 
 4    4Y -0.401 
 5    5Y -0.493 
 6    6Y -0.401 
 7    7Y -0.168 
 8    8Y  0.113 
 9    9Y  0.307 
10   10Y  0.340 
11   11Y  0.296 
12   12Y  0.206 
13   13Y  0.0372
14   14Y -0.153 
15   15Y -0.285 
16   16Y -0.295 
17   17Y -0.202 
18   18Y -0.0676
19   19Y  0.0956
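To see where the numbers in this table come from, the sample autocorrelation can be computed directly. The sketch below assumes the usual sample autocorrelation formula (the lagged cross-products of deviations from the mean, divided by the total sum of squared deviations); acf_by_hand() is a hypothetical helper written for illustration, not a function from any package.

```r
library(fpp3)

# Sample autocorrelation at lag k:
#   r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
acf_by_hand <- function(y, k) {
  n <- length(y)
  y_bar <- mean(y)
  numerator <- sum((y[(k + 1):n] - y_bar) * (y[1:(n - k)] - y_bar))
  denominator <- sum((y - y_bar)^2)
  numerator / denominator
}

hare <- pelt$Hare
r1 <- acf_by_hand(hare, 1)
r5 <- acf_by_hand(hare, 5)

# Compare with the values reported by ACF() for the same variable.
acf_table <- pelt %>% ACF(Hare)
c(by_hand = r1, from_ACF = acf_table$acf[1])
c(by_hand = r5, from_ACF = acf_table$acf[5])
```

A positive value at lag 1 means that a year with many hare pelts tends to be followed by another such year, while a negative value at lag 5 means that observations five years apart tend to fall on opposite sides of the mean.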

And we can plot the ACF. Can you see the positive and negative correlations in the time series?

ACF(pelt) %>% autoplot()
Response variable not specified, automatically selected `var = Hare`