Today we are going to look at several of the time series datasets (tsibbles) that are used in the fpp3 book.
The first dataset to look at is the beer data.
── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
✔ tibble 3.1.7 ✔ tsibble 1.1.1
✔ dplyr 1.0.9 ✔ tsibbledata 0.4.0
✔ tidyr 1.2.0 ✔ feasts 0.2.2
✔ lubridate 1.8.0 ✔ fable 0.3.1
✔ ggplot2 3.3.6
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date() masks base::date()
✖ dplyr::filter() masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag() masks stats::lag()
✖ tsibble::setdiff() masks base::setdiff()
✖ tsibble::union() masks base::union()
Look up the documentation for the tsibbledata package and find the description of the aus_retail dataset.
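Here is a minimal sketch of how to do this from the console. It assumes fpp3 is installed, since fpp3 attaches tsibbledata, where aus_retail lives. The help page only opens in an interactive session, so that call is guarded with interactive().

```r
library(fpp3)  # attaches tsibble, tsibbledata, feasts, fable, dplyr, ggplot2, ...

# Open the help page describing aus_retail (only meaningful interactively)
if (interactive()) help("aus_retail", package = "tsibbledata")

# Print the tsibble: the header shows its size, interval ([1M]) and keys
aus_retail

# A tsibble is still a tibble and a data frame underneath
class(aus_retail)

# The key variables identify each separate series (one per State/Industry pair)
key_vars(aus_retail)
```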
# A tsibble: 64,532 x 5 [1M]
# Key: State, Industry [152]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Apr 4.4
2 Australian Capital Territory Cafes, restaurant… A3349849A 1982 May 3.4
3 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jun 3.6
4 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jul 4
5 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Aug 3.6
6 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Sep 4.2
7 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Oct 4.8
8 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Nov 5.4
9 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Dec 6.9
10 Australian Capital Territory Cafes, restaurant… A3349849A 1983 Jan 3.8
# … with 64,522 more rows
[1] "tbl_ts" "tbl_df" "tbl" "data.frame"
Note that the dataset is not listed in the R environment, which makes it hard to see what it contains. To load it into the environment, use the data() function.
data(aus_production)
head(aus_production)
# A tsibble: 6 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1956 Q1 284 5225 189 465 3923 5
2 1956 Q2 213 5178 204 532 4436 6
3 1956 Q3 227 5297 208 561 4806 7
4 1956 Q4 308 5681 197 570 4418 6
5 1957 Q1 262 5577 187 529 4339 5
6 1957 Q2 228 5651 214 604 4811 7
Open the dataset. Does it look like it contains more than one time series?
When the whole dataset is piped into autoplot() without naming a variable, only the first measured variable is plotted, as the message below shows.
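One way to check this in code, assuming fpp3 is loaded: a tsibble separates its time index from its measured variables, and the tsibble helpers index_var(), measured_vars() and key_vars() list them.

```r
library(fpp3)

# The index is the time column of the tsibble
index_var(aus_production)

# Every non-index, non-key column is a measured variable, i.e. a separate series
measured_vars(aus_production)

# There are no key variables, so each measured column holds exactly one series
key_vars(aus_production)
```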
aus_production %>% autoplot()
Plot variable not specified, automatically selected `.vars = Beer`
aus_production %>% autoplot(Beer)
Why does this look different from the picture in Section 4?
aus_prod_2000 <- aus_production %>%
  select(Quarter, Beer) %>%
  filter(Quarter >= yearquarter("2000 Q1"))
aus_prod_2000
# A tsibble: 42 x 2 [1Q]
Quarter Beer
<qtr> <dbl>
1 2000 Q1 421
2 2000 Q2 402
3 2000 Q3 414
4 2000 Q4 500
5 2001 Q1 451
6 2001 Q2 380
7 2001 Q3 416
8 2001 Q4 492
9 2002 Q1 428
10 2002 Q2 408
# … with 32 more rows
Now try plotting.
aus_prod_2000 %>% autoplot()
Plot variable not specified, automatically selected `.vars = Beer`
Try this for each of the time series.
aus_prod_2000 <- aus_production %>%
  select(Quarter, Bricks) %>%
  filter(Quarter >= yearquarter("2000 Q1"))
aus_prod_2000 %>% autoplot()
Plot variable not specified, automatically selected `.vars = Bricks`
Warning: Removed 20 row(s) containing missing values (geom_path).
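The warning comes from missing values in Bricks, which ggplot2 drops before drawing the line. The sketch below counts them and shows which quarters they fall in. Instead of repeating select() and filter() for every column, it then reshapes all six series into long format with pivot_longer() and draws them in one faceted plot. It assumes fpp3 is loaded. The names aus_prod_all_2000, aus_prod_all_2000_long, Series and Value are our own choices.

```r
library(fpp3)

# Keep every series, not just one column, from 2000 Q1 onwards
aus_prod_all_2000 <- aus_production %>%
  filter(Quarter >= yearquarter("2000 Q1"))

# How many quarters of Bricks are missing, and which ones?
sum(is.na(aus_prod_all_2000$Bricks))
aus_prod_all_2000 %>% filter(is.na(Bricks)) %>% pull(Quarter)

# Long format: one row per (Quarter, Series) pair;
# tsibble adds the new Series column to the key
aus_prod_all_2000_long <- aus_prod_all_2000 %>%
  pivot_longer(-Quarter, names_to = "Series", values_to = "Value")

# One panel per series, each with its own y-axis scale
aus_prod_all_2000_long %>%
  ggplot(aes(x = Quarter, y = Value)) +
  geom_line() +
  facet_grid(Series ~ ., scales = "free_y")
```

Free y-axis scales matter here because the series are measured in very different units and magnitudes. On a shared axis, the smaller series would look flat.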
In Section 1.7
For Homework 1 you need to work with the following datasets from the tsibbledata R package.
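Judging by their dimensions and columns, the printouts below appear to be gafa_stock, PBS, vic_elec and pelt from tsibbledata. The sketch below prints each one and summarises its structure, assuming fpp3 is loaded. The [!] in the gafa_stock header marks an irregular interval, because stock prices are only recorded on trading days. The list name homework is a hypothetical choice.

```r
library(fpp3)

# The homework datasets, all shipped with tsibbledata
homework <- list(
  gafa_stock = gafa_stock,
  PBS = PBS,
  vic_elec = vic_elec,
  pelt = pelt
)

# Print each tsibble (this should give printouts like the ones below)
for (name in names(homework)) {
  print(homework[[name]])
}

# Summarise each dataset's index, keys, measured variables and regularity
for (name in names(homework)) {
  ts_data <- homework[[name]]
  cat(name, "\n")
  cat("  index:   ", index_var(ts_data), "\n")
  cat("  keys:    ", paste(key_vars(ts_data), collapse = ", "), "\n")
  cat("  measured:", paste(measured_vars(ts_data), collapse = ", "), "\n")
  cat("  regular: ", is_regular(ts_data), "\n")
}
```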
# A tsibble: 5,032 x 8 [!]
# Key: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
# … with 5,022 more rows
# A tsibble: 67,596 x 9 [1M]
# Key: Concession, Type, ATC1, ATC2 [336]
Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
<mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1991 Jul Concessional Co-payme… A Alimenta… A01 STOMATOL… 18228 67877
2 1991 Aug Concessional Co-payme… A Alimenta… A01 STOMATOL… 15327 57011
3 1991 Sep Concessional Co-payme… A Alimenta… A01 STOMATOL… 14775 55020
4 1991 Oct Concessional Co-payme… A Alimenta… A01 STOMATOL… 15380 57222
5 1991 Nov Concessional Co-payme… A Alimenta… A01 STOMATOL… 14371 52120
6 1991 Dec Concessional Co-payme… A Alimenta… A01 STOMATOL… 15028 54299
7 1992 Jan Concessional Co-payme… A Alimenta… A01 STOMATOL… 11040 39753
8 1992 Feb Concessional Co-payme… A Alimenta… A01 STOMATOL… 15165 54405
9 1992 Mar Concessional Co-payme… A Alimenta… A01 STOMATOL… 16898 61108
10 1992 Apr Concessional Co-payme… A Alimenta… A01 STOMATOL… 18141 65356
# … with 67,586 more rows
# A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
Time Demand Temperature Date Holiday
<dttm> <dbl> <dbl> <date> <lgl>
1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
# … with 52,598 more rows
# A tsibble: 91 x 3 [1Y]
Year Hare Lynx
<dbl> <dbl> <dbl>
1 1845 19580 30090
2 1846 19600 45150
3 1847 19610 49150
4 1848 11990 39520
5 1849 28040 21230
6 1850 58000 8420
7 1851 74600 5560
8 1852 75090 5080
9 1853 88480 10170
10 1854 61280 19600
# … with 81 more rows
Plot variable not specified, automatically selected `.vars = Hare`
There is a clear pattern in the pelt time series data. We will measure the autocorrelation in the series using the ACF() function.
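For reference, fpp3 defines the sample autocorrelation at lag k as r_k = sum over t = k+1..T of (y_t - ybar)(y_{t-k} - ybar), divided by the sum over t = 1..T of (y_t - ybar)^2. The base-R sketch below implements that formula and checks it against stats::acf(), which uses the same definition. The function name sample_acf and the synthetic sine series, which has a cycle of 10 steps, are illustrative choices and not course data.

```r
# Sample autocorrelation r_k for lags 1..lag_max:
# r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
sample_acf <- function(y, lag_max) {
  n <- length(y)
  stopifnot(lag_max < n)
  dev <- y - mean(y)   # deviations from the sample mean
  denom <- sum(dev^2)  # lag-0 sum of squares
  vapply(seq_len(lag_max), function(k) {
    # pair each y_t with y_{t-k}
    sum(dev[(k + 1):n] * dev[1:(n - k)]) / denom
  }, numeric(1))
}

# Synthetic series with a cycle of length 10 (illustrative only)
y <- sin(2 * pi * seq_len(40) / 10)

r_manual <- sample_acf(y, lag_max = 10)
r_stats <- drop(acf(y, lag.max = 10, plot = FALSE)$acf)[-1]  # drop lag 0

all.equal(r_manual, r_stats)  # TRUE: both use the same formula
round(r_manual, 3)            # negative half a cycle apart, positive a full cycle apart
```

With fpp3 loaded, sample_acf(pelt$Hare, 19) should reproduce the acf column in the table below. A series with a repeating cycle shows the same alternation of positive and negative autocorrelations.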
Response variable not specified, automatically selected `var = Hare`
# A tsibble: 19 x 2 [1Y]
lag acf
<lag> <dbl>
1 1Y 0.658
2 2Y 0.214
3 3Y -0.155
4 4Y -0.401
5 5Y -0.493
6 6Y -0.401
7 7Y -0.168
8 8Y 0.113
9 9Y 0.307
10 10Y 0.340
11 11Y 0.296
12 12Y 0.206
13 13Y 0.0372
14 14Y -0.153
15 15Y -0.285
16 16Y -0.295
17 17Y -0.202
18 18Y -0.0676
19 19Y 0.0956
And we can plot the ACF. Can you see the positive and negative correlations in the time series?
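The sketch below shows code that should produce the ACF table above and its plot, assuming fpp3 is loaded. Naming the variable explicitly avoids the "Response variable not specified" message. Setting lag_max = 19 matches the number of lags in the table above. Running the same code for Lynx lets you compare the two species. The object name hare_acf is our own.

```r
library(fpp3)

# Autocorrelations of the Hare series (a tsibble with columns lag and acf)
hare_acf <- pelt %>% ACF(Hare, lag_max = 19)
hare_acf

# Correlogram: bars above/below zero are positive/negative autocorrelations;
# the dashed blue lines are approximate 95% bounds for white noise
hare_acf %>% autoplot()

# The same for the Lynx series
pelt %>% ACF(Lynx, lag_max = 19) %>% autoplot()
```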
Response variable not specified, automatically selected `var = Hare`