Prof. Eric A. Suess

So how should you complete your homework for this class?

Upload one file to Blackboard.

Homework 9:

 Read: Chapter 13

 Exercises:

 Do 13.2.1 Exercises 1, 3
 Do 13.3.1 Exercise 1
 Do 13.4.6 Exercises 1, 2, 3

13.2.1

1. Imagine you wanted to draw (approximately) the route each plane flies from its origin to its destination. What variables would you need? What tables would you need to combine?

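Answer: You need the flights and airports tables. From flights, take origin and dest. From airports, take lat and lon, joining airports to flights twice on faa: once for the origin and once for the destination.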

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(nycflights13)

flights
airports
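# Join the airport coordinates twice, once per end of the route. The result
# gets a new name so the original flights table is not overwritten.
airports_loc <- airports %>% select(faa, lat, lon)

flights_routes <- flights %>%
  left_join(airports_loc, by = c("origin" = "faa")) %>%
  left_join(airports_loc, by = c("dest" = "faa"), suffix = c("_origin", "_dest"))

flights_routes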
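The answer above can be taken one step further and drawn. This is only a sketch: it assumes straight segments are an acceptable approximation of a route, and the names airports_loc and routes are chosen here for illustration. It reads flights from the package directly so it does not depend on any modified copy in the session.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Keep only the airport key and its coordinates.
airports_loc <- airports %>% select(faa, lat, lon)

# One row per distinct origin-dest pair, with coordinates for both ends.
# inner_join drops destinations that are missing from airports.
routes <- nycflights13::flights %>%
  distinct(origin, dest) %>%
  inner_join(airports_loc, by = c("origin" = "faa")) %>%
  inner_join(airports_loc, by = c("dest" = "faa"),
             suffix = c("_origin", "_dest"))

# Straight segments only approximate the path actually flown.
ggplot(routes) +
  borders("state") +
  geom_segment(aes(x = lon_origin, y = lat_origin,
                   xend = lon_dest, yend = lat_dest),
               alpha = 0.3) +
  coord_quickmap()
```

Using distinct() first keeps the plot to one segment per route instead of one per flight.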

3. weather only contains information for the origin (NYC) airports. If it contained weather records for all airports in the USA, what additional relation would it define with flights?

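Answer: If weather covered all airports, it would also relate to flights through the destination: year, month, day, hour, and origin in weather would match year, month, day, hour, and dest in flights, so the weather at the destination would be available as well. Note that year, month, day, and hour in flights describe the scheduled departure, so they only approximate the time of arrival at the destination.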

weather
flights
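A minimal sketch of what that second relation would look like as a join. It is hypothetical: the real weather table only has the three NYC airports, so almost every destination value comes back NA. The column names dest_temp and dest_wind_speed are made up here for illustration.

```r
library(dplyr)
library(nycflights13)

# Rename the measurements so they are clearly the destination's weather.
weather_dest <- weather %>%
  select(year:day, hour, origin,
         dest_temp = temp, dest_wind_speed = wind_speed)

# Match the airport code in weather (origin) against dest in flights.
# hour is the scheduled departure hour, so this is only an approximation
# of the weather at arrival time.
flights_weather <- nycflights13::flights %>%
  select(year:day, hour, origin, dest) %>%
  left_join(
    weather_dest,
    by = c("year", "month", "day", "hour", "dest" = "origin")
  )

# Share of flights with no destination weather in the real data.
mean(is.na(flights_weather$dest_temp))
```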

13.3.1

1. Add a surrogate key to flights.

flights
flights %>% mutate(index = row_number())
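One way to confirm that the new column works as a key is to count how often each value occurs and look for repeats. This is a sketch; the names flights_keyed and dup_keys are chosen here for illustration.

```r
library(dplyr)
library(nycflights13)

flights_keyed <- nycflights13::flights %>%
  mutate(index = row_number())

# A key is valid when no value occurs more than once.
dup_keys <- flights_keyed %>%
  count(index) %>%
  filter(n > 1)

# Zero rows here means index uniquely identifies each flight.
nrow(dup_keys)
```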

13.4.6

1. Compute the average delay by destination, then join on the airports data frame so you can show the spatial distribution of delays. Here’s an easy way to draw a map of the United States:

flights
delays <- flights %>% group_by(dest) %>%
  summarise(delay_ave = mean(arr_delay, na.rm = TRUE))
delays
airports
delays <- delays %>% inner_join(airports, by = c("dest" = "faa"))
delays
delays  %>%
  ggplot(aes(lon, lat, color = delay_ave)) +
    borders("state") +
    geom_point() +
    coord_quickmap()
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map

2. Add the location of the origin and destination (i.e. the lat and lon) to flights.

airports
airports_loc <- airports %>%
  select(faa, lat, lon)

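# suffix names the clashing lat/lon columns after the end of the route they describe
flights %>%
  select(year:day, hour, origin, dest) %>%
  left_join(
    airports_loc,
    by = c("origin" = "faa")
  ) %>%
  left_join(
    airports_loc,
    by = c("dest" = "faa"),
    suffix = c("_origin", "_dest")
  )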

3. Is there a relationship between the age of a plane and its delays?

plane_ages <-
  planes %>%
  mutate(age = 2013 - year) %>%
  select(tailnum, age)

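flights %>%
  inner_join(plane_ages, by = "tailnum") %>%
  # drop flights with no recorded departure delay before averaging
  filter(!is.na(dep_delay)) %>%
  group_by(age) %>%
  summarise(delay = mean(dep_delay)) %>%
  # planes with no recorded year form one group with age NA, which ggplot drops with a warning
  ggplot(aes(x = age, y = delay)) +
  geom_point() +
  geom_line()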
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_path).