---
title: 'Stat. 450 Section 1 or 2: Homework 9'
output:
  html_document:
    df_print: paged
  html_notebook: default
  pdf_document: default
  word_document: default
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First thing to do is type all of your information about the problems you do in the text part of your R Notebook.
- Second thing to do is type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include "message = FALSE" in the {r}, so {r message = FALSE}.
- Last thing is to spell check your R Notebook. Edit > Check Spelling... or hit the F7 key.

Upload one file to Blackboard.

Homework 9:

Read: Chapter 13

Exercises:

Do 13.2.1 Exercises 1, 3

Do 13.3.1 Exercise 1

Do 13.4.6 Exercises 1, 2, 3

# 13.2.1

## 1. Imagine you wanted to draw (approximately) the route each plane flies from its origin to its destination. What variables would you need? What tables would you need to combine?

**Answer:** We need the flights and airports tables. From flights we need origin and dest. From airports we need lat and lon. The airports table is joined to flights twice, once on origin and once on dest.

```{r message = FALSE}
library(tidyverse)
library(nycflights13)

flights
airports

# Join airports twice: once for the origin and once for the destination.
# The result goes in a new object so flights stays unchanged for later exercises.
flights_routes <- flights %>%
  left_join(airports, by = c("origin" = "faa")) %>%
  left_join(airports, by = c("dest" = "faa"), suffix = c("_origin", "_dest"))

flights_routes
```

## 3. weather only contains information for the origin (NYC) airports. If it contained weather records for all airports in the USA, what additional relation would it define with flights?

**Answer:** If all airports were included, then the weather at the destination would also be available. The additional relation would match dest in flights to the airport column in weather (named origin), together with year, month, day, and hour. Note that these time variables in flights describe the scheduled departure, so the match gives the destination's weather at that hour.

```{r}
weather
flights
```

# 13.3.1

## 1. Add a surrogate key to flights.

```{r}
flights

# The row number is unique for every flight, so it can serve as a surrogate key
flights %>%
  mutate(index = row_number())
```

# 13.4.6

## 1. Compute the average delay by destination, then join on the airports data frame so you can show the spatial distribution of delays.
Here's an easy way to draw a map of the United States:

```{r}
flights

# Average arrival delay for each destination
delays <- flights %>%
  group_by(dest) %>%
  summarise(delay_ave = mean(arr_delay, na.rm = TRUE))

delays
airports

# Attach coordinates; inner_join() drops destinations that are not in airports
delays <- delays %>%
  inner_join(airports, by = c("dest" = "faa"))

delays

delays %>%
  ggplot(aes(lon, lat, color = delay_ave)) +
  borders("state") +
  geom_point() +
  coord_quickmap()
```

## 2. Add the location of the origin and destination (i.e. the lat and lon) to flights.

```{r}
airports

# Keep only the key and the coordinates
airports_loc <- airports %>%
  select(faa, lat, lon)

# The suffix labels which airport each pair of coordinates belongs to
flights %>%
  select(year:day, hour, origin, dest) %>%
  left_join(airports_loc, by = c("origin" = "faa")) %>%
  left_join(airports_loc, by = c("dest" = "faa"), suffix = c("_origin", "_dest"))
```

## 3. Is there a relationship between the age of a plane and its delays?

```{r}
# Age of each plane in 2013, the year covered by flights
plane_ages <- planes %>%
  mutate(age = 2013 - year) %>%
  select(tailnum, age)

# Mean departure delay by plane age, skipping missing delays and ages
flights %>%
  inner_join(plane_ages, by = "tailnum") %>%
  filter(!is.na(dep_delay), !is.na(age)) %>%
  group_by(age) %>%
  summarise(delay = mean(dep_delay)) %>%
  ggplot(aes(x = age, y = delay)) +
  geom_point() +
  geom_line()
```
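
One thing to check for exercise 3: `inner_join()` drops every flight whose `tailnum` has no record in `planes`, so the plot only describes the matched planes. Below is a minimal sketch of how the dropped rows could be counted with `anti_join()` before summarising. The tables `toy_flights` and `toy_planes` are small made-up stand-ins used for illustration only; they are not the nycflights13 data.

```{r message = FALSE}
library(dplyr)

# Hypothetical toy tables standing in for flights and planes (made-up values)
toy_flights <- tibble(
  tailnum   = c("N1", "N1", "N2", "N2", "N3", NA),
  dep_delay = c(10, 20, NA, 30, 5, 0)
)
toy_planes <- tibble(
  tailnum = c("N1", "N2"),
  year    = c(2003, 1999)
)

# Flights with no matching plane record; inner_join() would drop these
unmatched <- toy_flights %>%
  anti_join(toy_planes, by = "tailnum")

toy_ages <- toy_planes %>%
  mutate(age = 2013 - year) %>%
  select(tailnum, age)

# Same steps as in exercise 3, plus a count of flights behind each mean
delay_by_age <- toy_flights %>%
  filter(!is.na(dep_delay)) %>%
  inner_join(toy_ages, by = "tailnum") %>%
  group_by(age) %>%
  summarise(delay = mean(dep_delay), n = n())

unmatched
delay_by_age
```

Keeping `n` next to each mean helps when reading the plot, since an average for an age group with only a few flights is much noisier than one based on many flights.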