Prof. Eric A. Suess
So how should you complete your homework for this class?
Homework 4:
Read: Chapter 5
Do 5.4.1 Exercise 4
Do 5.5.2 Exercises 1, 4
Do 5.6.7 Exercise 1
library(tidyverse)
Yes. The contains() helper function picks out every variable in the dataset whose name contains the string "TIME". The helper is not case sensitive by default, so it also matches lowercase names such as dep_time and air_time.
library(nycflights13)
flights
flights %>% select(contains("TIME"))
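The select() helpers are not case sensitive by default, even though R itself is case sensitive.
To change the default, set ignore.case = FALSE. No column name in flights contains the uppercase string "TIME", so the case-sensitive match selects no columns.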
flights %>% select(contains("TIME", ignore.case = FALSE))
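The case-sensitivity behavior can be sketched on a small made-up tibble. The object `toy` and its values are hypothetical and only mimic a few column names from flights.

```r
library(dplyr)

# Hypothetical toy data; the column names mimic a few of those in flights.
toy <- tibble(dep_time = 517L, air_time = 227, carrier = "UA")

# Default: contains() ignores case, so "TIME" matches dep_time and air_time.
default_match <- toy %>% select(contains("TIME"))
names(default_match)

# Case-sensitive: no name contains uppercase "TIME", so nothing is selected.
strict_match <- toy %>% select(contains("TIME", ignore.case = FALSE))
ncol(strict_match)
```

The second call returns a tibble with one row and zero columns, which is why no columns print.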
Minutes since midnight.
flights
Convert dep_time and sched_dep_time to minutes since midnight.
dep_time %/% 100 * 60 This gives the whole hours, converted to minutes.
dep_time %% 100 This gives the remainder in minutes.
flights %>% mutate(dep_time_mins = ( ( (dep_time %/% 100) * 60 ) + (dep_time %% 100)),
sched_dep_time_mins = ( ( (sched_dep_time %/% 100) * 60 ) + (sched_dep_time %% 100)) )
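The arithmetic can be checked on a few hand-picked values. This is a minimal sketch in base R; the helper name `hhmm_to_mins` and the sample times are assumptions for illustration, not part of the homework.

```r
# Hypothetical helper: convert clock times stored as HHMM numbers
# (e.g. 517 means 5:17 am) to minutes since midnight.
hhmm_to_mins <- function(hhmm) {
  hours <- hhmm %/% 100    # integer division keeps the hour part
  minutes <- hhmm %% 100   # remainder keeps the minute part
  hours * 60 + minutes
}

mins <- hhmm_to_mins(c(517, 1230, 2400))
mins
# 317 750 1440
```

Note that a time recorded as 2400 becomes 1440 rather than 0. If midnight should map to 0, one option is to take the result `%% 1440`.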
Ten most delayed flights. There are no ties in these 10.
flights %>% arrange(desc(dep_delay)) %>%
head(10)
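One way to check for ties is to rank the delays with min_rank(), which gives tied values the same rank. The sketch below uses made-up delays in a hypothetical tibble `toy`, not the flights data.

```r
library(dplyr)

# Hypothetical delays; flights A and C are tied for the longest delay.
toy <- tibble(
  flight = c("A", "B", "C", "D"),
  dep_delay = c(120, 95, 120, 30)
)

ranked <- toy %>%
  mutate(delay_rank = min_rank(desc(dep_delay))) %>%  # rank 1 = longest delay
  arrange(delay_rank)

# TRUE when at least two rows share a rank, i.e. there is a tie.
has_ties <- any(duplicated(ranked$delay_rank))
has_ties
```

Applied to flights, the same idea would flag any repeated rank among the top 10.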
Brainstorm at least 5 different ways to assess the typical delay characteristics of a group of flights.
Which is more important: arrival delay or departure delay?
Arrival delay is more important.
What proportion of flights are on time or arrive early? Approximately 60% of all flights arrive on time or early.
flights %>% summarize(flt_ontime = mean(arr_delay <= 0, na.rm = TRUE) )
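The proportion works because the mean of a logical vector is the share of TRUE values. A minimal base R sketch with made-up arrival delays (the vector `arr_delay` here is hypothetical):

```r
# Hypothetical arrival delays in minutes; NA stands for a missing value.
arr_delay <- c(-5, 0, 12, NA, 45)

on_time <- arr_delay <= 0   # TRUE TRUE FALSE NA FALSE

# Without na.rm the single NA makes the whole mean NA.
prop_with_na <- mean(on_time)

# With na.rm = TRUE the NA is dropped: 2 of the 4 remaining values are TRUE.
prop_on_time <- mean(on_time, na.rm = TRUE)
prop_on_time
# 0.5
```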
Which carrier/airline has the best on-time rate? The result is sorted in ascending order, so the best carrier is the last row.
flights %>% group_by(carrier) %>%
summarize(flt_ontime = mean(arr_delay <= 0, na.rm = TRUE) ) %>%
arrange(flt_ontime)
What proportion of flights are 10 minutes or more late?
flights %>% summarize(flt_late10 = mean(arr_delay >= 10, na.rm = TRUE) )
flights %>% group_by(carrier) %>%
summarize(flt_late10 = mean(arr_delay >= 10, na.rm = TRUE) ) %>%
arrange(flt_late10)
What proportion of flights are 30 minutes or more late?
flights %>% summarize(flt_late30 = mean(arr_delay >= 30, na.rm = TRUE) )
flights %>% group_by(carrier) %>%
summarize(flt_late30 = mean(arr_delay >= 30, na.rm = TRUE) ) %>%
arrange(flt_late30)
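The three rates can also be computed in a single summarize() call per carrier. This is a sketch on a made-up tibble `toy`, assuming the same column names as flights; it is one possible arrangement, not the only one.

```r
library(dplyr)

# Hypothetical data: three AA flights, two UA flights (one missing delay).
toy <- tibble(
  carrier = c("AA", "AA", "AA", "UA", "UA"),
  arr_delay = c(-3, 15, 40, 5, NA)
)

late_rates <- toy %>%
  group_by(carrier) %>%
  summarize(
    flt_ontime = mean(arr_delay <= 0, na.rm = TRUE),
    flt_late10 = mean(arr_delay >= 10, na.rm = TRUE),
    flt_late30 = mean(arr_delay >= 30, na.rm = TRUE)
  ) %>%
  arrange(desc(flt_ontime))  # best on-time rate first

late_rates
```

Sorting with desc() puts the carrier with the best on-time rate in the first row.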