---
title: 'Stat. 450 Section 1 or 2: Homework 4'
output:
  word_document: default
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First, type all of your information about the problems you do in the text part of your R Notebook.
- Second, type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include "message = FALSE" in the chunk header, so {r message = FALSE}.
- Last, spell check your R Notebook.  Edit > Check Spelling... or press the F7 key.

Homework 4:

     Read: Chapter 5
     Do 5.4.1 Exercise 4
     Do 5.5.2 Exercise 1, 4
     Do 5.6.7 Exercise 1


```{r message=FALSE}
library(tidyverse)
```

# 5.4.1

## 4.

Yes.  The contains() helper function picks out all of the variables in the dataset whose names contain the word TIME, because by default the match is not case sensitive.

```{r}
library(nycflights13)

flights
```


```{r}
flights %>% select(contains("TIME"))
```

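The select() helpers are not case sensitive by default, even though R itself is case sensitive.

To change the default, set ignore.case = FALSE.  No columns are shown below because every column name in flights is lowercase, so a case-sensitive match on "TIME" selects nothing.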

```{r}
flights %>% select(contains("TIME", ignore.case = FALSE))
```
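
The difference between the two calls can be reproduced outside of select().  The sketch below is only a base R analogy, not how contains() is implemented.  It applies grepl() to a hand-typed vector of column names (the vector `cols` is an assumption made for illustration; it lists a few of the flights columns).

```{r}
# Column names typed by hand for illustration; all of them are lowercase
cols <- c("dep_time", "sched_dep_time", "arr_time", "air_time", "carrier", "origin")

# Case-insensitive match, similar in spirit to contains("TIME")
insensitive <- cols[grepl("TIME", cols, ignore.case = TRUE)]

# Case-sensitive literal match, similar in spirit to
# contains("TIME", ignore.case = FALSE)
sensitive <- cols[grepl("TIME", cols, fixed = TRUE)]

insensitive
sensitive
```

The case-sensitive match returns an empty vector, which mirrors the empty result from select() above.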

\newpage

# 5.5.2

## 1.

Minutes since midnight.


```{r}
flights 
```

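Convert dep_time and sched_dep_time to minutes since midnight.

dep_time %/% 100  * 60  This gives the whole hours since midnight, converted to minutes.

dep_time %% 100    This gives the remainder, which is the minutes past the hour.

Adding the two pieces gives the number of minutes since midnight.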

```{r}
flights %>% mutate(dep_time_mins = ( ( (dep_time %/% 100) * 60 ) + (dep_time %% 100)),
                   sched_dep_time_mins = ( ( (sched_dep_time %/% 100) * 60 ) + (sched_dep_time %% 100))  )
```
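
As a quick check of the arithmetic, 517 (5:17 am) becomes 5 * 60 + 17 = 317 minutes.  One wrinkle, assuming midnight is recorded as 2400 in the data: the formula then returns 1440 rather than 0.  The helper below, `to_minutes()`, is a hypothetical name and not part of the solution above.  It sketches one possible way to wrap the result with %% 1440.

```{r}
# Hypothetical helper: convert an HHMM integer to minutes since midnight
to_minutes <- function(hhmm) {
  # hours * 60 plus leftover minutes, wrapped so that 2400 maps to 0
  ((hhmm %/% 100) * 60 + (hhmm %% 100)) %% 1440
}

to_minutes(c(517, 1545, 2400))
```

Whether 2400 should map to 0 or stay at 1440 depends on how the times will be used afterwards, so treat the wrap as a design choice.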

\newpage

## 4.

Ten most delayed flights.  There are no ties in these 10.

```{r}
flights %>% arrange(desc(dep_delay)) %>%
  head(10)
```
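
The exercise asks for a ranking function, so an alternative to arrange() and head() is min_rank(), which gives tied values the same rank.  The sketch below uses a small made-up tibble (`toy` and its values are assumptions for illustration only) so that the handling of ties is visible.

```{r message=FALSE}
library(dplyr)

# Made-up delays with a tie at the top and one missing value
toy <- tibble(flight = 1:6,
              dep_delay = c(30, 120, 120, 45, NA, 10))

# Rank from most delayed (rank 1) downward; ties share the smallest rank
top2 <- toy %>%
  mutate(delay_rank = min_rank(desc(dep_delay))) %>%
  filter(delay_rank <= 2) %>%
  arrange(delay_rank)

top2
```

Both flights with a delay of 120 get rank 1 and are kept, and the row with a missing delay is dropped by filter().  The same pattern should carry over to flights with `delay_rank <= 10`.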

\newpage

# 5.6.7

## 1.

Brainstorm at least 5 different ways to assess the typical delay characteristics of a group of flights.

1. median and mean of dep_delay time in minutes.
2. sd of dep_delay time in minutes.
3. median and mean of arr_delay time in minutes.
4. sd of arr_delay time in minutes.
5. is the distribution of arr_delay symmetric or skewed?  Same question for dep_delay.


Which is more important: arrival delay or departure delay?  

**Arrival delay** is more important.

```{r}

flights %>% select(dep_delay, arr_delay)  %>% 
  summarize( n=n(), dep_delay_median = median(dep_delay, na.rm = TRUE),
                    dep_delay_mean = mean(dep_delay, na.rm = TRUE), 
                    dep_delay_sd = sd(dep_delay, na.rm = TRUE),
                    arr_delay_median = median(arr_delay, na.rm = TRUE),
                    arr_delay_mean = mean(arr_delay, na.rm = TRUE),
                    arr_delay_sd = sd(arr_delay, na.rm = TRUE) )

```
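
For the question of symmetry versus skewness, one rough check is to compare the mean with the median and look at the quartiles.  The sketch below uses a made-up vector of delays (`delays` is hypothetical and not taken from flights) to show the idea; a mean well above the median suggests a right-skewed distribution.

```{r}
# Hypothetical delays in minutes: mostly small, with one very large value
delays <- c(-5, -3, 0, 2, 4, 8, 15, 240)

mean_delay <- mean(delays)
median_delay <- median(delays)

# A large positive gap points to a long right tail
mean_delay - median_delay

# Quartiles give a second view of the spread
quantile(delays, probs = c(0.25, 0.5, 0.75))
```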


What proportion of flights are on time or arrive early?  Approximately 60% of flights with a recorded arrival delay arrive on time or early.

```{r}
flights %>% summarize(flt_ontime = mean(arr_delay <= 0, na.rm = TRUE) )

```
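
The proportion above relies on the fact that the mean of a logical vector is the share of TRUE values, with na.rm = TRUE dropping the missing ones first.  A minimal sketch with made-up numbers (`x` is hypothetical):

```{r}
# Made-up arrival delays, including one missing value
x <- c(-5, 0, 12, NA)

# x <= 0 gives TRUE, TRUE, FALSE, NA
on_time <- x <= 0

# Share of TRUE among the non-missing values
p <- mean(on_time, na.rm = TRUE)
p
```

Two of the three non-missing values are at or below zero, so the proportion is 2/3.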

Which carrier/airline has the best on-time rate?

```{r}
flights %>% group_by(carrier) %>%
  summarize(flt_ontime = mean(arr_delay <= 0, na.rm = TRUE) ) %>%
  arrange(desc(flt_ontime))  # best on-time rate first

```

What proportion of flights are 10 mins or more late?

```{r}
flights %>% summarize(flt_late10 = mean(arr_delay >= 10, na.rm = TRUE) )

```

```{r}
flights %>% group_by(carrier) %>%
  summarize(flt_late10 = mean(arr_delay >= 10, na.rm = TRUE) ) %>%
    arrange(flt_late10)

```


What proportion of flights are 30 mins or more late?

```{r}
flights %>% summarize(flt_late30 = mean(arr_delay >= 30, na.rm = TRUE) )
```

```{r}
flights %>% group_by(carrier) %>%
  summarize(flt_late30 = mean(arr_delay >= 30, na.rm = TRUE) ) %>%
  arrange(flt_late30)
```