---
title: 'Stat. 450 Section 1 or 2: Homework 5'
output:
  word_document: default
  html_notebook: default
  pdf_document: default
  html_document:
    df_print: paged
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First thing to do is type all of your information about the problems you do in the text part of your R Notebook.
- Second thing to do is type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include the "message = FALSE" in the {r}, so {r message = FALSE}.
- Last thing is to spell check your R Notebook. Edit > Check Spelling... or hit the F7 key.

Homework 5:

Read: Chapter 7

Do 7.3.4 Exercises 1, 2, 3, 4

Do 7.4.1 Exercises 1, 2

Do 7.5.1.1 Exercises 2, 3, 4, 5, 6

```{r message=FALSE}
library(tidyverse)
```

# 7.3.4

## 1.

All three distributions are skewed to the right. The distributions of x and y are very similar. The distribution of z looks less spread out. All look bimodal. I think x and y are the length and width, and z is the depth.

```{r}
diamonds

# Histograms of x, y and z on the same axis limits so they can be compared
diamonds %>%
  select(x, y, z) %>%
  ggplot(aes(x = x)) +
  geom_histogram() +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 15000))

diamonds %>%
  select(x, y, z) %>%
  ggplot(aes(x = y)) +
  geom_histogram() +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 15000))

diamonds %>%
  select(x, y, z) %>%
  ggplot(aes(x = z)) +
  geom_histogram() +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 15000))
```

## 2.

There are no diamonds priced around $1500. The mode of the distribution is around $750. At the lowest price levels there are spikes at the values where the diamonds are actually priced.

```{r}
# Full price distribution
diamonds %>%
  select(price) %>%
  ggplot(aes(x = price)) +
  geom_histogram()

# Zoom in on prices below $2500
diamonds %>%
  select(price) %>%
  filter(price < 2500) %>%
  ggplot(aes(x = price)) +
  geom_histogram()

# Narrow bins show the gap around $1500
diamonds %>%
  select(price) %>%
  filter(price < 2500) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 10, center = 0)

# One-dollar bins show the individual price spikes
diamonds %>%
  select(price) %>%
  filter(price < 900, price > 800) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 1, center = 0)
```

## 3.

There are 23 diamonds of 0.99 carat and 1558 diamonds of 1 carat. Rounding up is worth more money.

```{r}
diamonds %>%
  select(carat) %>%
  count(carat == 0.99)

diamonds %>%
  select(carat) %>%
  count(carat == 1)

# Counts for every carat value near 1
diamonds %>%
  filter(carat >= 0.9, carat <= 1.1) %>%
  count(carat)

diamonds %>%
  filter(carat >= 0.9, carat <= 1.1) %>%
  count(carat) %>%
  ggplot(aes(x = carat, y = n)) +
  geom_col()
```

## 4.

The *coord_cartesian* function zooms in on the original histogram. The *xlim* and *ylim* functions limit the range of the data before counting, so the histogram is made from a subset of the data.

```{r}
diamonds %>%
  select(price) %>%
  ggplot(aes(x = price)) +
  geom_histogram()

# Zooming: bins are computed from all the data, then the view is cropped
diamonds %>%
  select(price) %>%
  ggplot(aes(x = price)) +
  geom_histogram() +
  coord_cartesian(xlim = c(0, 5000), ylim = c(0, 10000))

diamonds %>%
  select(price) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 100) +
  coord_cartesian(xlim = c(0, 5000), ylim = c(0, 10000))

# Limiting: values outside the limits are dropped before binning
diamonds %>%
  select(price) %>%
  ggplot(aes(x = price)) +
  geom_histogram(binwidth = 100) +
  xlim(0, 5000) +
  ylim(0, 10000)
```

# 7.4.1

## 1.

In a histogram, missing values are removed (with a warning). In a bar chart, the NAs are treated as another category.

## 2.

The *na.rm = TRUE* option in the *mean* and *sum* functions removes the NAs before the value of the function is computed. An NA is an unknown value, so a sum or mean that includes one is itself NA.

# 7.5.1.1

## 1.

The cancelled flights tend to be scheduled later in the day, and they have a wider range of scheduled departure hours.

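
The plot for this exercise depends on splitting `sched_dep_time`, which nycflights13 stores as an integer in HHMM form, into an hour and a minute using integer division (`%/%`) and the remainder (`%%`). A minimal base R sketch of that arithmetic, using a few made-up times rather than the real `flights` data:

```{r}
# Hypothetical scheduled departure times in HHMM (or HMM) form.
# These values are chosen for illustration; they are not from `flights`.
sched_dep_time <- c(515L, 1345L, 2359L)

sched_hour <- sched_dep_time %/% 100  # integer division keeps the hour
sched_min  <- sched_dep_time %% 100   # remainder keeps the minutes

data.frame(sched_dep_time, sched_hour, sched_min)
```
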
```{r}
library(nycflights13)

# A flight with a missing departure time is treated as cancelled
flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(x = cancelled, y = sched_hour)) +
  geom_boxplot() +
  coord_flip()
```

## 2.

The most important variable is carat.

```{r}
# Correlations among the numeric variables
diamonds %>%
  select(price, carat, depth, table, x, y, z) %>%
  cor()

diamonds %>%
  select(price, carat, depth, table, x, y, z) %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point()

# Boxplots of price within carat bins of width 0.1
diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.1)))
```

Examining the categorical variables: there is a weak positive relationship of price with color, and a weak negative relationship of price with clarity and cut.

```{r}
diamonds %>%
  ggplot(aes(x = color, y = price)) +
  geom_boxplot()

diamonds %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_boxplot()

# Carat by cut
ggplot(diamonds, aes(x = cut, y = carat)) +
  geom_boxplot()
```

## 3.

The two plots look the same, but the x and y aesthetics need to be switched for *geom_boxploth()*.

```{r}
flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(x = cancelled, y = sched_hour)) +
  geom_boxplot() +
  coord_flip()

library(ggstance)

# Same plot with ggstance: note that x and y are swapped
flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(y = cancelled, x = sched_hour)) +
  geom_boxploth()
```

## 4.

The boxes in the lvplot correspond to letter values: the widest box spans the quartiles, and each narrower box halves the remaining tail again (eighths, sixteenths, and so on). Outliers lie beyond the narrowest boxes.

```{r}
library(lvplot)

diamonds %>%
  select(price, cut) %>%
  ggplot(aes(x = cut, y = price)) +
  geom_lv() +
  coord_flip()

flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(x = cancelled, y = sched_hour)) +
  geom_lv() +
  coord_flip()
```

## 5.

The faceted histograms are printed in the reverse order of the violin plots. It would be good to have the vertical scales the same.

```{r}
flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(x = cancelled, y = sched_hour)) +
  geom_violin() +
  coord_flip()

flights %>%
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100
  ) %>%
  ggplot(aes(x = sched_hour)) +
  geom_histogram() +
  facet_wrap(~ cancelled, nrow = 2)
```

## 6.

method | description
-------|------------
default | jitters the point horizontally
tukey | jitters more
tukeyDense | jitters but less than tukey
frowney | jitters downward
smiley | jitters upward

```{r}
library(ggbeeswarm)

mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_beeswarm()

# geom_quasirandom() with its default method
mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_quasirandom()

mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_quasirandom(method = "tukey")

mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_quasirandom(method = "tukeyDense")

mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_quasirandom(method = "frowney")

mpg %>%
  ggplot(aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_quasirandom(method = "smiley")
```
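
The jitter methods compared above exist to reduce overplotting when many points share the same category and the same value. As a rough illustration of that idea only, here is a base R sketch that applies plain uniform jitter to made-up data. It is not the algorithm used by ggbeeswarm, and the data and variable names are hypothetical.

```{r}
set.seed(450)  # make the random offsets reproducible

# Hypothetical toy data: 3 groups of 20 points with heavily tied y values
group <- rep(1:3, each = 20)
y     <- rep(c(20, 22, 24, 26), times = 15)

# Without jitter, tied points sit exactly on top of each other
n_hidden_raw <- sum(duplicated(data.frame(x = group, y = y)))

# Uniform horizontal jitter: shift each point by a small random amount
x_jittered <- group + runif(length(group), min = -0.2, max = 0.2)
n_hidden_jittered <- sum(duplicated(data.frame(x = x_jittered, y = y)))

# Number of points hidden behind another point, before and after jittering
c(raw = n_hidden_raw, jittered = n_hidden_jittered)

plot(x_jittered, y, xaxt = "n", xlab = "group", ylab = "y")
axis(1, at = 1:3)
```

Uniform jitter is the simplest choice; the ggbeeswarm methods instead pick the offsets so that the cloud of points also suggests the shape of the distribution.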