---
title: 'Stat. 450 Section 1 or 2: Homework 1'
output:
  html_notebook: default
  pdf_document: default
  word_document: default
  html_document:
    df_print: paged
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First, type all of your information about the problems you do in the text part of your R Notebook.
- Second, type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include "message = FALSE" in the {r}, so {r message = FALSE}.
- Last, spell check your R Notebook. Edit > Check Spelling... or hit the F7 key.

# Homework 1

Read: Chapter 1, 2, 3

Download and install the current version of R and RStudio.

Do 3.2.4 Exercises 1, 2, 3, 4, 5

Do 3.3.1 Exercises 1, 2, 3, 4, 6

Do 3.5.1 Exercises 1, 2, 4

\newpage

# 3.2.4 Exercises

## 1.

We see nothing. Well, actually we see the empty first layer of a ggplot2 plot; no geom has been added yet.

```{r message = FALSE}
library(tidyverse)

ggplot(data = mpg)
```

\newpage

## 2.

By viewing the mpg dataframe we see there are 234 rows and 11 columns.

```{r}
mpg
```

\newpage

## 3.

The variable drv takes the values: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive.

```{r}
help(mpg)    # opens the help file
?mpg         # another way to open the help file
str(mpg)     # traditional way to look at the variables in a dataframe
glimpse(mpg) # the way to look at the variables in a tibble
View(mpg)    # opens the data in a spreadsheet in RStudio
```

\newpage

## 4.

Scatterplot of y = hwy versus x = cyl. The average highway miles per gallon goes down as the number of cylinders increases.

```{r}
ggplot(mpg, aes(y = hwy, x = cyl)) +
  geom_point()
```

\newpage

## 5.

This plot is not useful because many observations are plotted on top of each other at each point. It only shows which combinations of class and drv occur, not how often. Plotting two categorical variables in a scatterplot is not useful. It would be better to make a contingency table.

```{r}
ggplot(mpg, aes(y = class, x = drv)) +
  geom_point()

count(mpg, drv, class)
```

\newpage

# 3.3.1 Exercises

## 1.

If color is in the aes as a mapping, it needs a variable from the dataframe to give the plot different colors. With color = "blue" inside the aes, ggplot2 treats "blue" as a variable with a single value, so all of the points get the first default color and a legend appears. For example, class can be mapped to color. Alternatively, to change all of the points to blue, the color needs to be outside of the aes.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```

\newpage

## 2.

The categorical variables are the ones with `<chr>` under the variable names. The continuous variables are the ones with `<int>` or `<dbl>` under them.

```{r}
mpg
?mpg
glimpse(mpg)
```

\newpage

## 3.

Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

Brighter colors are used for higher values of the variable mapped to color. Bigger points are used for higher values of the variable mapped to size. A continuous variable cannot be mapped to shape.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy, color = displ))

ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy, size = displ))

# ggplot(data = mpg) +
#   geom_point(mapping = aes(x = cyl, y = hwy, shape = displ)) # gives an error

ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy, shape = drv))
```

\newpage

## 4.

The same variable can be mapped to two aesthetics. This is redundant and not good practice!

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy, size = drv, shape = drv))
```

\newpage

## 6.

What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)?

The expression is evaluated for each row, and the color changes for the TRUE and FALSE values of the inequality.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy, colour = displ < 5))
```

\newpage

# 3.5.1 Exercises

## 1.

If a continuous variable is used, each distinct value of the variable gets its own facet. So potentially many, many plots will be made. This may not be useful. Faceting should be done with a categorical variable.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(y = hwy, x = cyl)) +
  facet_wrap(~ displ, nrow = 2)
```

\newpage

## 2.

The empty cells in the plot mean there is no data available for that combination of drv and cyl. Switching x and y in the second plot, so that drv is on the y-axis and cyl is on the x-axis, gives the same row and column variables as the facet grid: the combinations with no points are the empty cells.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(y = drv, x = cyl))
```

\newpage

## 4.

Compare to 3.3.1 Exercise 1. Faceting makes it easier to see where the data for each class fall than when class is shown only by color.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  facet_wrap(~ class, nrow = 2)
```
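
One thing the faceted plots above give up is the ability to compare a class against the rest of the data in the same panel. A possible way around this, shown here only as a sketch and not as part of the assigned exercise, is to draw the full data in gray behind each facet. It relies on ggplot2 repeating a layer in every panel when that layer's data does not contain the faceting variable. The names mpg_background and p are made up for this illustration.

```{r}
library(ggplot2)

# Copy of mpg without the faceting variable (illustrative name).
# Because this data has no class column, its layer is drawn in every panel.
mpg_background <- mpg[, setdiff(names(mpg), "class")]

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # all 234 cars in gray, repeated in each facet
  geom_point(data = mpg_background, colour = "grey80") +
  # the cars of the panel's own class, colored on top
  geom_point(mapping = aes(color = class), show.legend = FALSE) +
  facet_wrap(~ class, nrow = 2)

p
```

The legend is turned off because the facet labels already name each class.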