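---
title: "ExploratoryDataAnalysis2"
output:
  word_document:
    toc: yes
  pdf_document:
    toc: yes
  html_document:
    theme: cerulean
    toc: yes
    toc_float: yes
---

# Comparing Two Variables. {.sidebar}

Today we continue our discussion of Exploratory Data Analysis (EDA). This time we look at pairs of variables:

1. Two categorical variables.
2. One categorical variable and one numeric variable.
3. Two numeric variables.

```{r message=FALSE}
library(tidyverse)
```

## Two categorical variables. {.sidebar}

```{r}
# Point size shows how many diamonds fall in each cut/color combination
diamonds %>%
  ggplot(aes(x = cut, y = color)) +
  geom_count()
```

```{r}
# The same counts as a heat map: tile fill encodes the count n
diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(x = cut, y = color)) +
  geom_tile(mapping = aes(fill = n))
```

```{r}
# The counts themselves, one row per color/cut combination
diamonds %>%
  count(color, cut)
```

```{r}
# Equivalent to count(color, cut), written with group_by() + summarise()
diamonds %>%
  group_by(color, cut) %>%
  summarise(n = n(), .groups = "drop")
```

## Contingency table. {.sidebar}

```{r}
# Spread the cut levels into columns: one row per color, one column per cut
diamonds %>%
  group_by(color, cut) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(cut, n)
```

The same table can be built with the newer *pivot_wider()* function, which supersedes *spread()*. It requires version 1.0.0 or later of the **tidyr** package, so update tidyr if the call below fails. The new name describes what the function does (it makes the data wider) and is easier to remember.

```{r}
diamonds %>%
  group_by(color, cut) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(
    names_from = cut,
    values_from = n
  )
```

Export the data to a CSV file, open it in Excel, and try building the same table as a PivotTable.

```{r}
# Writes to your home directory; row.names = FALSE avoids an extra index column
write.csv(diamonds, file = "~/diamonds.csv", row.names = FALSE)
```

## One categorical variable and one numeric variable. {.sidebar}

```{r}
# One frequency polygon of price per cut; heights depend on how many diamonds each cut has
ggplot(data = diamonds, mapping = aes(x = price)) +
  geom_freqpoly(mapping = aes(colour = cut), binwidth = 500)
```

```{r}
# Plotting density instead of count scales each polygon to an area of 1,
# so cuts with few diamonds can be compared with cuts with many
ggplot(data = diamonds, mapping = aes(x = price, y = after_stat(density))) +
  geom_freqpoly(mapping = aes(colour = cut), binwidth = 500)
```

```{r}
# Distribution of price within each cut: median, quartiles, and outliers
ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
  geom_boxplot()
```

## Two numeric variables. {.sidebar}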
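A scatterplot is the usual first look at two numeric variables. As a numeric complement, the sketch below (an illustrative addition, not part of the original notes) summarises the carat/price relationship with base R's `cor()`. Pearson's coefficient measures only linear association, while `method = "spearman"` works on ranks and so also reflects monotonic but curved relationships. The object name `carat_price_cor` is a placeholder.

```{r}
library(tidyverse)

# Illustrative sketch: one-row summary of the carat/price association.
# Both coefficients range from -1 to 1; values near 1 mean price rises with carat.
carat_price_cor <- diamonds %>%
  summarise(
    pearson  = cor(carat, price),                     # linear association
    spearman = cor(carat, price, method = "spearman") # rank-based, monotonic
  )

carat_price_cor
```

A high correlation does not tell you the relationship is a straight line, so the scatterplots are still needed to see its shape.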
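```{r}
# Scatterplot of price against carat
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price))
```

```{r}
# With about 54,000 points the plot overplots; alpha = 1/100 makes each point
# nearly transparent, so dense regions show up darker
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price), alpha = 1 / 100)
```

```{r}
# Drop the few very large diamonds, then count points in rectangular bins
smaller <- diamonds %>%
  filter(carat < 3)

ggplot(data = smaller) +
  geom_bin2d(mapping = aes(x = carat, y = price))
```

```{r}
# geom_hex() uses hexagonal bins; it needs the hexbin package installed
library(hexbin)

ggplot(data = smaller) +
  geom_hex(mapping = aes(x = carat, y = price))
```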
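Another option, offered here as an extension of the binning idea, is to bin one of the numeric variables so it behaves like a categorical one, and then reuse the boxplot from the one-categorical/one-numeric section. The sketch relies on ggplot2's `cut_width()` helper to split `carat` into intervals 0.1 carat wide. The bin width and the object name `binned_plot` are illustrative choices, not fixed values.

```{r}
library(tidyverse)

# Same subset as above: diamonds under 3 carats
smaller <- diamonds %>%
  filter(carat < 3)

# cut_width() turns carat into 0.1-carat bins; grouping by those bins
# draws one boxplot of price per bin along the carat axis
binned_plot <- ggplot(data = smaller, mapping = aes(x = carat, y = price)) +
  geom_boxplot(mapping = aes(group = cut_width(carat, 0.1)))

binned_plot
```

Adding `varwidth = TRUE` to `geom_boxplot()` makes each box's width proportional to the square root of the number of diamonds in its bin. This shows how thinly populated the larger-carat bins are.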