---
title: "Data Wrangling"
author: "Prof. Eric A. Suess"
format: revealjs
---

## Data Wrangling

Today we will get started with Data Wrangling.

Data Wrangling is the process of tidying data into usable forms.

The R package that we will be using from the [tidyverse](https://www.tidyverse.org/) is the [dplyr](https://dplyr.tidyverse.org/) package.

## The grammar of data wrangling

The 5 verbs of data wrangling

**select()** # take a subset of columns

**filter()** # take a subset of rows

**mutate()** # add new columns or modify existing ones

**arrange()** # sort the rows

**summarize()** # aggregate the data across rows

## RStudio Cheatsheet for dplyr

The RStudio [dplyr cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf) is very useful.

## Star Wars examples

```{r echo=TRUE, message=FALSE}
library(tidyverse)
data("starwars")
glimpse(starwars)
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  select(name, species)
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  filter(species == "Droid")
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  select(name, ends_with("color"))
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  arrange(desc(mass))
```

## Star Wars

```{r echo=TRUE}
starwars %>%
  group_by(species) %>%
  summarize(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
```

## Presidential examples

Try the code from the book in Section 4.1.

```{r echo=TRUE}
presidential
```
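
## Presidential examples

As one possible starting point, here is a minimal sketch that chains several of the five verbs on `presidential`. It is not the book's code. It assumes `presidential` is the data set that ships with ggplot2 (loaded with the tidyverse) and that it has the columns `start`, `end`, and `party`. The names `party_terms`, `term_years`, and `mean_term_years` are made up for illustration.

```{r echo=TRUE, message=FALSE}
library(tidyverse)

# Assumption: `presidential` comes from ggplot2 and has Date columns
# `start` and `end` plus a `party` column.
party_terms <- presidential %>%
  # mutate(): add the approximate length of each term in years
  mutate(
    term_years = as.numeric(difftime(end, start, units = "days")) / 365.25
  ) %>%
  # group_by() + summarize(): one row per party
  group_by(party) %>%
  summarize(
    n = n(),
    mean_term_years = mean(term_years)
  ) %>%
  # arrange(): longest average term first
  arrange(desc(mean_term_years))

party_terms
```

Dividing by 365.25 is only an approximation of a year; a date package such as lubridate would give a more careful answer.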