library(mdsr)
library(tidyverse)
Data Wrangling in R
Some of the code from Chapters 4 and 5.
This chapter introduces dplyr, which we will use all year.
Data wrangling with dplyr centers on five verbs:
select() # take a subset of columns
filter() # take a subset of rows
mutate() # add new columns or modify existing ones
arrange() # sort the rows
summarize() # aggregate the data across rows
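To see how the five verbs fit together, here is a minimal sketch that chains them with the pipe on the `starwars` data used below. The particular columns, the derived `bmi` column, and the grouping by `species` are illustrative choices, not code from the chapter. `group_by()` is not one of the five verbs, but it is commonly paired with `summarize()` so that the aggregation happens once per group.

```r
library(dplyr)

# Illustrative pipeline using all five verbs (column choices are examples only)
species_bmi <- starwars %>%
  select(name, species, height, mass) %>%       # keep four columns
  filter(!is.na(height), !is.na(mass)) %>%      # drop rows missing height or mass
  mutate(bmi = mass / (height / 100)^2) %>%     # height is in cm, so convert to metres
  group_by(species) %>%                         # summarize() will work per species
  summarize(n = n(), mean_bmi = mean(bmi)) %>%  # one row per species
  arrange(desc(n))                              # most common species first

species_bmi
```

Reading the pipeline top to bottom mirrors the order of the verbs: narrow the columns, narrow the rows, add a column, collapse the rows, then sort the result.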
The dplyr package is part of the tidyverse, so installing and loading the tidyverse gives us dplyr.
Star Wars dataset
data("starwars")
glimpse(starwars)
Rows: 87
Columns: 14
$ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
$ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
$ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
$ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
$ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
$ sex <chr> "male", "none", "none", "male", "female", "male", "female",…
$ gender <chr> "masculine", "masculine", "masculine", "masculine", "femini…
$ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
$ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
$ films <list> <"The Empire Strikes Back", "Revenge of the Sith", "Return…
$ vehicles <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
$ starships <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…
select()
starwars %>% select(name, species)
# A tibble: 87 × 2
name species
<chr> <chr>
1 Luke Skywalker Human
2 C-3PO Droid
3 R2-D2 Droid
4 Darth Vader Human
5 Leia Organa Human
6 Owen Lars Human
7 Beru Whitesun lars Human
8 R5-D4 Droid
9 Biggs Darklighter Human
10 Obi-Wan Kenobi Human
# ℹ 77 more rows
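Besides listing columns by name, `select()` accepts selection helpers, ranges, and negative selections. A minimal sketch; the specific columns picked here are only examples:

```r
library(dplyr)

# name plus every column whose name ends in "color"
colors <- starwars %>% select(name, ends_with("color"))

# every column except the three list columns
no_lists <- starwars %>% select(-films, -vehicles, -starships)

# name plus the contiguous range of columns from height through eye_color
body_cols <- starwars %>% select(name, height:eye_color)

names(colors)
# "name" "hair_color" "skin_color" "eye_color"
```

Ranges such as `height:eye_color` depend on column order, so they are convenient for quick exploration but more fragile than helpers like `ends_with()` if the data's columns are later reordered.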