Some of the code from Chapter 4, Section 1.
This chapter introduces dplyr, which we will use all year.
The main idea of data wrangling with dplyr is its five verbs:
select() # take a subset of columns
filter() # take a subset of rows
mutate() # add or modify existing columns
arrange() # sort the rows
summarize() # aggregate the data across rows
The dplyr package is part of the tidyverse. We will load the tidyverse along with the mdsr package; install both first if they are not already installed.
library(mdsr)
library(tidyverse)
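If the packages are not installed yet, the library() calls above will fail. A minimal sketch of a one-time setup step follows; the CRAN mirror URL and the variable names are choices made for this example, not part of the chapter.

```r
# Packages used in these notes
pkgs <- c("tidyverse", "mdsr")

# Work out which of them are not installed yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (needs an internet connection)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

Installing is needed once per machine; loading with library() is needed in every new R session.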
Star Wars dataset
data("starwars")
glimpse(starwars)
Observations: 87
Variables: 13
$ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia ...
$ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180,...
$ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 7...
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown",...
$ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light",...
$ eye_color <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue...
$ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0...
$ gender <chr> "male", NA, NA, "male", "female", "male", "female", NA, "...
$ homeworld <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ...
$ species <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Hu...
$ films <list> [<"Revenge of the Sith", "Return of the Jedi", "The Empi...
$ vehicles <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "...
$ starships <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x...
select()
starwars %>% select(name, species)
filter()
starwars %>%
filter(species == "Droid")
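filter() also accepts several conditions at once, and it drops rows where a condition evaluates to NA. The sketch below shows both points; the object names and the height cutoff of 100 cm are arbitrary choices for illustration.

```r
library(dplyr)

# Conditions separated by commas are combined with AND
tall_droids <- starwars %>%
  filter(species == "Droid", height > 100)

# Rows with a missing species never match species == "Droid";
# use is.na() to pick them out explicitly
unknown_species <- starwars %>%
  filter(is.na(species))

tall_droids
unknown_species
```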
select()
starwars %>%
select(name, ends_with("color"))
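ends_with() is one of several selection helpers. As a rough sketch, here are three more used the same way; the object names are made up for this example, and where() assumes a reasonably recent dplyr (1.0 or later).

```r
library(dplyr)

# Columns whose names start with "h"
h_cols <- starwars %>%
  select(name, starts_with("h"))

# Columns whose names contain an underscore
underscore_cols <- starwars %>%
  select(name, contains("_"))

# All numeric columns, chosen by type rather than by name
numeric_cols <- starwars %>%
  select(name, where(is.numeric))

names(h_cols)
names(underscore_cols)
names(numeric_cols)
```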
mutate()
starwars %>%
mutate(bmi = mass / ((height / 100) ^ 2)) %>%
select(name:mass, bmi)
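The BMI formula divides mass in kilograms by the square of height in meters. Height in starwars is recorded in centimeters, which is why it is divided by 100 first. A quick check by hand, using Luke Skywalker's values from the glimpse() output above (height 172, mass 77):

```r
# Values taken from the first row shown by glimpse(starwars)
height_cm <- 172
mass_kg <- 77

# Convert centimeters to meters, then apply mass / height^2
height_m <- height_cm / 100
luke_bmi <- mass_kg / height_m^2

round(luke_bmi, 2)  # about 26.03
```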
arrange()
starwars %>%
arrange(desc(mass))
summarize()
starwars %>%
group_by(species) %>%
summarise(
n = n(),
mass = mean(mass, na.rm = TRUE)
) %>%
filter(n > 1)
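The verbs are most useful when chained together in one pipeline. The sketch below is one possible combination of all five, not the chapter's own example; the column name mean_bmi and the object name bmi_by_species are invented for illustration.

```r
library(dplyr)

bmi_by_species <- starwars %>%
  # keep rows where everything needed for BMI is known
  filter(!is.na(mass), !is.na(height), !is.na(species)) %>%
  # keep only the columns used below
  select(name, species, height, mass) %>%
  # height is in centimeters, so convert to meters first
  mutate(bmi = mass / (height / 100)^2) %>%
  # one summary row per species
  group_by(species) %>%
  summarize(n = n(), mean_bmi = mean(bmi)) %>%
  # drop species with a single character, then sort
  filter(n > 1) %>%
  arrange(desc(mean_bmi))

bmi_by_species
```

Reading the pipeline top to bottom gives the order of operations: subset rows, subset columns, add a column, aggregate, then sort.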
Questions
Develop the R code to answer the following questions.
- How many films are in the dataset?
- Are there more Droids or Humans in the Star Wars movies?
- Which of the Star Wars movies was Luke Skywalker in?
- Pose a question and answer it by wrangling the starwars dataset.
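Some of these questions involve films, which is a list column: each row holds a whole vector of values. As a hint, here is a minimal sketch of working with a list column, using a small made-up table (the pets data is invented for this example and has nothing to do with starwars). It assumes the tidyr package, which is loaded with the tidyverse.

```r
library(dplyr)
library(tidyr)

# Made-up data: each pet has a vector of toys stored in a list column
pets <- tibble(
  pet  = c("Rex", "Tom"),
  toys = list(c("ball", "rope"), c("ball"))
)

# unnest() expands the list column to one row per pet-toy pair
toys_long <- pets %>%
  unnest(cols = toys)

# Count the distinct toys across all pets
n_toys <- n_distinct(toys_long$toys)

# List the toys belonging to one pet
rex_toys <- toys_long %>%
  filter(pet == "Rex") %>%
  pull(toys)
```

The same pattern should carry over to the films column, though the details are left for the exercise.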
Presidential examples
Try out the code in Chapter 4, Section 1 using the presidential dataset.
presidential
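As a starting point, here is a rough sketch of the same verbs applied to presidential, which ships with ggplot2 and has the columns name, start, end, and party. The column names term_years and avg_term_years are invented for this example, and dividing by 365.25 is only an approximation of a year.

```r
library(dplyr)
library(ggplot2)  # provides the presidential dataset

terms_by_party <- presidential %>%
  # start and end are dates, so their difference is a length in days
  mutate(term_years = as.numeric(difftime(end, start, units = "days")) / 365.25) %>%
  group_by(party) %>%
  summarize(n = n(), avg_term_years = mean(term_years)) %>%
  arrange(desc(avg_term_years))

terms_by_party
```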
Star Wars API and R package
More Star Wars stuff you might find interesting.
- Check out the Star Wars website.
- Check out the Star Wars API, SWAPI.
- And check out the R package rwars.
rwars package
This package connects to SWAPI and pulls data from the API.
If the package does not install from CRAN, you can install it from GitHub.
library(devtools)
install_github("ironholds/rwars")
library(rwars)
planet_schema <- get_planet_schema()
names(planet_schema)
[1] "required" "description" "title" "$schema" "properties" "type"
rwars package
Get an individual starship - an X-wing.
Hopefully it won’t time out and will actually bring the data back.
x_wing <- get_starship(12)
x_wing
$name
[1] "X-wing"
$model
[1] "T-65 X-wing"
$manufacturer
[1] "Incom Corporation"
$cost_in_credits
[1] "149999"
$length
[1] "12.5"
$max_atmosphering_speed
[1] "1050"
$crew
[1] "1"
$passengers
[1] "0"
$cargo_capacity
[1] "110"
$consumables
[1] "1 week"
$hyperdrive_rating
[1] "1.0"
$MGLT
[1] "100"
$starship_class
[1] "Starfighter"
$pilots
$pilots[[1]]
[1] "https://swapi.co/api/people/1/"
$pilots[[2]]
[1] "https://swapi.co/api/people/9/"
$pilots[[3]]
[1] "https://swapi.co/api/people/18/"
$pilots[[4]]
[1] "https://swapi.co/api/people/19/"
$films
$films[[1]]
[1] "https://swapi.co/api/films/2/"
$films[[2]]
[1] "https://swapi.co/api/films/3/"
$films[[3]]
[1] "https://swapi.co/api/films/1/"
$created
[1] "2014-12-12T11:19:05.340000Z"
$edited
[1] "2014-12-22T17:35:44.491233Z"
$url
[1] "https://swapi.co/api/starships/12/"
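Because the request can time out, it may help to wrap the call so that a failure does not stop the rest of the notebook. The helper below is a sketch in base R; safe_fetch() and failing_request() are hypothetical names written for this example and are not part of rwars.

```r
# Run a function that may fail (for example, a slow or unreachable API)
# and return NULL with a message instead of stopping the script.
safe_fetch <- function(fetch_fn) {
  tryCatch(
    fetch_fn(),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a failing request, so the sketch runs without a network
failing_request <- function() stop("connection timed out")
result <- safe_fetch(failing_request)
is.null(result)

# With rwars installed and the API reachable, the call could look like:
# x_wing <- safe_fetch(function() get_starship(12))
```

Checking is.null() on the result before using it keeps later chunks from failing on missing data.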