Chapter 4 Data Transformation

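The 5 verbs of data wrangling

- Pick observations by their values (filter()).
- Reorder the rows (arrange()).
- Pick variables by their names (select()).
- Create new variables with functions of existing variables (mutate()).
- Collapse many values down to a single summary (summarise()).

Combined with group_by(), each verb operates group by group.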

library(nycflights13)
library(tidyverse)

We will continue to work with the flights dataset, which is provided by the nycflights13 package.

flights

Rewrite the code from the Transformation presentation to use the pipe, %>%. With a pipe, you do not pass the data as an argument to the next function call; the result on the left-hand side is piped in as that function's first argument. The tidyverse functions are designed to work this way.
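As a minimal sketch of what the pipe does, the two calls below should be equivalent. The object names jan1_direct and jan1_piped are illustrative, and only dplyr (part of the tidyverse) is loaded here to keep the example small.

```r
library(nycflights13)
library(dplyr)

# Without the pipe: the data frame is passed explicitly as the first argument.
jan1_direct <- filter(flights, month == 1, day == 1)

# With the pipe: flights is supplied as the first argument of filter().
jan1_piped <- flights %>% filter(month == 1, day == 1)

# Compare the two results.
identical(jan1_direct, jan1_piped)
```

Reading %>% as "then" often helps: take flights, then filter it.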

filter()

flights %>% filter(month == 1, day == 1)

arrange()

flights %>% arrange(year, month, day)

arrange()

flights %>% arrange(desc(dep_delay))

select()

flights %>% select(year, month, day)

select()

flights %>% select(time_hour, air_time, everything())

mutate()

flights %>% select(year:day, ends_with("delay"), distance, air_time) %>%
   mutate(gain = dep_delay - arr_delay, speed = distance / air_time * 60)

summarize()

summarise(flights, delay = mean(dep_delay, na.rm = TRUE))
flights %>% group_by(year, month, day) %>%
    summarise(delay = mean(dep_delay, na.rm = TRUE))

Combining multiple operations with the pipe %>%

delay <- flights %>% group_by(dest) %>%
  summarise(count = n(), dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")
delay

Combining multiple operations with the pipe %>%

delay %>% ggplot(mapping = aes(x = dist, y = delay)) +
  geom_point(aes(size = count), alpha = 1/3) +
  geom_smooth(se = FALSE)

Combining multiple operations with the pipe %>%

It looks like delays increase with distance up to ~750 miles and then decrease. Maybe as flights get longer there’s more ability to make up delays in the air?

When the plot is drawn, ggplot2 prints this message: geom_smooth() using method = 'loess' and formula 'y ~ x'
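One way to probe the "making up delays in the air" idea is to compare the average gain (departure delay minus arrival delay) across distance bands. This is only a rough sketch: the cut points at 500, 1000, and 2000 miles and the names gain_by_band, band, n_flights, and mean_gain are assumptions chosen for illustration, not part of the original analysis.

```r
library(nycflights13)
library(dplyr)

gain_by_band <- flights %>%
  # Keep only flights where both delays were recorded.
  filter(!is.na(dep_delay), !is.na(arr_delay)) %>%
  mutate(
    # Minutes made up in the air (positive means time was recovered).
    gain = dep_delay - arr_delay,
    # Hypothetical distance bands, in miles.
    band = cut(distance, breaks = c(0, 500, 1000, 2000, Inf))
  ) %>%
  group_by(band) %>%
  summarise(
    n_flights = n(),
    mean_gain = mean(gain),
    mean_arr_delay = mean(arr_delay)
  )

gain_by_band
```

If mean_gain rises with distance, that would be consistent with the explanation above, although it would not prove it.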

Combining multiple operations with the pipe %>%

Does this code read better? It is the same pipeline as above, written with one step per line (the result is stored in delays instead of delay).

delays <- flights %>% 
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(count > 20, dest != "HNL")
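For contrast, here is a sketch of the same computation without the pipe, where every intermediate result needs its own name. The names by_dest, summary_by_dest, delays_steps, and delays_piped are illustrative choices, not taken from the original presentation.

```r
library(nycflights13)
library(dplyr)

# Step-by-step version: each intermediate result is stored and passed on.
by_dest <- group_by(flights, dest)
summary_by_dest <- summarise(
  by_dest,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)
delays_steps <- filter(summary_by_dest, count > 20, dest != "HNL")

# Piped version: the same steps, read top to bottom with no temporary names.
delays_piped <- flights %>%
  group_by(dest) %>%
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter(count > 20, dest != "HNL")

# Check that both versions agree.
all.equal(delays_steps, delays_piped)
```

The piped version avoids naming objects that are never used again, which is the main readability argument for the pipe.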