June 15, 2020

Introduction

Today we will begin by finishing a few things we did not get to last Wednesday.

  1. Setting up an R Project that can be used to create an R package. Knowing how R packages are created helps in understanding how they are used.
  2. Reviewing the Lyft Bay Wheels "data wrangling" Challenge. Note the use of the bulkread R package and the issue with the data types: integer vs. double, character vs. factor, and logical vs. factor. The data wrangling is still not done!
  3. Merging data. The nycflights13 data contains multiple tables. How do we merge them together? Also note that we now have an issue with the values of a variable being different.
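
The data type and merging issues in items 2 and 3 can be sketched with two small made-up tables. This is only a minimal sketch, assuming the dplyr package is installed. The tables `rides` and `stations`, their columns, and their values are hypothetical stand-ins, not the actual Bay Wheels or nycflights13 data.

```r
library(dplyr)

# Hypothetical toy tables; names and values are made up for illustration.
rides <- tibble(
  ride_id    = c(1L, 2L, 3L),
  station_id = c("10", "20", "30"),   # key stored as character
  member     = c("yes", "no", "yes")  # yes/no information stored as character
)

stations <- tibble(
  station_id = c(10, 20, 40),         # same key stored as double
  name       = c("Market St", "Embarcadero", "Lake Merritt")
)

# dplyr refuses to join a character key to a double key,
# so convert the columns to the intended types first.
rides_clean <- rides %>%
  mutate(
    station_id = as.numeric(station_id),
    member     = member == "yes"      # character -> logical
  )

# left_join() keeps every row of rides_clean; keys with no match get NA.
merged <- left_join(rides_clean, stations, by = "station_id")
print(merged)
```

The ride at station 30 has no match in `stations`, so its `name` is NA. Mismatched key values like this are worth checking after any merge.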

Topics for this week

  • We will start with variable names and the janitor R package.
  • Next we start to formally learn the Tidyverse verbs select(), filter(), mutate(), summarize(), rename(), the joins, etc., which are part of dplyr.
  • We will introduce the data.table R package for bigger data that fits in memory, and the dtplyr R package, which runs dplyr commands on data.table objects.
  • We will introduce the disk.frame R package for bigger data that does not fit in memory.
  • We will introduce the dbplyr R package for working with data in a database, such as SQLite. When data is written to a database it is usually stored on the hard drive of your computer and not in memory.
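
As a preview of the first two topics, here is a minimal sketch that cleans variable names with janitor and then chains several dplyr verbs. It assumes the dplyr and janitor packages are installed. The data frame `raw` and its columns are hypothetical and made up for illustration.

```r
library(dplyr)
library(janitor)

# Hypothetical toy data with untidy variable names.
raw <- tibble(
  `Carrier Name` = c("AA", "AA", "UA", "UA"),
  `Dep Delay`    = c(5, 15, -2, 30),
  `Air Time`     = c(120, 130, 200, 210)
)

# clean_names() converts the names to snake_case:
# carrier_name, dep_delay, air_time
flights_small <- clean_names(raw)

result <- flights_small %>%
  rename(carrier = carrier_name) %>%         # rename a variable
  select(carrier, dep_delay, air_time) %>%   # keep chosen columns
  filter(dep_delay > 0) %>%                  # keep delayed flights only
  mutate(air_hours = air_time / 60) %>%      # add a new variable
  group_by(carrier) %>%
  summarize(mean_delay = mean(dep_delay), n = n())

print(result)
```

The same kind of pipeline can be run on a data.table by first wrapping the data with dtplyr's lazy_dt(), which is the idea behind the third topic.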

Databases
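
The dbplyr workflow mentioned in the topics list can be sketched as follows. This is a minimal sketch under stated assumptions: the dplyr, dbplyr, DBI, and RSQLite packages are installed, the database is written to a temporary file, and the table `flights_small` is a hypothetical toy table.

```r
library(dplyr)
library(DBI)

# Assumption: a temporary SQLite file, so the data lives on disk, not in memory.
db_path <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# Hypothetical toy table written to the database.
dbWriteTable(con, "flights_small", data.frame(
  carrier   = c("AA", "AA", "UA"),
  dep_delay = c(5, 15, 30)
))

# tbl() creates a lazy reference to the table; dbplyr translates
# the dplyr verbs into SQL instead of loading the data into R.
delays <- tbl(con, "flights_small") %>%
  group_by(carrier) %>%
  summarize(mean_delay = mean(dep_delay, na.rm = TRUE))

show_query(delays)                # inspect the generated SQL
delays_local <- collect(delays)   # run the query and bring the result into R
print(delays_local)

dbDisconnect(con)
unlink(db_path)                   # remove the temporary database file
```

Nothing is computed until collect() is called, so only the small summarized result is pulled into memory.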

Other packages for data

  • anyflights: download data from any airport or time period.
  • airlines: build a database of the 30+ years of data.
  • palmerpenguins: the Palmer Penguins data, an alternative to Fisher's iris data.