Tidy data is long and narrow.
Each row is an observation or case.
Each column is a variable.
Examples:
- gapminder
- babynames
September 14, 2020
Being tidy does not mean neat.
Each column should contain values of only the variable that column represents.
Each row should contain values from only the observation that row represents.
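As a rough illustration of these two rules, the sketch below stores the same made-up values twice, once in a tidy layout and once in a wide layout. The country names and population figures are hypothetical and exist only for the example.

```r
# Hypothetical data: one row per (country, year) observation,
# one column per variable. This layout is tidy.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2010, 2000, 2010),
  pop     = c(10, 12, 20, 25)
)

# The same values in a wide layout: the values of the year variable
# are hidden in the column names, so this layout is not tidy.
wide <- data.frame(
  country  = c("A", "B"),
  pop_2000 = c(10, 20),
  pop_2010 = c(12, 25)
)

nrow(tidy)   # number of observations
names(tidy)  # one name per variable
```

The tidy version is longer and narrower than the wide one, which is what "long and narrow" refers to.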
Each column in a tidy dataset contains a variable.
Each variable is either categorical or numeric.
Categorical variables are often stored as factors in R.
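A minimal sketch of a categorical variable stored as a factor, using base R only. The vector of continent names is made up for the example.

```r
# Hypothetical categorical variable stored as character strings.
continent <- c("Asia", "Europe", "Asia", "Africa")

# factor() records the distinct categories as levels
# (sorted alphabetically by default).
continent_f <- factor(continent)

levels(continent_f)  # the categories
table(continent_f)   # number of values in each category

# A numeric variable is stored as a plain numeric vector.
life_exp <- c(70.1, 78.4, 68.9, 55.2)
is.numeric(life_exp)
```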
Codebooks were common when variable names were short due to computer memory restrictions.
Nowadays variable names can be as long as you need, so codebooks are less important.
> ??gapminder
> ?babynames
mutate()
spread()
gather()
spread() and gather() have been superseded by newer functions in tidyr.
See the tidyr website.
pivot_longer()
pivot_wider()
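The sketch below shows one way these two functions can be used, assuming the tidyr package is installed. The data frame and its column names (pop_2000, pop_2010) are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: the year is encoded in the column names.
wide <- data.frame(
  country  = c("A", "B"),
  pop_2000 = c(10, 20),
  pop_2010 = c(12, 25)
)

# pivot_longer(): collect the year columns into a "year" column
# and a "pop" column, giving one row per (country, year).
long <- pivot_longer(
  wide,
  cols = c(pop_2000, pop_2010),
  names_to = "year",
  names_prefix = "pop_",
  values_to = "pop"
)

# pivot_wider(): the reverse step, spreading year back across columns.
wide_again <- pivot_wider(
  long,
  names_from = year,
  values_from = pop,
  names_prefix = "pop_"
)
```

Here pivot_longer() plays the role that gather() used to, and pivot_wider() the role of spread(). Note that the year column in long holds character strings, because it was built from column names.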
Basically all data available through an API is in a tidy format.
Take a look at the variable names in the R datasets.
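One way to do this, sketched with two datasets that ship with base R (mtcars and iris, chosen here only as examples):

```r
# Variable names of two built-in datasets.
names(mtcars)
names(iris)

# str() also shows how each variable is stored;
# in iris, Species is a factor and the other columns are numeric.
str(iris)
```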