---
title: "Correlation"
author: "Prof. Eric A. Suess"
date: "April 5, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Counting and correlating pairs of words

- Tokenizing by n-gram is a useful way to explore pairs of adjacent words.

- Tidy data is a useful structure for comparing between variables or grouping by rows, but it can be challenging to compare between rows: for example, to count the number of times that two words appear within the same document, or to see how correlated they are.

- Most operations for finding pairwise counts or correlations need to turn the data into a **wide matrix** first.

## Wide Format to examine correlation

![](images/widyr.jpg)

## Phi Coefficient

- "We may instead want to examine correlation among words, which indicates how often they appear together relative to how often they appear separately."

- See the Wikipedia page about the [Phi Coefficient](https://en.wikipedia.org/wiki/Phi_coefficient).
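## Computing the Phi Coefficient

For two words $X$ and $Y$, let $n_{11}$ be the number of documents containing both, $n_{00}$ the number containing neither, and $n_{10}$ and $n_{01}$ the numbers containing only $X$ or only $Y$. With the margins $n_{1\cdot} = n_{11} + n_{10}$, $n_{0\cdot} = n_{01} + n_{00}$, $n_{\cdot 1} = n_{11} + n_{01}$ and $n_{\cdot 0} = n_{10} + n_{00}$, the phi coefficient is

$$\phi = \frac{n_{11}\,n_{00} - n_{10}\,n_{01}}{\sqrt{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 0}\,n_{\cdot 1}}}$$

A minimal base-R sketch, assuming two hypothetical presence/absence vectors `word_x` and `word_y` (toy data, not from any real corpus) and a hypothetical helper `phi_coefficient()`. It computes $\phi$ from the four counts and checks that it equals the Pearson correlation of the same 0/1 indicators, which is why a Pearson correlation on binary word-presence data gives the phi coefficient.

```r
# Hypothetical toy data: presence (TRUE) or absence (FALSE) of two words
# in each of eight documents.
word_x <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
word_y <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Hypothetical helper: phi coefficient from two logical vectors
phi_coefficient <- function(x, y) {
  n11 <- sum(x & y)    # documents containing both words
  n00 <- sum(!x & !y)  # documents containing neither word
  n10 <- sum(x & !y)   # documents containing only x
  n01 <- sum(!x & y)   # documents containing only y
  # Convert to double before multiplying to avoid integer overflow
  numerator <- as.numeric(n11) * n00 - as.numeric(n10) * n01
  denominator <- sqrt(as.numeric(n11 + n10) * (n01 + n00) *
                        (n11 + n01) * (n10 + n00))
  numerator / denominator
}

phi <- phi_coefficient(word_x, word_y)
phi                                           # [1] 0.5

# Pearson correlation of the 0/1 indicators gives the same value
cor(as.numeric(word_x), as.numeric(word_y))  # [1] 0.5
```

If a word appears in every document or in none, one margin is zero and the sketch returns `NaN`, since $\phi$ is undefined in that case.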