---
title: "Welcome"
author: "Prof. Eric A. Suess"
date: "March 15, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Welcome

Welcome to Stat 653: Statistical Natural Language Processing

## Terms

- Dictionary, words, tokens
- Bag-of-Words
- Corpus, Documents
- Document Term Matrix
- Tidy Format, one-token-per-row
- TF-IDF
- Sentiment Analysis
- Topic Modeling

## R packages for Text Mining

- tm
- [quanteda](https://quanteda.io)
- [tidytext](https://github.com/juliasilge/tidytext)

## R packages for accessing text data

- [janeaustenr](https://github.com/juliasilge/janeaustenr)
- [gutenbergr](https://github.com/ropensci/gutenbergr)
- [harrypotter](https://github.com/bradleyboehmke/harrypotter)

## R functions for processing text data

- unnest_tokens()
- anti_join() to remove stop words
- count()

## Jane Austen

Read Chapter 1 of our book and compare Jane Austen to the Bronte sisters and H.G. Wells.

## Harry Potter

Install the [harrypotter](https://github.com/bradleyboehmke/harrypotter) R package from the author's GitHub.

Take a look at the books: tidy the words from each book, count the words, and compare the word usage in each book with the overall rates.

## Sentiment Analysis

There are several sentiment lexicons.

- sentiments
- AFINN
- bing
- nrc
- try get_sentiments() for each
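
## Example: tidy text workflow

The three functions listed under "R functions for processing text data" can be chained into one short pipeline. The sketch below is only an illustration under stated assumptions: the two-sentence corpus and the names `docs`, `tidy_docs`, and `word_counts` are hypothetical and are not taken from the course materials. It assumes the dplyr and tidytext packages are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: two short "documents" (not course data).
docs <- tibble(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat.", "The dog chased the cat.")
)

# Tidy format, one token per row.
# By default unnest_tokens() lowercases words and strips punctuation.
tidy_docs <- docs %>%
  unnest_tokens(word, text)

# Remove stop words with the stop_words data set shipped with tidytext,
# then count the remaining words across the whole corpus.
word_counts <- tidy_docs %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

word_counts
```

The same pipeline should work on a larger corpus, for example by swapping `docs` for a data frame with one row per line of text; grouping by book or document before `count()` gives per-document word counts.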