---
title: "Welcome"
author: "Prof. Eric A. Suess"
date: "March 15, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Welcome

Welcome to Stat 653: Statistical Natural Language Processing.

## Terms

- Dictionary, words, tokens
- Bag-of-Words
- Corpus, Documents
- Document Term Matrix
- Tidy Format, one-token-per-row
- TF-IDF
- Sentiment Analysis
- Topic Modeling
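
## Terms: a small example

Several of these terms can be seen together on a toy corpus. The following is a minimal sketch, assuming the dplyr and tidytext packages are installed; the two documents and their text are made up for illustration.

```r
library(dplyr)
library(tidytext)

# A toy corpus: two hypothetical documents, one row each
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the cat sat on the mat", "the dog chased the cat")
)

# Tidy format: one token (here, one word) per row
tidy_docs <- docs %>%
  unnest_tokens(word, text)

# Bag-of-words counts per document, then TF-IDF weights
doc_words <- tidy_docs %>%
  count(doc, word, sort = TRUE) %>%
  bind_tf_idf(word, doc, n)

doc_words
```

Words that occur in every document, such as "the", get a TF-IDF of zero. If the tm package is available, tidytext's cast_dtm() could turn these counts into a document term matrix.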

## R packages for Text Mining

- tm
- [quanteda](https://quanteda.io)
- [tidytext](https://github.com/juliasilge/tidytext)

## R packages for accessing text data

- [janeaustenr](https://github.com/juliasilge/janeaustenr)
- [gutenbergr](https://github.com/ropensci/gutenbergr)
- [harrypotter](https://github.com/bradleyboehmke/harrypotter)

## R functions for processing text data

- unnest_tokens()
- anti_join() to remove stop words
- count()
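
## Processing text data: a sketch

The three functions above are typically chained in one pipeline. This is a minimal sketch, assuming dplyr and tidytext are installed; the small `lines` data frame is a stand-in for any data frame with a text column.

```r
library(dplyr)
library(tidytext)

# Example text; any data frame with a text column works
lines <- tibble(
  line = 1:2,
  text = c("It is a truth universally acknowledged, that a single man in possession",
           "of a good fortune, must be in want of a wife.")
)

word_counts <- lines %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row
  anti_join(stop_words, by = "word") %>% # drop common stop words
  count(word, sort = TRUE)               # frequency of each remaining word

word_counts
```

Here `stop_words` is the stop word data set that ships with tidytext.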

## Jane Austen 

Read Chapter 1 of our book and compare Jane Austen to the Brontë sisters and H.G. Wells.
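
## Jane Austen: word frequencies

One way to set up such a comparison is to compute relative word frequencies per author. This is only a sketch, assuming dplyr, tidytext and janeaustenr are installed; `word_props()` is a hypothetical helper written for this slide, not a function from any package.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# Hypothetical helper: relative word frequencies for any data frame
# that has a `text` column
word_props <- function(text_df) {
  text_df %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    count(word, sort = TRUE) %>%
    mutate(proportion = n / sum(n))
}

austen <- word_props(austen_books())
head(austen)

# The same helper could be applied to another author's texts, e.g. a data
# frame returned by gutenbergr::gutenberg_download(), and the two results
# joined by word to compare proportions:
# comparison <- full_join(austen, other_author, by = "word",
#                         suffix = c("_austen", "_other"))
```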

## Harry Potter

Install the [harrypotter](https://github.com/bradleyboehmke/harrypotter) R package from the author's GitHub.

Take a look at the books: tidy the words from each book, count them, and compare each book's word usage to the overall rates.
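
## Harry Potter: a sketch

The steps above might look like the following. This is a sketch under assumptions: it uses the remotes package for installation, only two of the books, and it assumes each book object in harrypotter is a character vector with one element per chapter.

```r
# install.packages("remotes")
# remotes::install_github("bradleyboehmke/harrypotter")
library(dplyr)
library(tidytext)
library(harrypotter)

# Two of the books; each object is assumed to be a character vector
# with one element per chapter
books <- list(
  philosophers_stone = philosophers_stone,
  chamber_of_secrets = chamber_of_secrets
)

# Tidy each book to one word per row and stack the results
hp_words <- bind_rows(lapply(names(books), function(title) {
  tibble(book = title,
         chapter = seq_along(books[[title]]),
         text = books[[title]]) %>%
    unnest_tokens(word, text)
})) %>%
  anti_join(stop_words, by = "word")

# Word rate within each book versus the rate across all books
hp_rates <- hp_words %>%
  count(book, word) %>%
  group_by(book) %>%
  mutate(book_rate = n / sum(n)) %>%
  group_by(word) %>%
  mutate(overall_rate = sum(n) / nrow(hp_words)) %>%
  ungroup() %>%
  arrange(desc(book_rate))

hp_rates
```

Words whose `book_rate` is well above their `overall_rate` are used unusually often in that book.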

## Sentiment Analysis

There are several sentiment lexicons.

- sentiments (a data set included in tidytext)
- AFINN
- bing
- nrc
- try get_sentiments() for AFINN, bing, and nrc
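
## Sentiment lexicons: a sketch

A lexicon is a table of words with sentiment labels or scores, so scoring text amounts to a join. This is a minimal sketch, assuming dplyr and tidytext are installed; the four example words are made up for illustration.

```r
library(dplyr)
library(tidytext)

# The bing lexicon labels words as "positive" or "negative"
bing <- get_sentiments("bing")
count(bing, sentiment)

# Hypothetical example words, scored by joining against the lexicon
words <- tibble(word = c("happy", "sad", "wonderful", "table"))
scored <- inner_join(words, bing, by = "word")
scored

# In recent tidytext versions, AFINN and nrc are fetched through the
# textdata package, which asks you to accept each lexicon's license
# on first use:
# get_sentiments("afinn")
# get_sentiments("nrc")
```

Words that are not in the lexicon simply drop out of the inner join.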
