---
title: "Review Stat. 653"
author: "Prof. Eric A. Suess"
format: revealjs
---

## Review

What have we learned from studying Text Mining?

A lot of new things about how to work with text data.

## Review

- Bag-of-Words Model
- Tokenizers
- Corpus and Document Term Matrix
- Stemming
- Word Counts
- TF-IDF
- Sentiment Analysis
- n-grams
- Pairwise Correlation of words
- regex
- Topic Modeling
- Parts of Speech
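
The first few ideas in this list (tokenizing, word counts, TF-IDF) can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a made-up three-document corpus and a simple regex tokenizer. The names `docs`, `tokenize`, and `tf_idf` are hypothetical, and the weighting shown is the common tf × ln(N / n_t) form rather than any particular package's implementation.

```python
import math
import re
from collections import Counter

# Toy corpus (made up for illustration).
docs = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog sat on the log",
    "doc3": "cats and dogs",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens with a regex."""
    return re.findall(r"[a-z']+", text.lower())

# Bag-of-words: one word-count table per document.
counts = {name: Counter(tokenize(text)) for name, text in docs.items()}

def tf_idf(term, name):
    """Term frequency in one document times the log inverse document frequency."""
    n_docs_with_term = sum(1 for c in counts.values() if term in c)
    if n_docs_with_term == 0:
        return 0.0
    tf = counts[name][term] / sum(counts[name].values())
    idf = math.log(len(counts) / n_docs_with_term)
    return tf * idf

print(counts["doc1"].most_common(2))
print(round(tf_idf("cat", "doc1"), 3), round(tf_idf("the", "doc1"), 3))
```

In this toy corpus "the" appears more often in `doc1` than "cat" does, but it also appears in `doc2`, so TF-IDF gives it a lower weight.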

## Review

All of these ideas are **as useful** for text data as means, standard deviations, correlations, t-tests, regression, etc. are for numeric data.

You are now prepared to work with the other half of the data that is out there in the world!

## Review

We have studied *Unsupervised Learning* techniques for text-based data.

**Sentiment Analysis** is very useful for measuring how positive or negative the language in documents is.
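
One simple way to do this is to score a document with a sentiment lexicon. The sketch below assumes a tiny hand-made lexicon: `LEXICON` and `sentiment_score` are hypothetical names, and a real lexicon would contain thousands of words.

```python
import re

# Tiny illustrative lexicon (an assumption): +1 for positive words, -1 for negative.
LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "terrible": -1, "sad": -1,
}

def sentiment_score(text):
    """Sum the lexicon scores of the tokens; words not in the lexicon count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

print(sentiment_score("A great movie with a happy ending"))
print(sentiment_score("Terrible plot, bad acting"))
```

A positive total suggests positive language and a negative total suggests negative language. Negation ("not good") is ignored in this simple version.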

## Review

We have studied *Unsupervised Learning* techniques for *clustering* text-based data.

**Topic Modeling** is very useful for learning about the different topics discussed in documents.

## Review

We have studied *Supervised Learning* techniques for *classifying* text-based data.

**Naive Bayes** and **Logistic Regression with lasso regularization** are very useful for predicting which class a document belongs to.
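
As a reminder of how Naive Bayes classifies text, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. The training documents, the labels, and the names `train` and `predict` are all made up for illustration. In practice you would use a tested library implementation.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in training
                # Add-one smoothing avoids log(0) for words unseen in this class.
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training set (made up).
train_docs = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
]
model = train(train_docs)
print(predict("win a prize", model))
print(predict("monday meeting", model))
```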

## Review

There are a lot of other R packages that can be used for Text Mining.

- rvest
- quanteda
- text2vec
- spacyr
- rtweet

## Review

There are a lot of Python packages that can be used for Text Mining.

- NLTK
- TextBlob
- scikit-learn
- Beautiful Soup
- Gensim
- spaCy
- CoreNLP

## Review

Opportunities to work in Text Mining are growing. It is a very interesting new field, and the number of excellent tools available for this work keeps increasing.