### Stat 652 Statistical Learning

Department of Statistics and Biostatistics

California State University, East Bay

Spring 2021

### Week 7:

- **Next Week:** On Monday there is no class; next week is Finals week for this course. I will hold my usual office hours on Monday and Wednesday next week. I will also be available during class time on both Monday and Wednesday, in my Zoom Office Hours, for questions about the Midterm-Final, the Project, or any other assignments you are working on.
- **Midterm-Final:** The Midterm-Final may be turned in no later than the end of the day on Sunday, March 14, next week.
- On Monday this week we will complete the discussion of Rule-Based Decision Trees and Tuning from last week. We will discuss Neural Networks and Clustering on Wednesday. If you are interested, Chapter 7 of Lantz discusses SVMs and Chapter 8 discusses Association Rules.
- **Presentation:** images.pdf
- **Spotlight blog:** Deep Learning with MXNet
- **Spotlight blog:** Deep Learning in R
- **YouTube:** Neural Networks Demystified Video
- **Image, Translation, Speech:**
  - **Microsoft:** Cognitive Services Translator Speech API
  - **Google:** Cloud Services TensorFlow CoLab
  - **Amazon:** AWS Deep Learning SageMaker
  - **h2o:** Deep Learning Deep Learning with R Tutorials
  - **Nvidia:** NVidia's GPUs - The Engine of Deep Learning
  - **Baidu:** Baidu Research
  - **IBM:** IBM Watson

- **Website Spotlight:** MNIST
- **Website Spotlight:** UCI MLR nminst dataset
- **kaggle Competition:** Digit Recognizer
- **Spotlight blog:** mxnet Handwritten Digits Classification Competition
- **Spotlight paper:** EMNIST
- **Website Spotlight:** EMNIST Dataset
- **Website Spotlight:** cedar
- **Website Spotlight:** IMAGENET
- **Website Spotlight:** Fashion MNIST
- **Presentation:** Transactional Data and Association Rules
- **Spotlight book:** Principles of Econometrics
- **Spotlight book:** Hands on Machine Learning

### Week 6:

- **Homework:** Homework 4 has been posted.
- **Midterm:** The Midterm has been posted. See the Homework link. The Titanic data is used. See Rebecca Barter's blog post Tidymodels: tidy machine learning in R.
- **Final:** The Final has been posted. See the Homework link.
- **Presentation:** images.pdf
- **Books Spotlight:** wikibooks
- **Presentation:**
  - Tuning.html
  - Tuning.pdf
  - Tuning.Rmd
- **R Project:** Chap11.zip **This code takes a long time to run. Try this after the Chap05 code.**

- **Presentation:**
  - Rules.html
  - Rules.pdf
  - Rules.Rmd
  - mushrooms.csv
- **R Project:** Chap05.zip

- **Software Spotlight:** BigML Are You Ready for Big Machine Learning?
- **Software Spotlight:** DataRobot
- **Presentation:** images.pdf
- **Notes:** BayesNotes.pdf
  - sms_spam.csv
  - MLwR_v2_04.r
- **R Project:** Chap04.zip
- **R Notebook:** NB.Rmd
  - What is a VCorpus? StackExchange
  - tm Vignettes

**Hint:** Recall from class that some people running R on Windows had a fonts problem. To work around it, we added a line to the code that assigns the third DTM to the first. This substitution is safe because every step used to create the first DTM is also applied when creating the third.

```
# compare the result: print each of the three DTMs
sms_dtm
sms_dtm2
sms_dtm3

# workaround: use the third DTM in place of the first from here on
sms_dtm <- sms_dtm3
```

- **Homework Solutions:** TidyModels Examples: All of the code below is a work in progress; I still need to add my comments to my notebooks. It is made available as an extra set of examples using the new tidymodels package.
- **Further Examples:** This R Project contains new examples that use the new workflows package, which is part of the tidymodels collection of packages.

### Week 5:

**Announcements:**

- **Quiz:** The Quiz folder has been made in Blackboard.
- **Homework:** Homework 3 has been posted.
- **Midterm:** The Midterm will be a take-home exam given next week on Wednesday. It is due Wednesday, March 3.
- **Presentation:**
  - LogisticRegression.html
  - Logistic.pdf
  - Logistic.Rmd
  - MLwR_v2_06_Logistic.r
  - challenger.csv
  - credit.csv
- **R Project:** Chap06.zip **Updated Logistic Regression code; compare ROCs.**

- **Website Spotlight:** Rseek
- **Website Spotlight:** METACRAN
- **Website Spotlight:** RDocumentation
- **Website Spotlight:** rdrr.io

### Week 4:

- **Quiz:** Quiz 1 has been posted under Homework.
- **Homework:** Homework 2 has been posted.
- **Project:** The class Project has been posted under Homework. I have also posted a link to download the data directly; see Homework, where the Project is posted.
- **Presentation:** images.pdf
- **Website Spotlight:** UC Business Analytics R Programming Guide **Very Nice!!!**
- **Website Spotlight:** UC-r Logistic Regression Tutorial
- **Website Spotlight:** UC-r Resampling Methods Tutorial
- **Spotlight blog:** How to perform Logistic Regression in R
- **Website Spotlight:** Generalized Linear Models
- **YouTube:** The tradeoff between Sensitivity and Specificity
- **YouTube:** ROC Curves Video **See minute 7.**
- **DataCamp:** Brett Lantz Supervised Learning in R: Classification
- **DataCamp:** Gabriela Machine Learning with Tree-Based Models in R
- **DataCamp:** Nina John Supervised Learning in R: Regression
- **DataCamp:** Sergey Fogelson Extreme Gradient Boosting with XGBoost
- **YouTube:** GBM
- **Spotlight blog:** statistics.com Lift and Persuasion

### Week 3:

- **Announcement:** CSU STUDENT RESEARCH COMPETITION (SRC)
- **Announcement:** CAL STATE EAST BAY STUDENT RESEARCH SYMPOSIUM
- **Homework:** Homework 2 has been posted.
- **Handouts:** The link to the class Jamboard has been added to the Handouts.
- **Presentation:** images.pdf
- **RStudio::global 2021:** rstudio::conf 2021
- **Spotlight blog:** A Gentle Introduction to Tidymodels
- **Spotlight YouTube:** Jelena Ilic: Modeling, Tidyverse Way

### Week 2:

- **Spotlight Blog post:** Prime Hints For Running A Data Project In R
- **Spotlight blog:** Data Science Central
- **Spotlight blog:** The 10 Statistical Techniques Data Scientists Need to Master
- **RStudio::global 2021:** rstudio::global 2021
- **RStudio::conf 2019:** rstudio::conf 2019
- **Spotlight Books:** The following presentations are based on the samples in Lantz, Machine Learning with R, Second Edition.
- **Website Spotlight:** UC-r Linear Regression Tutorial
- **Website Spotlight:** UC-r Linear Model Selection Tutorial
- **Spotlight blog:** Decision Trees - An Intuitive Introduction

### Week 1:

- **Book:** mdsr2e
- **Homework:** Homework 1 has been posted.

**Excellent References:**

**Machine Learning:**

- Hands-On Machine Learning with R
- tidymodels
- mlr3 book
- Introduction to Machine Learning (I2ML)
- Interpretable Machine Learning

**Data Science:**

- mdsr2e
- r4ds
- ModernDive
- YaRrr!
- R Data Science Essentials
- Python Data Science Essentials
- Doing Data Science
- Data Science from Scratch
- Data Driven (fast easy read)
- A Simple Introduction to Data Science

**Learning R:**

- Data Camp: Introduction to R
- Data Camp: Machine Learning with Tree-Based Models in R
- RProgramming.net
- Introduction to MRO
- R-Exercises
- R Markdown: The Definitive Guide

**Reading related to the Digital Economy:**

- The Second Machine Age: Work, Progress and Prosperity in a Time of Brilliant Technologies
- Race Against the Machine
- Wired For Innovation
- Strategies for e-business success
- Understanding the Digital Economy

**Reading related to AI and ML for Marketing:**

- AI for Marketing and Product Innovation: Powerful New Tools for Predicting Trends, Connecting with Customers, and Closing Sales
- machineVantage AI Videos

**More Big Picture:**

- Fourth Paradigm of Science: Data-Intensive Scientific Discovery
- McKinsey Global Institute Big Data: The next frontier for innovation, competition, and productivity
- leada The Data Analytics Handbook
  - Data Analysts + Data Scientists
  - CEO's + Managers
  - Researchers + Academics
  - Big Data Edition

- The Master Algorithm
- Pedro Domingos: "The Master Algorithm" | Talks at Google