Stat 652: Statistical Learning
Department of Statistics and Biostatistics, CSU East Bay
Spring 2025:
Week 8:
- This Week: There is no class on Monday, March 10 or Wednesday, March 12 next week; next week is Finals week for this course. I will hold my usual office hours on Monday and Thursday next week for questions about the Final, the Project, or any other assignments you are working on.
- Student Evaluations: Please fill out the student evaluations for the class.
- Meetings:
Week 7:
- Final: The Project and Final may be turned in no later than the end of the day on Sunday, March 16.
- On Monday this week we will complete the discussion of rule-based decision trees and tuning from last week. We will discuss Neural Networks and Clustering on Wednesday. If you are interested, Chapter 7 of Lantz 4ed discusses SVMs and Chapter 8 discusses Association Rules.
- Presentation: images.pdf
- ANN.html
- ANN.qmd
- MLwR4e Chapter_07.R
- MLwR_v2_07.r
- MLwR_v2_07_h2o.r
- concrete.csv
- R Project: Chap07.zip
- Spotlight blog: Deep Learning in R
- YouTube: Neural Networks Demystified Video
- Image, Translation, Speech:
- Microsoft: Cognitive Services Translator Speech API [CoPilot]
- Google: Cloud Services CoLab AIStudio Google
- Amazon: AWS Deep Learning SageMaker
- h2o: Deep Learning Deep Learning with R Tutorials
- Nvidia: NVidia’s GPUs - The Engine of Deep Learning
- Baidu: Baidu Research
- Tensorflow
- PyTorch
- Presentation:
- SVM.html
- SVM.qmd
- MLwR4e Chapter_07.R
- MLwR_v2_07.r
- MLwR_v2_07_h2o.r
- letterdata.csv
- R Project: Chap07.zip
- Website Spotlight: UCI MLR MNIST dataset
- kaggle Competition: Digit Recognizer
- Spotlight paper: EMNIST
- Website Spotlight: EMNIST Dataset
- Website Spotlight: cedar
- Website Spotlight: IMAGENET
- Website Spotlight: Fashion MNIST
- Presentation:
- Presentation: Transactional Data and Association Rules
- Association.html
- Association.qmd
- MLwR4e Chapter_08.R
- MLwR_v2_08.r
- groceries.csv
- R Project: Chap08.zip
- Spotlight book: Hands on Machine Learning
- Spotlight book: Dive into Deep Learning
Week 6:
- Homework: Homework 4 has been posted.
- Handout: The Five Steps: steps01.docx
- Presentation: images.pdf
- Further Examples: This R Project contains new examples that use the new Workflow package, which is part of the tidymodels collection of packages.
- Presentation: Boosting, Bagging, Random Forests
- Tuning.html
- Tuning.qmd
- R Project: Chap11.zip This code takes a long time to run. Try this after the Chap05 code.
- Presentation:
- Rules.html
- Rules.qmd
- mushrooms.csv
- R Project: Chap05.zip
- Software Spotlight: BigML Are You Ready for Big Machine Learning?
- Software Spotlight: DataRobot
Week 5:
- Monday is not a University holiday; we will have class on Monday.
- The Quiz due date has been updated to Friday, February 21, 2025.
- Homework: Homework 3 has been posted.
- Midterm: The Midterm has been posted; see the Homework link. It uses the Titanic data.
- See Rebecca Barter’s blog post Tidymodels: tidy machine learning in R.
- See Olivier Gimenez’s blog post Experimenting with machine learning in R with tidymodels and the Kaggle titanic dataset.
- See Jan Kirenz’s blog post Data Science with Tidymodels, Workflows and Recipes
- Presentation:
- Quarto Project: Chap06.zip updated Logistic Regression code, compare ROCs
- Presentation:
- Website Spotlight: Rseek
- Website Spotlight: METACRAN
- Website Spotlight: RDocumentation
- Website Spotlight: rdrr.io
- Presentation: images.pdf
- Presentation:
- Naive Bayes SMS spam filtering.html
- Naive Bayes SMS spam filtering.qmd
- Notes: BayesNotes.pdf
- sms_spam.csv
- MLwR4e Chapter_04.R
- MLwR_v2_04.r
- R Project: Chap04.zip
- R Notebook: NB.Rmd
- What is a VCorpus? StackExchange
- tm Vignettes
- Hint: Recall from class that some people running R on Windows had a fonts problem. To solve the problem, we added a line to the code that assigns the third DTM to the first. This is safe because all of the steps used to create the first DTM are also applied when creating the third DTM.
- compare the result
- sms_dtm
- sms_dtm2
- sms_dtm3
- sms_dtm <- sms_dtm3
- Homework Solutions: TidyModels Examples: All of the code below is a work in progress; I still need to add comments to my notebooks. It is made available as an extra set of examples using the new TidyModels package.
- TidyModels
- Spotlight blog: A Gentle Introduction to tidymodels
- Books Spotlight: wikibooks
Week 4:
- This week we will start by running the code from last week for Linear Regression and kNN.
- Homework: Homework 2b has been posted.
- Before running any ML algorithms on the NHANES data, you should investigate what is in the dataset, in particular which variables are numeric and which are categorical. You should also check how much data is missing from each variable.
- NHANES_ver01.qmd
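The checks described above (variable types and missingness) can be sketched in base R roughly as follows. This is only a minimal illustration, not the contents of NHANES_ver01.qmd: the small data frame `nhanes_small` and its columns are hypothetical stand-ins for the real NHANES data, which has many more variables.

```r
# Hypothetical stand-in for the NHANES data (assumption: not the real dataset)
nhanes_small <- data.frame(
  Age    = c(34, 51, NA, 8),
  Gender = factor(c("male", "female", "female", NA)),
  BMI    = c(27.1, NA, NA, 16.5)
)

# Which variables are numeric?
is_num <- vapply(nhanes_small, is.numeric, logical(1))
numeric_vars <- names(nhanes_small)[is_num]

# Which variables are categorical (stored as factor or character)?
is_cat <- vapply(nhanes_small,
                 function(x) is.factor(x) || is.character(x),
                 logical(1))
categorical_vars <- names(nhanes_small)[is_cat]

# Count and percentage of missing values in each variable
n_missing   <- colSums(is.na(nhanes_small))
pct_missing <- round(100 * colMeans(is.na(nhanes_small)), 1)

missing_summary <- data.frame(
  variable    = names(n_missing),
  n_missing   = as.integer(n_missing),
  pct_missing = as.numeric(pct_missing)
)

print(numeric_vars)
print(categorical_vars)
print(missing_summary)
```

To try this on the real data, replace `nhanes_small` with the NHANES data frame you loaded for the homework; variables with a high percentage of missing values are candidates for dropping or imputing before modeling.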
- Quiz: Quiz 1 has been posted under Assignments.
- Datasets:
- Presentation: images.pdf
- CART.html
- CART.qmd
- MLwR4e Chapter_06.R
- MLwR_v2_06.r
- whitewines.csv
- redwines.csv
- Quarto Project: Chap06.zip
- Spotlight blog: Beginner’s guide to machine learning in R (with step-by-step tutorial)
- YouTube: R package reviews | glmulti | Find The Best Model !
- Spotlight Software: Tidyverse
- Spotlight Software: Tidymodels rsample
- Spotlight Software: caret
- Spotlight Software: easystats report performance
- Website Spotlight: UC Business Analytics R Programming Guide Very Nice!!!
- Website Spotlight: UC-r Logistic Regression Tutorial
- Website Spotlight: UC-r Resampling Methods Tutorial
- Spotlight blog: How to perform Logistic Regression in R
- Website Spotlight: Generalized Linear Models
- YouTube: The tradeoff between Sensitivity and Specificity
- YouTube: ROC Curves Video See minute 7.
- DataCamp: Brett Lantz Supervised Learning in R: Classification
- DataCamp: Machine Learning with Tree-Based Models in R
- DataCamp: Sergey Fogelson Extreme Gradient Boosting with XGBoost
- YouTube: GBM
Week 3:
- Homework: Homework 2a has been posted.
- The following presentations are based on chapters in Lantz, Machine Learning with R, Fourth Edition.
- Presentation: images.pdf
- Regression.html
- Regression.qmd
- MLwR4e Chapter_06.R
- MLwR_v2_06.r
- challenger.csv
- insurance.csv
- Quarto Project: Chap06.zip
- Website Spotlight: UC-r Linear Regression Tutorial
- Website Spotlight: UC-r Linear Model Selection Tutorial
- Spotlight blog: Decision Trees - An Intuitive Introduction
- Presentation: images.pdf
- Presentation:
- kNN_diagnosing_breast_cancer.html
- kNN_diagnosing_breast_cancer.qmd
- wisc_bc_data.csv
- MLwR4e Chapter_03.R
- MLwR_v2_03.r
- Quarto Project: Chap03.zip
- R Notebook: Chap03-RNotebook
- Presentation: An example of running kNN in the Tidyverse using the Tidymodels package.
- Spotlight blog: Tidymodels: tidy machine learning in R
- Spotlight YouTube: Jelena Ilic: Modeling, Tidyverse Way
Week 2:
- Assignment: Homework 1 has been posted. Download the .zip file into the Homework sub-directory of your class directory, rename the .qmd file with your last name and first name, and add your name in the YAML header.
- Presentation:
- Quarto Notebook:
- Quarto Notebook:
- Ch09 Statistical Foundations.html using the infer R package
- Ch09 Statistical Foundations.qmd
- Spotlight Software:
- Spotlight Blog post: Prime Hints For Running A Data Project In R
- Spotlight blog: Data Science Central
- Spotlight blog: The 10 Statistical Techniques Data Scientists Need to Master
- Spotlight blog: MACHINE LEARNING TRENDS IN 2023
- Spotlight blog: 8 AI and machine learning trends to watch in 2025
- Spotlight blog: 2023 emerging AI and Machine Learning trends
- RStudio::global 2024: rstudio::conf 2024
- RStudio::global 2023: rstudio::conf 2023
- RStudio::global 2022: rstudio::conf 2022
- RStudio::global 2021: rstudio::global 2021
- Spotlight Books:
Week 1:
- Book: mdsr3e
- Assignment: Homework 1 has been posted.
- Presentation:
- Spotlight Software:
Week 0:
Learning R:
Learning Python:
Learn SQL:
Excellent References:
Data Science:
- Socviz
- r4ds
- ModernDive
- Yarrr!
- R Data Science Essentials
- Python Data Science Essentials
- Deep Learning Made Easy with R
- Doing Data Science
- Data Science from Scratch
- What is Data Science? (fast easy read)
- Ethics and Data Science (fast easy read)
- Data Driven (fast easy read)
- R Markdown: The Definitive Guide
Reading related to the Digital Economy:
- The Second Machine Age: Work, Progress and Prosperity in a Time of Brilliant Technologies
- Race Against the Machine
- Wired For Innovation
- Strategies for e-business success
- Understanding the Digital Economy
More Big Picture:
- Fourth Paradigm of Science: Data-Intensive Scientific Discovery
- McKinsey Global Institute Big Data: The next frontier for innovation, competition, and productivity