Stat 452 Introduction to Statistical Learning
Department of Statistics and Biostatistics
California State University, East Bay
Spring 2020
Week 16: Finals week
 I will have my usual office hours MW 2-3pm.
 The scheduled final for the class is on Monday, 8-10 PM. I will be online starting at 8pm and will stay on until all questions have been answered. I will log into Office Hours.
Week 15:
 Quiz: I listed Wednesday for the quiz, so we will have the quiz next time. The quiz will be a short list of questions that will be useful for the final.
 Final: The final for the class will be take-home. You can complete it at home individually. I will be available during the scheduled exam time for questions. Final topics: Ch. 6 Trees, Random Forests, Ch. 7 ANN and SVM, Ch. 8 Market Basket Analysis, Ch. 9 Clustering. You should be able to communicate clearly what the Holdout Rule is and what a prediction is.
 Project: Your project should be completed before the end of the quarter. The last day you can submit it is Friday May 15.
 Presentation: Transactional Data and Association Rules Association Association.pdf
 MLwRv2_08.r
 groceries.csv
 R Project: Chap8.zip
 Presentation: Clustering Clustering.pdf
 Review and other topics
 Presentation: Validation Validation.pdf
 Presentation: Ensembles Ensembles.pdf
 Presentation: Selection Selection.pdf
 Project: Just as a reminder, you should randomize your data using the sample() function. Taking the top part of your data for training and using the bottom for testing is generally a bad idea. The author of our book does this to simplify things, but he claims to have pre-randomized the rows of the datasets. Either pre-randomize the rows and say that you did so, or use the sample() function to randomly select the training and test data.
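 The randomization described in the Project reminder can be sketched as follows. This is only a minimal illustration, not the required approach for the project: the built-in mtcars data, the 75/25 split, the seed, and the linear model are all assumptions chosen for the example. Substitute your own data and method.

```r
# Hypothetical example: the built-in mtcars data stands in for your project data.
set.seed(452)                                    # arbitrary seed, so the split is reproducible

n <- nrow(mtcars)                                # number of rows in the data
train_idx <- sample(n, size = floor(0.75 * n))   # randomly chosen row numbers (75% is an assumed split)

train <- mtcars[train_idx, ]                     # randomly selected training rows
test  <- mtcars[-train_idx, ]                    # remaining rows, held out for testing

# Fit on the training rows only, then predict the held-out rows.
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error on the held-out test rows
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

 An alternative with the same effect is to shuffle the rows once, for example with mtcars[sample(n), ], and then take the top part for training and the bottom part for testing. If you do that, say so in your write-up.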
Week 14:
 Quiz: There will be a quiz on Wednesday on terminology.
 R Project: Chap7.zip Updated: Now with h2o code working. Using a different activation function.
 h2O Deep Learning Booklet: Deep Learning Booklet
 h2O Deep Learning Tutorial: h2o Deep Learning Tutorial
 h2O Performance Tuning Guide: h2o Performance Guide
 Presentation: SVM SVM.pdf images.pdf
 MLwR_v2_07.r
 MLwR_v2_07_h2o.r
 letterdata.csv
 Website Spotlight: MNIST
 Website Spotlight: UCI MLR nminst dataset
 kaggle Competition: Digit Recognizer
 Spotlight paper: EMNIST
 Website Spotlight: EMNIST Dataset
 Spotlight Website: cedar
 Website Spotlight: IMAGENET
Week 13:
 Resume Suggestion: Considering everything these days, it might be good to add a line to your resume: "Willing and able to work remotely."
 ASA: ASA Virtual Undergraduate Career Fair
 Homework: Homework 8 has been posted.
 Presentation: ANN ANN.pdf images.pdf
 MLwR_v2_07.r
 MLwR_v2_07_h2o.r
 concrete.csv
 R Project: Chap7.zip
 Spotlight blog: Deep Learning with MXNet
 Spotlight blog: Deep Learning in R
 YouTube: 3Blue1Brown
 YouTube: Neural Networks Demystified Video
 Image, Translation, Speech:
 Microsoft: Cognitive Services Translator Speech API
 Google: Cloud Services TensorFlow
 Amazon: AWS Deep Learning
 h2o: Deep Learning Deep Learning with R Tutorials
 Nvidia: NVidia's GPUs - The Engine of Deep Learning
 Baidu: Baidu Research
 IBM: IBM Watson
 Spotlight YouTube:
 sentdex: Machine Learning
 Google Developers: {ML} Recipes
Week 12:
 Homework: Homework 7 has been posted.
 Project: The first part of the Project has been posted.
 Blackboard: I plan to start a Discussion Forum for each class going forward. Please give it a try. I would like to collect everyone's questions through the Discussion Forum so everyone can see the questions being asked and the answers.
 Presentation: Evaluation of Logistic Regression Models (This includes more advanced topics related to Logistic Regression. If you are considering applying to an MS Statistics program, you might take a closer look at this. This material would be covered in a Categorical Data class.)
 Presentation: Tuning Tuning.pdf
 MLwR_v2_11.r
 credit.csv
 R Project: Chap11.zip
 Presentation: Predicting wine quality using Random Forests
 Website Spotlight: Netflix Prize
 Website Spotlight: kaggle Titanic Google Cloud & YouTube-8M Video Understanding
 Website Spotlight: Numerai
 Website Spotlight: Quantopian
 Website Spotlight: Data Science Game
Week 11:
 Homework: Homework 6 has been posted.
 Presentation: CART CART.pdf
 MLwR_v2_06.r
 whitewines.csv
 redwines.csv
 R Project: Chap06.zip
 Presentation: LogisticRegression Logistic.pdf
 Spotlight blog: How to perform Logistic Regression in R
 Presentation: Evaluation Evaluation.pdf
 MLwR_v2_06_Logistic.r
 challenger.csv
 credit.csv
 R Project: Chap06.zip
 Spotlight blog: Evaluating Logistic Regression Models
 Website Spotlight: R Data Analysis Examples: Logit Regression
 Website Spotlight: Generalized Linear Models
 YouTube: ROC Curves Video
 YouTube: The tradeoff between Sensitivity and Specificity
Week 10:
 Spring Break: Next week is Spring Break so there will be no class on Monday and Wednesday next week.
 Homework: Homework 5 has been posted.
 Software Spotlight: RStudio Cloud. Instead of dealing with Java to run RWeka (on Windows you need the Windows Offline (64-bit) version), you could use RStudio Cloud.
 Presentation: Regression.html Regression.pdf images.pdf
 MLwR_v2_06.r
 challenger.csv
 insurance.csv
 R Project: Chap06.zip
 Website Spotlight: UCr Linear Regression Tutorial
 Website Spotlight: UCr Linear Model Selection Tutorial
 Software Spotlight: Salford Systems Minitab
 Software Spotlight: Togaware onepager
Week 9:
 Office Hours: Office hours on Monday from 2-3pm are cancelled. We are interviewing a candidate for a tenure-track faculty position.
 This week we will be starting our online class meetings. In preparation for these meetings, please get Zoom working on your computer. Here is the link to the CSUEB Zoom login page. I would also suggest having R and RStudio installed on the computer you plan to use for class.
 We will go over the Midterm in class on Monday.
 Homework: Homework 4 has been postponed.
 Finish going over Chapter 5 code.
 Presentation: Rules.html Rules.pdf images.pdf
 MLwR_v2_05_update2.r The update applies the rule-based methods using training and test data.
 mushrooms.csv
 R Project: Chap05.zip
 R Notebook: Chap05.Rmd
Week 8:
 The university is now closed for the rest of the week, March 11-15. THE LATEST INFORMATION ON CORONAVIRUS (COVID-19)
 There will be no class today, March 11, 2020.
 We will have class on Monday as regularly scheduled. The class discussion and office hours will be conducted using Zoom Video Conference. Please give Zoom Video Conference a try before Monday. To join, log into Blackboard, go to the class, click on the Video Conference link on the left, and then click on the first link, Zoom Video Conference. You will need a microphone and headphones to be able to speak and hear. Today will also be a test for me to experience class and office hours online. Today I will hold my office hour online from 2 to 3pm. See you online! Let's give it a try.

On Monday we will start by discussing how we will proceed until Wednesday, April 8, when the decision on whether to continue online will be made. Then we will go over the Midterm solution and try the Classification Tree on the credit data.

Homework: Homework 4 has been posted.
 Midterm: I plan to discuss the midterm solutions on Wednesday in class.
 Presentation: Classification Classification.pdf images.pdf
 Presentation: Classification2 Classification2.pdf
 MLwR_v2_05_update2.r
 credit.csv
 R Project: Chap05.zip
 R Notebook: Chap05.Rmd
 Website Spotlight: UCr Regression Trees Tutorial
 Books Spotlight: wikibooks
 Software Spotlight: BigML Are You Ready for Big Machine Learning?
 Software Spotlight: DataRobot
Week 7:
 On Monday we will finish our discussion of the Naive Bayes code and we will go over the Quiz solution in preparation for the Midterm on Wednesday. The Quiz solution has been posted in Blackboard.
 Midterm: The Midterm will be on Wednesday in class. The Midterm will cover Chapters 1-4.
 Homework: Homework 4 has been posted. The material for this Homework will be presented after the Midterm, next week.
Week 6:
 Quiz: There will be a quiz in class on Wednesday, February 26, covering Chapters 1-4.
 On Monday we will continue our discussion of Naive Bayes and run the R code from last week. Chap04.zip
 Guest Speaker: At the Data Science Workgroup Meeting on Friday, February 28, 2:10 to 3pm, Darren Keeley from PG&E will speak about his experiences there doing data visualization and other things. flyer
Week 5:
 Office Hours: Cancelled today. Unfortunately, I have a department meeting during my office hours today. If you have time, please come to the talk at 4pm that will be given by a candidate for a tenure-track faculty position in our department.
 Homework: Homework 3 has been postponed until next week.
 Quiz: Next week, there will be a quiz in class on Wednesday, February 26, covering Chapters 1-4.
 Presentation: Naive Bayes.html Naive Bayes.pdf images.pdf
 Presentation: Naive Bayes SMS spam filtering Naive Bayes SMS spam filtering.pdf
 Notes: BayesNotes.pdf
 sms_spam.csv
 MLwR_v2_04.r
 R Project: Chap04.zip
 R Notebook: NB.Rmd
 What is a VCorpus? StackExchange
 tm Vignettes
 Hint: Recall from class that some people running R on Windows had a fonts problem. To solve the problem, we added a line to the code that assigns the third DTM to the first, since all of the steps used to create the first DTM are also applied when creating the third DTM.
> # compare the result
> sms_dtm
> sms_dtm2
> sms_dtm3
> sms_dtm <- sms_dtm3
 Website Spotlight: UCr Naive Bayes Tutorial
 Website Spotlight: GormAnalysis - Introduction to Naive Bayes
 Website Spotlight: Rseek
 Website Spotlight: METACRAN
 Website Spotlight: crantastic!
 Website Spotlight: RDocumentation
Week 4:
 Homework: Homework 3 has been posted.
 LibCal: LibCal Learn Excel.
 Presentation: kNN_diagnosing_breast_cancer kNN_diagnosing_breast_cancer.pdf
 wisc_bc_data.csv
 MLwR_v2_03.r
 R Project: Chap03.zip
 R Notebook: Chap03RNotebook
Week 3:
 Announcement: CSU STUDENT RESEARCH COMPETITION (SRC)
 Announcement: CAL STATE EAST BAY STUDENT RESEARCH SYMPOSIUM
 Homework: Homework 2 has been posted.
 Presentation: kNN kNN.pdf images.pdf
Week 2:
 Presentation: IntroR IntroR.pdf
 usedcars.csv
 MLwR_v2_02.r
 R Project: Chap02.zip
 Rpubs: https://rpubs.com/esuess/Chap02
 Spotlight blog: Analytics Vidhya
 Spotlight blog: A Complete Tutorial to learn Data Science in R from Scratch
 Spotlight blog: Machine Learning basics for a newbie
 Spotlight blog: Essentials of Machine Learning Algorithms (with Python and R code)
 Spotlight blog: Introduction to k-nearest neighbors: Simplified
 Spotlight blog: New Year Resolutions for a Data Scientist
 Spotlight blog: 25 Open Datasets for Deep Learning Every Data Scientist Must Work With
 Book Spotlight: Quick-R
 Book Spotlight: RDM
 Book Spotlight: StatSoft Textbook kNN Naive Bayes Classifier
 Book Spotlight: Tom Mitchell Chapter 3
 Book Spotlight: Hands-On Machine Learning with R
 Website Spotlight: caret parsnip
 Website Spotlight: scikit-learn Naive Bayes
 Spotlight blog: 6 Easy Steps to Learn Naive Bayes Algorithm
 Google Machine Learning course: Machine Learning Crash Course
 New AI course: fast.ai
Week 1:
 You can also read the book through the CSUEB library > Databases > Safari Books Online
 Book images images.pdf
 Presentation: Welcome Welcome.pdf
 Homework: Homework 1 has been posted.
 Spotlight Software: Python Anaconda Jupyter scikit-learn
 Spotlight Software: TensorFlow Keras RStudio Tensorflow RStudio Keras H2O.ai
 Spotlight Blog post: Prime Hints For Running A Data Project In R
Excellent References:
Machine Learning:
Data Science:
 r4ds
 ModernDive
 YaRrr!
 R Data Science Essentials
 Python Data Science Essentials
 Doing Data Science
 Data Science from Scratch
 What is Data Science? (fast easy read)
 Data Driven (fast easy read)
 A Simple Introduction to Data Science
Learning R:
 Data Camp: Introduction to R
 Data Camp: Machine Learning with Tree-Based Models in R
 RProgramming.net
 Introduction to MRO
 RExercises
 R Markdown: The Definitive Guide
Learning Python:
Learning SQL:
Reading related to the Digital Economy:
 The Second Machine Age: Work, Progress and Prosperity in a Time of Brilliant Technologies
 Race Against the Machine
 Wired For Innovation
 Strategies for e-business success
 Understanding the Digital Economy
 Introduction to AI for Marketing and Product Innovation
More Big Picture:
 Fourth Paradigm of Science: Data-Intensive Scientific Discovery
 McKinsey Global Institute Big Data: The next frontier for innovation, competition, and productivity
 leada The Data Analytics Handbook
 Data Analysts + Data Scientists
 CEO's + Managers
 Researchers + Academics
 Big Data Edition
 The Master Algorithm
 Pedro Domingos: "The Master Algorithm"  Talks at Google