Statistics 652: Homework
Homework 5:
(not collected)
- Read: Chapter 7 Neural Networks
- Read: Chapter 7 SVMs
- Read: Chapter 9 Clustering
- Read: Chapter 8 Transactional Data and Association Rules
Problems:
- Perform the ANN analysis on the concrete data. Produce a report, using an R Notebook, explaining the data, the analysis, and the findings.
- Organize your report using the Five Steps. The h2o code should not be run in an R Notebook.
- Perform the SVM analysis on the OCR letter data. Produce a report, using an R Notebook, explaining the data, the analysis, and the findings.
- Organize your report using the Five Steps. The h2o code should not be run in an R Notebook.
- Perform the Cluster analysis on the sns data. Produce a report, using an R Notebook, explaining the data, the analysis, and the findings.
- Organize your report using the Five Steps.
- Perform the Association analysis on the groceries data. Produce a report, using an R Notebook, explaining the data, the analysis, and the findings.
- Organize your report using the Five Steps.
Midterm and Final
The Midterm is about determining which algorithm is best for classifying passengers on the Titanic by survival.
The Final will be about implementing a machine learning feature selection algorithm based on Random Forests called the Boruta Algorithm.
Unzip the R Project. See the Midterm-Final.pdf.
For the Midterm, the process of developing a model using the training data is described. Final predictions will be made with the testing data, which does not include the labels. This is how Kaggle submissions are made.
Homework 4:
(due Monday March 8, 2021)
Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.
You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Blackboard.
Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.
- Read:
- mdsr2e Chapter 10
- mdsr2e Chapter 11
- Machine Learning with R, Second Edition, Chapters 3 and 4. To access the book, go to CSUEB Library Databases A-Z > Safari Books Online and register.
- Problems:
- 11.7 Exercises: Problem 6a, Run Models 6. Naive Bayes, using training and test datasets, as described in part c of the problem.
- 11.7 Exercises: Problem 6b, Run Models 4. Random Forest, using training and test datasets, as described in part c of the problem.
- Perform the SMS spam filtering analysis from Lantz. Produce a report explaining the data, the analysis, and the findings.
- Organize your report using the Five Steps.
- Be sure to include:
1. Show the prediction that the algorithm produced.
2. Give the Accuracy of the predictions.
3. Include the confusion matrix.
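As a minimal sketch of items 2 and 3 above, the confusion matrix and the accuracy can be computed in base R with `table()`. The `actual` and `predicted` vectors below are hypothetical placeholders; in your report they would be the test labels and the output of your fitted classifier.

```r
# Hypothetical actual and predicted labels for ten test messages;
# in the homework these would come from your own test set and model.
actual <- factor(c("ham", "ham", "spam", "ham", "spam",
                   "ham", "ham", "spam", "ham", "ham"),
                 levels = c("ham", "spam"))
predicted <- factor(c("ham", "ham", "spam", "ham", "ham",
                      "ham", "spam", "spam", "ham", "ham"),
                    levels = c("ham", "spam"))

# Confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(predicted = predicted, actual = actual)
print(conf_mat)

# Accuracy is the share of predictions that match the true label,
# i.e. the diagonal of the confusion matrix divided by the total count.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The off-diagonal cells show which kind of error the classifier makes, which the accuracy alone does not reveal.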
Homework 3:
(due Monday March 1, 2021)
Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.
You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Blackboard.
Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.
- Read:
- mdsr2e Chapter 10
- mdsr2e Chapter 11
- Machine Learning with R, Second Edition, Chapter 5. To access the book, go to CSUEB Library Databases A-Z > Safari Books Online and register.
- Problems:
- 11.7 Exercises: Problem 6a, Run Models 3. Decision Tree, using C5.0, and 4. Random Forest, using training and test datasets, as described in part c of the problem.
Quiz:
(due Friday February 19, 2021)
Complete 2.4 Exercises Problem 7 a, b, c from the ISL.
Do parts a, b, and c without normalization or scaling. Then redo parts a, b, and c using either normalization or scaling. Do the results differ?
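The following sketch illustrates why the answer can change. It uses hypothetical toy data (not the table from the ISL exercise) and assumes Euclidean distance. The helper `euclid()` is an illustrative name, not a function from any package.

```r
# Hypothetical toy data: x2 is measured on a much larger scale than x1.
train <- data.frame(x1 = c(1, 2, 3, 4),
                    x2 = c(100, 250, 400, 210))
test_point <- c(x1 = 2.2, x2 = 200)

# Euclidean distance from each row of a matrix or data frame to one point.
euclid <- function(x, point) {
  sqrt(rowSums(sweep(as.matrix(x), 2, point)^2))
}

# Raw distances: the large-scale column x2 dominates.
d_raw <- euclid(train, test_point)

# Min-max normalization to [0, 1], using the training minimum and range
# for both the training rows and the test point.
mins    <- sapply(train, min)
ranges  <- sapply(train, max) - mins
train_n <- sweep(sweep(as.matrix(train), 2, mins), 2, ranges, "/")
test_n  <- (test_point - mins) / ranges
d_norm  <- euclid(train_n, test_n)

# z-score scaling, using the training means and standard deviations.
train_z <- scale(train)
test_z  <- (test_point - attr(train_z, "scaled:center")) /
  attr(train_z, "scaled:scale")
d_z <- euclid(train_z, test_z)

# Index of the nearest training row under each treatment.
print(c(raw = which.min(d_raw), minmax = which.min(d_norm), zscore = which.min(d_z)))
```

In this toy example the nearest neighbor differs between the raw and the rescaled data, so a kNN prediction could differ as well.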
Project:
(due TBA)
The class project is to develop the best classification model for the Loan Status of the LendingClub Approved Loans from 2012-2015 and to evaluate how well your best model classifies the Loan Status of the loan in 2015.
Submit your .Rmd and .docx or .pdf files. Do not submit a zipped directory containing the downloaded data.
During Week 4 you should complete Step 1 of the Five Steps: finish downloading the data and loading it into R. Remove any ID variables, and make sure the columns have the appropriate types. In particular, make sure the target variable is a factor with two levels.
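A minimal sketch of this clean-up step in base R is shown here. The small data frame and its column names (`id`, `loan_amnt`, `loan_status`) are assumptions for illustration only; check the actual column names and levels in the downloaded data.

```r
# Hypothetical miniature stand-in for the loan data; the column names
# and values are illustrative assumptions, not the real file contents.
loans <- data.frame(
  id          = c(101, 102, 103, 104),
  loan_amnt   = c("5000", "12000", "7500", "20000"),  # read in as text
  loan_status = c("Fully Paid", "Charged Off", "Fully Paid", "Fully Paid"),
  stringsAsFactors = FALSE
)

# Remove the identifier column so it cannot be used as a predictor.
loans$id <- NULL

# Give each column an appropriate type.
loans$loan_amnt   <- as.numeric(loans$loan_amnt)
loans$loan_status <- factor(loans$loan_status)

# Check the result: the target should be a factor with exactly two levels.
str(loans)
stopifnot(is.factor(loans$loan_status), nlevels(loans$loan_status) == 2)
```

If the target in the real data has more than two values, decide how they map to the two classes before converting it to a factor.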
Here is a link to download the data: lending-club.zip
Hints:
Data for those who have limited computing resources.
Read in the .csv file using different methods.
- read.csv (Lending_Club_ver01.Rmd)
- data.table (Lending_Club_ver02.Rmd)
- arrow (Lending_Club_ver03.Rmd)
Homework 2:
(due Monday February 15, 2021)
Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.
You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Blackboard.
Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.
- Read:
- mdsr2e Chapter 10
- mdsr2e Chapter 11
- Machine Learning with R, Second Edition, Chapters 3 and 5, and the second half of Chapter 6. To access the book, go to CSUEB Library Databases A-Z > Safari Books Online and register.
- Problems:
- 10.6 Exercises: Problem 3
- 11.7 Exercises: Problem 6a, Run Models 1. Null Model, 2. Logistic Regression, 6. kNN, using training and test datasets, as described in part c of the problem.
- 11.7 Exercises: Problem 6b, Run Models 1. Null Model, 2. Multiple Linear Regression, 3. Decision Tree, using CART, from the R package rpart, using training and test datasets, as described in part c of the problem.
Hint: For Problems 6a and 6b, explore the dataset before attempting to fit the models. You will need to deal with the missing values before applying some or all of the models. Which models do not work with missing data?
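One way to start on the hint above is sketched below in base R, using hypothetical toy data with illustrative column names. Dropping incomplete rows and median imputation are shown only as two common options, not as the required approach.

```r
# Hypothetical toy data with missing values; column names are illustrative.
df <- data.frame(age    = c(25, NA, 40, 31, NA),
                 income = c(50, 62, NA, 48, 55),
                 y      = c(0, 1, 1, 0, 1))

# Count the missing values in each column.
na_counts <- colSums(is.na(df))
print(na_counts)

# Option 1: keep only the rows with no missing values.
df_complete <- df[complete.cases(df), ]

# Option 2: replace each predictor's missing values with its median.
df_imputed <- df
for (col in c("age", "income")) {
  missing <- is.na(df_imputed[[col]])
  df_imputed[[col]][missing] <- median(df[[col]], na.rm = TRUE)
}
print(df_imputed)
```

When working with a training and test split, the imputation values would normally be computed from the training data only and then applied to the test data.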
Homework 1:
(due Monday February 1, 2021)
Using the provided R Project, rename the file lastname_firstname_Stat652_Homework01.Rmd using your own last name and first name in the filename.
You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Blackboard.
Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.
- Read:
- mdsr2e Chapter 9
- Machine Learning with R, Second Edition, first half of Chapter 6. To access the book, go to CSUEB Library Databases A-Z > Safari Books Online and register.
- Problems:
- 9.9 Exercises: Problem 2, Problem 3
- 9.10 Supplemental exercises: Problem 2