Statistics 652: Homework


Homework 5:

(not collected)

Problems:

  1. Perform the ANN analysis on the concrete data. Using an R Notebook, produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps. The h2o code should not be run in an R Notebook.
  2. Perform the SVM analysis on the OCR letter data. Using an R Notebook, produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps. The h2o code should not be run in an R Notebook.
  3. Perform the Cluster analysis on the sns data. Using an R Notebook, produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
  4. Perform the Association analysis on the groceries data. Using an R Notebook, produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Midterm and Final

The Midterm is about determining which algorithm best classifies Titanic passengers by survival.

The Final will be about implementing a machine learning feature selection algorithm based on Random Forests called the Boruta Algorithm.

Unzip the R Project. See the Midterm-Final.pdf.

For the Midterm, the process of developing a model using the training data is described. Final predictions will be made with the testing data, which does not include the labels. This is how Kaggle submissions are made.


Homework 4:

(due Monday March 8, 2021)

Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions, and you will have until Friday to turn in this homework through Blackboard.

Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.


Homework 3:

(due Monday March 1, 2021)

Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions, and you will have until Friday to turn in this homework through Blackboard.

Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.


Quiz:

(due Friday February 19, 2021)

Complete Problem 7, parts a, b, and c, from the Section 2.4 Exercises in ISL.

Do parts a, b, and c without normalization or scaling. Then redo parts a, b, and c using either normalization or scaling. Do the results differ?
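
As a reminder of the two transformations named above, here is a minimal R sketch on made-up numbers (not the ISL table). The helper names normalize and dist_to_first are hypothetical, and "normalization" is assumed here to mean min-max rescaling to [0, 1]; scale() is base R and standardizes each column to mean 0 and standard deviation 1.

```r
# Made-up predictors on very different scales (illustration only, not the ISL data).
x <- data.frame(X1 = c(1, 5, 9), X2 = c(100, 300, 200))

# Hypothetical helper: min-max normalization to the range [0, 1].
normalize <- function(v) (v - min(v)) / (max(v) - min(v))

x_norm  <- as.data.frame(lapply(x, normalize))  # each column now in [0, 1]
x_scale <- as.data.frame(scale(x))              # each column: mean 0, sd 1

# Hypothetical helper: Euclidean distance from every row to the first row.
dist_to_first <- function(d) {
  m <- as.matrix(d)
  sqrt(rowSums(sweep(m, 2, m[1, ])^2))
}

dist_to_first(x)        # dominated by X2, the large-scale column
dist_to_first(x_norm)   # both columns contribute comparably
dist_to_first(x_scale)
```

Comparing the three distance vectors shows why distance-based methods can rank neighbors differently once the columns are put on a common scale.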


Project:

(due TBA)

The class project is to develop the best classification model for the Loan Status of the LendingClub Approved Loans from 2012-2015 and to evaluate how well your best model classifies the Loan Status of the loan in 2015.

Submit your .Rmd and .docx or .pdf files. Do not submit a zipped directory containing the downloaded data.

During Week 4 you should complete Step 1 of the Five Steps: download the data and load it into R. Remove any ID variables, and make sure the columns have the appropriate types. In particular, make sure the target variable is a factor with two levels.
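
The Step 1 clean-up described above might look roughly like the following sketch. It uses a toy data frame rather than the real download, and the column names (id, loan_amnt, loan_status) and status values are assumptions for illustration; check the actual file before relying on them.

```r
# Toy stand-in for the LendingClub data; column names and values are assumptions.
loans <- data.frame(
  id          = c(101, 102, 103, 104),
  loan_amnt   = c("5000", "12000", "7500", "20000"),  # numeric stored as text
  loan_status = c("Fully Paid", "Charged Off", "Fully Paid", "Fully Paid"),
  stringsAsFactors = FALSE
)

# Drop the identifier column: it carries no predictive information.
loans$id <- NULL

# Give each column an appropriate type.
loans$loan_amnt   <- as.numeric(loans$loan_amnt)
loans$loan_status <- factor(loans$loan_status)

# Inspect the structure and confirm the target is a two-level factor.
str(loans)
stopifnot(is.factor(loans$loan_status), nlevels(loans$loan_status) == 2)
```

If the real target has more than two distinct values, the final check fails, which is a prompt to decide how those values should be filtered or combined before modeling.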

Here is a link to download the data: lending-club.zip

Hints:

Data for those who have limited computing resources.

Load the .csv file using different methods.
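
As one possible reading of "different methods", the sketch below reads the same file three ways. It writes a tiny temporary CSV so it is self-contained; the file and object names are made up, and the readr and data.table calls run only if those packages happen to be installed.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c("x", "y", "z")), tmp, row.names = FALSE)

# Method 1: base R.
d_base <- read.csv(tmp, stringsAsFactors = FALSE)

# Method 2: readr (only if the package is installed); returns a tibble.
if (requireNamespace("readr", quietly = TRUE)) {
  d_readr <- readr::read_csv(tmp)
}

# Method 3: data.table (only if installed); returns a data.table.
if (requireNamespace("data.table", quietly = TRUE)) {
  d_dt <- data.table::fread(tmp)
}

dim(d_base)
```

On a large file, wrapping each call in system.time() is a simple way to compare how long the methods take on your own machine.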


Homework 2:

(due Monday February 15, 2021)

Using the provided R Project, rename the file lastname_firstname_Stat652_Homework02.Rmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions, and you will have until Friday to turn in this homework through Blackboard.

Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.

Hint: For Problems 6a and 6b, explore the dataset before attempting to fit the models. You will need to deal with the missing values before applying some or all of the models. Which models do not work with missing data?
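
For the exploration suggested in the hint, a minimal sketch of counting and handling missing values is shown here on made-up data (not the homework dataset). Dropping incomplete rows and median imputation are just two simple options, not a recommendation for the problems themselves.

```r
# Made-up data with missing values (illustration only, not the homework dataset).
d <- data.frame(
  y  = c(1, 0, 1, NA, 0),
  x1 = c(2.5, NA, 3.1, 4.0, 1.2),
  x2 = c("a", "b", NA, "a", "b")
)

# Count missing values per column.
colSums(is.na(d))

# Option 1: keep only rows with no missing values.
d_complete <- d[complete.cases(d), ]

# Option 2: impute a numeric column with its median (one simple choice among many).
d_imputed <- d
d_imputed$x1[is.na(d_imputed$x1)] <- median(d$x1, na.rm = TRUE)
```

Comparing nrow(d) with nrow(d_complete) shows how much data is lost when incomplete rows are simply dropped.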


Homework 1:

(due Monday February 1, 2021)

Using the provided R Project, rename the file lastname_firstname_Stat652_Homework01.Rmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions, and you will have until Friday to turn in this homework through Blackboard.

Upload two files to Blackboard: your .pdf or .doc file and your .Rmd file.