Assignments


Homework 5: (not collected)

Read the following chapters in Machine Learning with R, 4th ed., by Lantz.


Final: (due in Canvas by Sunday March 10, 2024 or end of finals week.)

The Final is about implementing the Boruta algorithm, a machine learning feature selection algorithm based on Random Forests.

Unzip the R Project. See the Final_part04.html.

Answer two questions in the Final_part04.html file. The questions are:

  1. What are the important variables identified by the Boruta algorithm from the Ozone data?
  2. What are the important variables identified by the Boruta algorithm from the Titanic training data?
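
As a rough illustration of how a Boruta run reports its confirmed variables, here is a minimal sketch. It assumes the Boruta package is installed and uses the built-in airquality data as a stand-in, since the course Ozone data ships with the R Project and is not reproduced here. The seed and object names are illustrative only.

```r
# Sketch only: airquality is a stand-in, not the course Ozone data.
library(Boruta)

set.seed(652)  # Boruta relies on randomly shuffled "shadow" features

# Boruta does not accept missing values, so drop incomplete rows first.
ozone_df <- na.omit(airquality)

# Compare each predictor's Random Forest importance with its shadow copies.
boruta_fit <- Boruta(Ozone ~ ., data = ozone_df, doTrace = 0)
print(boruta_fit)

# Names of the variables whose final decision is "Confirmed".
important_vars <- getSelectedAttributes(boruta_fit, withTentative = FALSE)
print(important_vars)

# Per-variable importance statistics and decisions.
print(attStats(boruta_fit))
```

Variables left as "Tentative" are excluded by withTentative = FALSE; whether to report them is a judgment call for the write-up.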

Project: (due in Canvas by Sunday March 10, 2024 or end of finals week.)

Run all of the R code in "A predictive modeling case study" from the tidymodels Welcome website in a self-contained R Quarto Notebook.

  1. Build the PENALIZED LOGISTIC REGRESSION model for the hotel data. For this case study, explain how the recipe and workflow functions are used to prepare the data for the model. Also, explain how tune_grid is used.
  2. Build the TREE-BASED ENSEMBLE model for the hotel data.
  3. Compare the ROC Curve for the two models and explain which model is better for classifying a hotel booking as with children or no children.
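
To show how the recipe, workflow, and tune_grid pieces fit together, here is a minimal sketch. It is not the case study's code: it assumes the tidymodels and glmnet packages are installed and uses the small two_class_dat data from modeldata as a stand-in for the hotel data. The grid values, fold count, and object names are illustrative assumptions.

```r
# Sketch only: two_class_dat stands in for the hotel data.
library(tidymodels)

data(two_class_dat, package = "modeldata")  # outcome: Class; predictors: A, B

set.seed(652)
folds <- vfold_cv(two_class_dat, v = 5, strata = Class)

# Model specification: lasso-penalized logistic regression, penalty left to tune.
lr_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# Recipe: preprocessing steps, re-estimated on each resample's analysis set.
lr_recipe <- recipe(Class ~ ., data = two_class_dat) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Workflow: bundles the recipe and the model so they are fit together.
lr_wflow <- workflow() %>%
  add_recipe(lr_recipe) %>%
  add_model(lr_spec)

# Candidate penalty values to evaluate.
lr_grid <- tibble(penalty = 10^seq(-4, -1, length.out = 10))

# tune_grid: fits the workflow for every penalty on every resample.
lr_res <- tune_grid(
  lr_wflow,
  resamples = folds,
  grid = lr_grid,
  control = control_grid(save_pred = TRUE),
  metrics = metric_set(roc_auc)
)

lr_best <- select_best(lr_res, metric = "roc_auc")

# ROC curve for the selected penalty, built from the held-out predictions.
lr_roc <- collect_predictions(lr_res, parameters = lr_best) %>%
  roc_curve(Class, .pred_Class1)
autoplot(lr_roc)
```

Repeating the last step for a second model and binding the two ROC data frames together is one way to draw both curves on the same plot.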

Homework04: (complete by Monday February 26, 2024)

Using the provided Quarto Project, rename the file lastname_firstname_Stat652_Homework04.qmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Canvas.

Upload two files to Canvas: your .pdf or .doc file and your .qmd file. Do not submit a .zip file.


Midterm: (due in Canvas by Friday February 23, 2024)

The Midterm is about determining which classification algorithm is best for classifying passengers on the Titanic for survival.

Unzip the R Project. See the Midterm.html.

For the Midterm, the process of developing a model using the training data is described. Final predictions will be made with the testing data, which does not include the labels. This is how Kaggle submissions are made.

The old tidymodels code is provided. This code should be updated to the new tidymodels workflow.
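
As a rough guide to the workflow-style pattern, here is a minimal sketch. It assumes that "new workflow" refers to bundling preprocessing and the model with workflow(). It is not the provided Midterm code. The built-in Titanic table, expanded to one row per passenger, stands in for the Kaggle files, and all object names are illustrative.

```r
# Sketch only: the built-in Titanic table stands in for the Kaggle data.
library(tidymodels)

# Expand the contingency table to one row per passenger.
titanic_counts <- as.data.frame(Titanic)
titanic_df <- titanic_counts[
  rep(seq_len(nrow(titanic_counts)), titanic_counts$Freq),
  c("Class", "Sex", "Age", "Survived")
]

set.seed(652)
titanic_split <- initial_split(titanic_df, prop = 0.8, strata = Survived)
titanic_train <- training(titanic_split)
titanic_test  <- testing(titanic_split)

# Preprocessing is declared in a recipe rather than applied by hand.
titanic_recipe <- recipe(Survived ~ ., data = titanic_train) %>%
  step_dummy(all_nominal_predictors())

glm_spec <- logistic_reg() %>%
  set_engine("glm")

# The workflow carries the recipe and the model together.
titanic_wflow <- workflow() %>%
  add_recipe(titanic_recipe) %>%
  add_model(glm_spec)

# One fit() call estimates the recipe and the model on the training data.
titanic_fit <- fit(titanic_wflow, data = titanic_train)

# Predict on test rows with the label removed, as in a Kaggle submission.
test_pred <- predict(titanic_fit, new_data = select(titanic_test, -Survived))
head(test_pred)
```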


Homework03: (complete by Monday February 19, 2024)

Using the provided Quarto Project, rename the file lastname_firstname_Stat652_Homework03.qmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Canvas.

Upload two files to Canvas: your .html file (rendered with self-contained: true) and your .qmd file. DO NOT submit a .zip file.


Quiz: (due in Canvas by Friday February 9, 2024)

Instructions: For problem 1, you can complete the questions in an Excel spreadsheet or in an R Quarto Notebook. Submit either a .xlsx file or both a .qmd and a .html file. For problem 2, run the provided R Quarto Notebook and answer the questions asked. Submit both a .qmd and a .html file.

Use the following Quarto Notebook to answer the questions in the quiz. lastname_firstname_Stat652_Quiz01.qmd

A nice blog post to read from yuza-Blog is glmulti best model, along with the YouTube video glmulti. The video is a good introduction to another way to do model selection.

  1. Complete 2.4 Exercises Problem 7 a, b, c from the ISL.

Do parts a, b, and c without normalization or scaling. Re-do parts a, b, and c using either normalization or scaling. Do the results differ?
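
To see why scaling can change distance-based results, here is a minimal base R sketch. The four training points and the query point are made up for illustration and are not the ISL exercise data. Standardization (z-scores) is used here; min-max normalization would follow the same pattern.

```r
# Sketch only: made-up data, not the values from the ISL exercise.
train <- data.frame(x1 = c(1, 5, 3, 8), x2 = c(100, 300, 200, 50))
query <- c(x1 = 4, x2 = 150)

# Euclidean distance from every row of m to the point q.
euclid <- function(m, q) sqrt(rowSums(sweep(as.matrix(m), 2, q)^2))

# Distances on the raw scale: x2 dominates because its values are larger.
raw_dist <- euclid(train, query)

# Standardize with the training means and standard deviations,
# and apply the same transformation to the query point.
ctr <- colMeans(train)
sds <- apply(train, 2, sd)
train_z <- scale(train, center = ctr, scale = sds)
query_z <- (query - ctr) / sds
z_dist <- euclid(train_z, query_z)

# Neighbor rankings before and after standardization.
print(order(raw_dist))
print(order(z_dist))
```

Comparing the two rankings shows whether the nearest neighbors, and therefore a KNN prediction, would change once both variables are on a common scale.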

  2. Run the best subset regression code using the olsrr (from rsquaredacademy) and leaps packages. This question demonstrates automating the model selection process by fitting all possible regressions and picking the best model using a criterion/metric such as adjusted R-squared or AIC.
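
For orientation, here is a minimal sketch of best subset selection with the leaps package. It assumes leaps is installed and uses the built-in mtcars data rather than the quiz data. The olsrr interface is not shown here.

```r
# Sketch only: mtcars is a stand-in for the quiz data.
library(leaps)

# Exhaustive search: the best model of each size, up to 10 predictors.
subsets_fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
subsets_summary <- summary(subsets_fit)

# Model size with the highest adjusted R-squared.
best_size <- which.max(subsets_summary$adjr2)
print(best_size)

# Coefficients of the selected model (intercept plus best_size predictors).
print(coef(subsets_fit, id = best_size))

# Other criteria reported by summary() can point to a different size.
print(which.min(subsets_summary$bic))
print(which.min(subsets_summary$cp))
```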

Homework02b: (complete by Monday February 12, 2024)

Using the provided Quarto Project, rename the file lastname_firstname_Stat652_Homework02.qmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Canvas.

Upload two files to Canvas: your .html file (rendered with self-contained: true) and your .qmd file. DO NOT submit a .zip file.

Hints: For Problem 6b, explore the dataset before attempting to fit the models. You will need to deal with the missing values before applying some or all of the models. Which models do not work with missing data?
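
One way to start exploring missing values is sketched below in base R. The built-in airquality data is a stand-in for the homework dataset, and the choice between dropping rows and imputing is left open. The median imputation of Solar.R is only an illustration.

```r
# Sketch only: airquality is a stand-in for the homework dataset.

# Count missing values in each column.
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Option 1: listwise deletion keeps only rows with no missing values.
complete_df <- na.omit(airquality)
print(c(all_rows = nrow(airquality), complete_rows = nrow(complete_df)))

# Option 2: fill in a predictor, here with its median (a simple illustration).
imputed_df <- airquality
solar_median <- median(airquality$Solar.R, na.rm = TRUE)
imputed_df$Solar.R[is.na(imputed_df$Solar.R)] <- solar_median

# lm() drops incomplete rows by default, so it runs without an error;
# nobs() shows how many rows were actually used.
lm_fit <- lm(Ozone ~ ., data = airquality)
print(nobs(lm_fit))
```

Some model functions stop with an error on missing values instead of dropping rows, which is why checking the counts first is useful.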


Homework02a: (complete by Monday February 5, 2024)

Using the provided Quarto Project, rename the file lastname_firstname_Stat652_Homework02.qmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Canvas.

Upload two files to Canvas: your .html file (rendered with self-contained: true) and your .qmd file. DO NOT submit a .zip file.

Hints: The HELPrct data are from the mosaicData R package. Note that this problem does not ask you to use a training and testing dataset. Proceed without a testing dataset and use the full dataset to fit the model.

Hints: For Problem 6a, explore the dataset before attempting to fit the models. You will need to deal with the missing values before applying some or all of the models. Which models do not work with missing data?


Homework01: (complete by Monday January 29, 2024)

Using the provided Quarto Project, rename the file lastname_firstname_Stat652_Homework01.qmd using your own last name and first name in the filename.

You should plan to come to class on Monday next week to ask questions and you will have until Friday to turn in this homework through Canvas.

Upload two files to Canvas: your .html file (rendered with self-contained: true) and your .qmd file. DO NOT submit a .zip file.