Statistics 452: Homework


Project Part II:

(due Friday May 15, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_project.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_project.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Project: Give a title"

author: "Your name"

date: "TBA"
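
One way to get started is to generate a notebook skeleton that already carries this header. The following is only a sketch under stated assumptions: the file is written to a temporary directory, the `output: word_document` line is an addition (so that knitting produces a .docx), and the body text is a placeholder.

```r
# Build the header described above. Title, author, and date are placeholders.
# Assumption: "output: word_document" is added so that knitting gives a .docx.
header <- c(
  "---",
  'title: "Stat. 452 Project: Give a title"',
  'author: "Your name"',
  'date: "TBA"',
  "output: word_document",
  "---"
)

# Write a skeleton notebook. Replace Lastname_Firstname with your own name.
rmd_file <- file.path(tempdir(), "Lastname_Firstname_Stat452_project.Rmd")
writeLines(
  c(header, "", "Name, Class, Section, and homework assignment go here."),
  rmd_file
)

# Read the file back to confirm that the header was written.
skeleton <- readLines(rmd_file)
print(skeleton)

# To knit from the console (requires the rmarkdown package and pandoc):
# rmarkdown::render(rmd_file)
```

In RStudio the Knit button does the same job as the commented-out `rmarkdown::render()` call.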


Homework 10:

Not collected.

The header of your R Notebooks should include

title: "Stat. 452 Homework 10"

author: "Your name"

date: "May 11, 2020"

Upload one file to Blackboard.

Produce one .docx or .pdf file. Try to do your work using a Project in R.

  1. Perform the Cluster analysis on the sns data. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Homework 9:

Not collected.

The header of your R Notebooks should include

title: "Stat. 452 Homework 9"

author: "Your name"

date: "May 4, 2020"

Upload one file to Blackboard.

Produce one .docx or .pdf file. Try to do your work using a Project in R.

  1. Perform the Association analysis on the groceries data. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Homework 8:

(due Monday April 27, 2020)

The header of your R Notebooks should include

title: "Stat. 452 Homework 8"

author: "Your name"

date: "April 27, 2020"

Upload one file to Blackboard.

Produce one .docx or .pdf file. Try to do your work using a Project in R.

  1. Perform the ANN analysis on the concrete data. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
  2. Develop an ANN for the redwines.csv data from Homework 5.
    • Organize your report using the Five Steps.
  3. Read the blog post Multilabel classification with neuralnet package and run the code.
  4. Optional: Read Chapters 1 and 2 of Deep Learning with R and try to run the R Notebook for Chapter 2. See the Source Code download link.
  5. Optional: If you want to get started with Tensorflow.

Homework 7:

(due Monday April 20, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw7.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw7.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 7"

author: "Your name"

date: "April 20, 2020"

Upload one file to Blackboard.

Produce one .docx or .pdf file. Try to do your work using a Project in R.

  1. Perform the Logistic Regression analysis of the credit data. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
  2. Perform the Random Forest analysis of the credit data. Produce a report using an R Notebook explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Project Part I:

(due Friday April 24, 2020)


Homework 6:

(due Monday April 13, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw6.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw6.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 6"

author: "Your name"

date: "April 13, 2020"

Upload one file to Blackboard.

Do the following:

  1. Perform the Regression Tree based analysis of the redwine data. Produce a report using an R Notebook explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Homework 5:

(due Monday April 6, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw5.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw5.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 5"

author: "Your name"

date: "April 6, 2020"

Upload one file to Blackboard.

Do the following:

  1. (If you can get RWeka to work with 64-bit Java, that would be good. Otherwise, try this using RStudio Cloud.) Perform the Rule based analysis of the mushroom data. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
    • Be sure to include:
    • Show the prediction that the algorithm produced.
    • Give the Accuracy of the predictions.
    • Include the confusion matrix.
  2. Perform the Linear Regression analysis of the insurance data. Produce a report using an R Notebook explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.

Homework 4:

(due Monday March 23, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw4.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw4.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 4"

author: "Your name"

date: "March 23, 2020"

Upload one file to Blackboard.

Do the following:

  1. Perform the Tree based analysis of the credit data. Produce a report explaining the data, the analysis, and the findings.
    • Be sure to include:
    • Show the prediction that the algorithm produced.
    • Give the Accuracy of the predictions.
    • Include the confusion matrix.
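
The confusion matrix and the accuracy asked for above can be computed with base R alone. The sketch below uses made-up labels for ten test cases rather than the credit data; the names `actual` and `predicted` are hypothetical and stand in for the true test labels and the output of your model.

```r
# Hypothetical actual and predicted class labels for ten test cases.
actual <- factor(
  c("yes", "no", "no", "yes", "no", "no", "yes", "no", "no", "no"),
  levels = c("no", "yes")
)
predicted <- factor(
  c("yes", "no", "no", "no", "no", "yes", "yes", "no", "no", "no"),
  levels = c("no", "yes")
)

# Confusion matrix: rows are the actual classes, columns are the predictions.
conf_mat <- table(actual = actual, predicted = predicted)
print(conf_mat)

# Accuracy: the proportion of test cases on the diagonal (predicted correctly).
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

Here 8 of the 10 cases fall on the diagonal, so the accuracy is 0.8. The same value comes from `mean(actual == predicted)`.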

Homework 3:

(due Monday February 24, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw3.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw3.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 3"

author: "Your name"

date: "February 17, 2020"

Upload one file to Blackboard.

Student Question: Is the denominator of Bayes' Rule on page 97 of the First Edition of the book correct? Answer: No. The multiplication rule for independent events does not hold there. The independence that is assumed in Naive Bayes is class-conditional independence. This means the words are independent given that the class is spam or ham, not unconditionally.
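
A small numerical check may make the difference concrete. The sketch below does not reproduce the book's formula. It uses made-up probabilities for two hypothetical words, W1 and W2, that are independent given the class, and compares the denominator obtained from the law of total probability with the product of the marginal word probabilities.

```r
# Hypothetical probabilities for two words, W1 and W2 (not taken from the book).
p_class <- c(spam = 0.2, ham = 0.8)   # prior P(class)
p_w1 <- c(spam = 0.5, ham = 0.05)     # P(W1 | class)
p_w2 <- c(spam = 0.4, ham = 0.1)      # P(W2 | class)

# Class-conditional independence:
# P(W1, W2, class) = P(class) * P(W1 | class) * P(W2 | class).
joint_by_class <- p_class * p_w1 * p_w2

# Denominator from the law of total probability: sum over both classes.
denom_correct <- sum(joint_by_class)

# Denominator if the words were unconditionally independent: P(W1) * P(W2).
denom_product <- sum(p_class * p_w1) * sum(p_class * p_w2)

# Posterior P(spam | W1, W2) under each denominator.
posterior_correct <- unname(joint_by_class["spam"] / denom_correct)
posterior_product <- unname(joint_by_class["spam"] / denom_product)

print(c(denom_correct = denom_correct, denom_product = denom_product))
print(c(correct = posterior_correct, product = posterior_product))
```

With these numbers the two denominators are 0.044 and 0.0224. Dividing by the product of the marginals gives a value greater than 1, which cannot be a probability.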

Student Question: How do I randomize a data set? The author gives only examples of data sets that have already been randomized. Answer: See Step 2 of the example given in Chapter 5, pages 132-133 / 139-140. Also read the section on the holdout method in Chapter 10.
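
As a minimal sketch of one way to randomize the row order in base R, which may differ in detail from the book's Step 2: the built-in iris data is used here only because it ships with R and is sorted by Species, and the seed value is arbitrary.

```r
# Fix the random seed so that the shuffle is reproducible (arbitrary value).
set.seed(452)

# sample(n) returns the integers 1..n in random order, so indexing the rows
# with it shuffles the data frame without dropping or repeating any row.
shuffled <- iris[sample(nrow(iris)), ]

# The first rows should now mix species instead of all being "setosa".
print(head(shuffled$Species, 10))

# Same number of rows as before, only in a different order.
print(nrow(shuffled) == nrow(iris))
```

Once the rows are shuffled, taking the first 90% as training data and the rest as testing data gives a random holdout split.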

Do the following:

  1. Perform the SMS spam filtering analysis. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
    • Be sure to include:
    • Show the prediction that the algorithm produced.
    • Give the Accuracy of the predictions.
    • Include the confusion matrix.
  2. Find an interesting dataset that is appropriate for applying the naive Bayes algorithm, load the data into R, and proceed to classify the data using naive Bayes. (You can find an example dataset anywhere you want. One suggestion is to try the first example from the e1071 package naiveBayes function, see RDocumentation. This example uses the HouseVotes84 data from the mlbench package, see RDocumentation. I do think everyone should try this example. Print out the dataset and see that it is full of Y and N. Also, note the NAs.)
  3. Find a Google Sheets Add-ons app that can perform Sentiment Analysis. See if you can figure out what algorithm is used. (This problem is to look in the Google Sheets Add-ons and not in the Google Chrome AppStore.)
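
For the note about NAs in Problem 2, the following sketch shows one way to count missing values before modeling. The `votes` data frame is a made-up stand-in with y/n answers and missing entries. It is not the HouseVotes84 data, and its column names are hypothetical.

```r
# Hypothetical stand-in for a y/n voting table with some missing votes.
votes <- data.frame(
  Class = factor(c("democrat", "republican", "democrat", "republican")),
  V1 = factor(c("y", "n", NA, "n")),
  V2 = factor(c("n", NA, NA, "y"))
)

print(votes)

# Number of missing values in each column.
na_counts <- colSums(is.na(votes))
print(na_counts)

# Number of rows with no missing values at all.
n_complete <- sum(complete.cases(votes))
print(n_complete)
```

The same two calls, `colSums(is.na(...))` and `complete.cases(...)`, can be applied to any data frame you load for this problem.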

Hint: For Problem 2 you will need to take a random sample of the original data to make the training dataset and then use the remaining data to make the testing dataset. Use the following code. Replace launch with the name of your dataset.

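# Randomly choose 90% of the row numbers for the training dataset.
indx <- sample(seq_len(nrow(launch)), as.integer(0.9 * nrow(launch)))
indx

# The sampled rows form the training dataset; the remaining rows form the testing dataset.
launch_train <- launch[indx, ]
launch_test <- launch[-indx, ]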

Hint: For Problem 2 you need to find a dataset. I would suggest looking on the UCI ML Repository.
You will need to add the names yourself or find one that has the names included.


Homework 2:

(due Monday February 10, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw2.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw2.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 2"

author: "Your name"

date: "February 10, 2020"

Upload one file to Blackboard.

Do the following:

  1. Perform the cancer diagnosis kNN analysis. Produce a report explaining the data, the analysis, and the findings.
    • Organize your report using the Five Steps.
    • Be sure to include:
      1. Show the prediction that the algorithm produced.
      2. Give the Accuracy of the predictions. See Page 318 (or 299).
      3. Include the confusion matrix.
  2. Find an interesting dataset from the UCI ML Repository that is appropriate for applying the kNN algorithm and load the data into R and proceed to classify the data using kNN.
  3. Do Problem 7 (a), (b), and (c) on page 54 of An Introduction to Statistical Learning.

Hint: For Problem 2 you will need to take a random sample of the original data to make the training dataset and then use the remaining data to make the testing dataset. Use the following code. Replace launch with the name of your dataset.

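# Randomly choose 90% of the row numbers for the training dataset.
indx <- sample(seq_len(nrow(launch)), as.integer(0.9 * nrow(launch)))
indx

# The sampled rows form the training dataset; the remaining rows form the testing dataset.
launch_train <- launch[indx, ]
launch_test <- launch[-indx, ]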

Hint: For Problem 2 you need to find a dataset. I would suggest looking on the UCI ML Repository.
You will need to add the names yourself or find one that has the names included.


Homework 1:

(due Monday February 3, 2020)

Using an R Notebook, produce your solutions to the following questions. Start by making an R Notebook with the file name Lastname_Firstname_Stat452_hw1.Rmd. Then knit the .Rmd file to Lastname_Firstname_Stat452_hw1.docx. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your R Notebooks should include

title: "Stat. 452 Homework 1"

author: "Your name"

date: "February 3, 2020"

Upload one file to Blackboard.

Do the following:

  1. Do a Google search on the following terms and develop a working definition of each.
    • Statistical Learning
    • Statistical Machine Learning
    • Machine Learning
    • Predictive Analytics
    • Artificial Intelligence
    • Deep Learning
  2. Run all of the code from Chapter 2 to become familiar with R. (If you have experience with R, this will get you familiar with the code from the author.) Show some of the relevant output from R and discuss what you have learned from the data.
  3. Download this book and become familiar with the materials on the websites.