Statistics 694: Homework

Homework 5:

(due Friday December 4, 2020)

Homework 4:

(due Friday November 6, 2020)

Homework 3:

(due Wednesday November 4, 2020)

Provide a summary of your class project in an R Notebook. Include a direct reference to the data you are using. If you have used R code to analyze your data show the code and the results in your R Notebook. If you have specific questions, please include them at the top of your summary.

Homework 2:

Homework 1:

(due Wednesday September 4, 2020)

Please disregard everything that is below.

Homework 4:


Complete the following problems. Upload your individual files in Blackboard. Please do not zip your files together before submitting; I cannot see your work directly if it is zipped.

  1. In your GitHub account create a repository. Configure RStudio to be used with git and GitHub. Create an .Rmd test file, then commit and push the file. Submit the link to your GitHub repository and the link to your file on GitHub.
  2. Run the code. Was there a spike in crime during the time of the protests a few weeks ago? Turn in a .Rmd and .docx file.
  3. Install the gutenbergr R package, download a book from Project Gutenberg, and view it.
  4. Download the cat-and-dogs data from Kaggle. Show that you can run the following .R code: cats-and-dogs-dir-ver02.R. Turn in a .Rmd and .docx file.
  5. Create a disk.frame from the Fannie Mae Single-Family Loan Performance data set. Start by using only one year of data. This may be difficult to do.

Homework 3:


Complete the following problems. Upload your individual files in Blackboard. Please do not zip your files together before submitting; I cannot see your work directly if it is zipped.

  1. Install SQLite DB Browser and test it out with the .sqlite databases created using the R code. Nothing to turn in for this problem.
  2. Read Chapter 12 Section 3 Pivoting in r4ds. Do 12.3.3 Exercise 1. Turn in lastname_firstname_Stat694_hw3_prob2.Rmd and lastname_firstname_Stat694_hw3_prob2.pdf or .docx
  3. Read Chapter 13 in r4ds. Do 13.5.1 Exercises 3. Turn in lastname_firstname_Stat694_hw3_prob3.Rmd and lastname_firstname_Stat694_hw3_prob3.pdf or .docx
  4. Download the Chicago Crime data since 2001 as a .csv file. Load it into a data.table and export it to a .sqlite file. Try to access the database from the SQLite DB Browser. Nothing to turn in for this problem.
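
For problem 4, the load-and-export step might look roughly like the sketch below. It assumes the data.table, DBI, and RSQLite packages are installed. The file names, the table name crimes, and the three columns are stand-ins made up for illustration; the real Chicago Crime file has its own columns and is far larger.

```r
library(data.table)
library(DBI)

# Stand-in for the downloaded Chicago Crime .csv (hypothetical columns).
csv_path <- file.path(tempdir(), "crimes_demo.csv")
demo <- data.table(
  id           = 1:5,
  primary_type = c("THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT"),
  year         = c(2001L, 2005L, 2010L, 2015L, 2020L)
)
fwrite(demo, csv_path)

# Load the .csv into a data.table.
crimes <- fread(csv_path)

# Export the data.table to a .sqlite file as a table named "crimes".
db_path <- file.path(tempdir(), "crimes_demo.sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "crimes", crimes, overwrite = TRUE)

# Check the export with a query before opening the file in a database browser.
theft_count <- dbGetQuery(
  con,
  "SELECT COUNT(*) AS n FROM crimes WHERE primary_type = 'THEFT'"
)
dbDisconnect(con)
```

The resulting .sqlite file at db_path is the file you would open in the SQLite DB Browser.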


Submit in Blackboard.

I would like to receive your Idea, Plan & Development proposal for your class Project next week.

The Project should be something that you are interested in working on beyond this class assignment.

One thing that can be very valuable in the interview process is being able to discuss and show your own work, where you have a genuine interest and motivation. This might come up in a question such as, "Please tell me about any projects or research you have worked on outside of a class."

In the end you could write a blog post, or upload your work to GitHub, or post a Notebook on Kaggle.

For the class Project you will be asked to present an R Notebook explaining what you have worked on.

Some ideas.

Homework 2:


Complete the following problems. Upload your files in Blackboard. Please do not zip your files together before submitting; I cannot see your work directly if it is zipped.

  1. Open a kaggle account and find their Micro Courses. Open a GitHub account. Nothing to turn in for this problem.
  2. Load the mlbench R package. Using the DataExplorer R package, run a report of the BostonHousing data using medv as the target variable. Name your report file lastname_firstname_Stat694_hw2_prob1_BostonHousing_report.html and convert it to a .pdf.
  3. The R Project contains R Tidyverse code to merge all of the many data tables in the nycflights data set into one dataframe/tibble. Run the code to create an overall dataframe. From the code, determine which variable(s) have issues with their recorded data values. Turn in lastname_firstname_Stat694_hw2_prob2.Rmd and lastname_firstname_Stat694_hw2_prob2.pdf or .docx
  4. Download the Lyft Bay Wheels trip data for 2020. There are 5 months of data available. Using R code, download the data files, unzip the data files, read all of the data files into R, and use the DataExplorer R package to summarize all 5 months of data together. If your computer is not able to process all of the data, downsample the data. Turn in lastname_firstname_Stat694_hw2_prob1_BostonHousing_report.html (convert to a .pdf) and lastname_firstname_Stat694_hw2_prob3.pdf or .docx. You may use the posted code to develop your .Rmd file. Do not turn in the instructor's .Rmd file. You can use some of the code and add your own comments.
  5. Install the snedata R package from GitHub. Download the MNIST data set and plot a few of the 28 x 28 images. Turn in lastname_firstname_Stat694_hw2_prob4.pdf or .docx
  6. Download and install the instructor's R Package MyMeanSDPackage.tar.gz. Nothing to turn in for this problem.
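
For problem 4, the read-everything-then-downsample step could be sketched as below, using base R only. The three small .csv files, their names, their columns, and the 10% sampling fraction are all made up for illustration; with the real data you would first fetch and extract each monthly file with download.file() and unzip().

```r
# Stand-in for the unzipped monthly files: write three small .csv files.
data_dir <- file.path(tempdir(), "hw2_trips_demo")
dir.create(data_dir, showWarnings = FALSE)
for (m in 1:3) {
  month_df <- data.frame(
    month        = m,
    ride_id      = seq_len(100),
    duration_sec = 60 * seq_len(100)
  )
  out_file <- file.path(data_dir, sprintf("trips_2020%02d.csv", m))
  write.csv(month_df, out_file, row.names = FALSE)
}

# Read every .csv in the directory and stack them into one data frame.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
all_trips <- do.call(rbind, lapply(csv_files, read.csv))

# Downsample: keep a random 10% of the rows if the full data is too large.
set.seed(694)
keep <- sample(nrow(all_trips), size = floor(0.10 * nrow(all_trips)))
trips_small <- all_trips[keep, ]
```

Setting the seed makes the downsample reproducible, so the summary report does not change each time the notebook is run.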

Homework 1:


Complete the following problems. Upload your 2 files in Blackboard. Please do not zip your files together before submitting; I cannot see your work directly if it is zipped.

  1. Install R and RStudio. Open an RStudio Cloud account. Nothing to turn in for this problem.
  2. Try not to make a mess of your work. Use an R Project or multiple R Projects if you are working with multiple different datasets. Create your project within your class directory in a subdirectory Homework. Nothing to turn in for this problem.
  3. Install the Tidyverse. Before installing, set options(Ncpus = 8), changing 8 to the number of CPU cores on your machine. Nothing to turn in for this problem.
  4. Save the mtcars dataset to a .zip file using R code. Unzip the file in the directory you are working in. Turn in lastname_firstname_Stat694_hw1_prob3.pdf or .docx
  5. Familiarize yourself with the Fannie Mae Single-Family Loan Performance Data. Download the Acquisition Data and the Performance Data sample files. Write R code to read in these two sample files. Take a 20% sample of each data set. Turn in lastname_firstname_Stat694_hw1_prob4.pdf or .docx.
  6. Download the Fannie Mae Single-Family Loan Performance data set. Nothing to turn in for this problem.
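
For problems 4 and 5, one possible shape of the code is sketched below. The directory and file names are illustrative, and mtcars stands in for the Fannie Mae sample files, whose names and layout are not given here. Note that utils::zip() relies on an external zip program, so the zip step is skipped if none is found on the PATH.

```r
# Work in a temporary directory; use your Homework project directory instead.
work_dir <- file.path(tempdir(), "hw1_demo")
dir.create(work_dir, showWarnings = FALSE)

# Problem 4: write mtcars to a .csv, zip it, then unzip it again.
csv_path <- file.path(work_dir, "mtcars.csv")
zip_path <- file.path(work_dir, "mtcars.zip")
write.csv(mtcars, csv_path)

zip_available <- nzchar(Sys.which("zip"))
if (zip_available) {
  # "-j" stores the file without its directory path inside the archive.
  utils::zip(zip_path, files = csv_path, flags = "-j9X")

  unzip_dir <- file.path(work_dir, "unzipped")
  utils::unzip(zip_path, exdir = unzip_dir)
  mtcars_back <- read.csv(file.path(unzip_dir, "mtcars.csv"), row.names = 1)
}

# Problem 5: take a 20% random sample of the rows of a data frame.
set.seed(694)
n_keep <- floor(0.20 * nrow(mtcars))
mtcars_sample <- mtcars[sample(nrow(mtcars), size = n_keep), ]
```

The same sampling lines apply to any data frame, so they can be reused for each of the two sample files once they are read in.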

Everything below this line is from past offerings of Stat 694 and from previous Data Science Workgroup meetings.

Homework 1:

Develop a plan for a Data Science Project to be completed during the class. Your project should be related to a topic of interest to you and should, hopefully, be related to a career opportunity you are planning to pursue.

Start a Slack discussion directly with the instructor about your project.

Do not post your ideas in the general or random channels in Slack.

Describe the following:

  1. Brainstorm if you do not have an idea. Try to come up with an idea.
    Look at some job descriptions.
    Look at some data competitions, such as Kaggle.
    Look at some sources of data.
  2. Once you have an idea, describe the idea as clearly as possible.
  3. Make a list of steps you plan to do to complete your project. The list might start with identifying a source for the data you plan to use. Describe the project of your work.
  4. State whether you plan to produce a written report or a blog post or an App. A written report should be produced using an R Notebook using good reproducible research practices. A blog post should be posted on your blog. An App could be shared on

RStudio resources:

Fall 2018 / Spring 2019

Homework 4:

(due Friday Oct. 26, 2018)

Homework 3:

Homework 2:

Homework 1:

Develop a plan for a Data Science Project to be completed during the class. Your project should be related to a topic of interest to you and should, hopefully, be related to a career opportunity you are planning to pursue.

Start a Slack discussion directly with the instructor about your project.

Do not post your ideas in the general or random channels in Slack.

Describe the following:

  1. Brainstorm if you do not have an idea. Try to come up with an idea.
    Look at some job descriptions.
    Look at some data competitions, such as Kaggle.
    Look at some sources of data.
  2. Once you have an idea, describe the idea as clearly as possible.
  3. Make a list of steps you plan to do to complete your project. The list might start with identifying a source for the data you plan to use. Describe the project of your work.
  4. State whether you plan to produce a written report or a blog post or an App. A written report should be produced using an R Notebook using good reproducible research practices. A blog post should be posted on your blog. An App could be shared on

RStudio resources: