Assignments


Final: (due end of Finals week, Friday December 15)

Instructions: This is a take-home final. Your work is to be completed individually. You may use your book and Google, and you may ask the instructor questions, though you are not to share computer code with others in the class.

Submit your .qmd file and either a .html or .pdf file in Canvas.

  1. Visit the Data Is Beautiful YouTube channel and watch a few of the videos. (Or find another similar channel, such as Data is Public.) These kinds of visualizations are quite popular these days. a) Describe in detail the different kinds of visualizations that are present in the videos. b) Comment on what is wrong with the Most Popular Music Styles 1910 - 2019 video.

  2. Read over Chapter 10 of the r4ds2e book. Do Exercise 4 in 10.5.1.1 Exercises using the lv_plot and also using the violin plot. Compare your two new plots to a boxplot.

  3. Make your own Self Evaluation checklist. Review the best_practices.html presentation and make a one-page checklist for evaluating your data visualizations in the future.

  4. Clearly explain how latitude and longitude data can be plotted on a scatterplot. Which is on the x-axis and which is on the y-axis?


Project: (due end of Finals week, Friday December 15)

Instructions: Submit your .qmd file and either a .html or .pdf file in Canvas.

Make two statebins using data from FRED US Regional Data. Use the socviz Chapter 7 Section 3 as a reference.

Complete your work in a Quarto Notebook and turn in both a .qmd and an .html or .pdf file using the usual filenames: lastname_firstname_Stat651_Project.qmd and lastname_firstname_Stat651_Project.html.

Hint: The code from socviz is now out of date. The statebins R package has been updated to work with the tidyverse. See help("statebins").
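As a starting point for the updated interface, here is a minimal sketch of a single statebin. It assumes the current CRAN version of statebins, where statebins() takes a data frame plus the names of the state and value columns and returns a ggplot object. The built-in USArrests data is only a stand-in for a FRED series; your own data frame and column names will differ.

```r
library(ggplot2)
library(statebins)

# USArrests ships with R; it stands in for a FRED regional series here.
arrests <- USArrests
arrests$state <- rownames(arrests)  # statebins needs a column of state names

# state_col and value_col name the columns to use; `name` is passed on to
# the fill scale and becomes the legend title.
p <- statebins(arrests,
               state_col = "state",
               value_col = "Assault",
               name = "Assault arrests") +
  theme_statebins(legend_position = "right")

p
```

Because the result is an ordinary ggplot object, titles and other layers can be added with the usual + syntax.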


Midterm: (due in Canvas Friday December 8, 2023)

Instructions: This is a take-home midterm. Your work is to be completed individually. You may use your book and Google, and you may ask the instructor questions, though you are not to share computer code with others in the class.

Instructions: In one folder Lastname_Firstname_Stat651_Midterm create separate R Projects for each part of the Midterm. Each R Project should have a directory name such as Problem_01_heatmap, Problem_02_example, Problem_02_replaced_data, and Problem_03_flexdashboard. This suggestion is to help you organize your work into separate R Project directories.

Submit an .html or .pdf file for Problems 1 and 2 with links to shinyapps.io or, if you make heatmaps in a Notebook, submit that.

  1. Find an example of a Shiny App that uses a heatmap. Download the code and get it to work locally on your computer. Upload the working app to shinyapps.io and provide the link.

  2. Update: The gtrendsR package is not working currently due to a change from Google Trends.

Make a time plot in R of the Google Trends Index for two topics of your choice.

Note: Because the gtrendsR package is not working currently, you will have to download the new data directly from Google Trends.
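A rough sketch of reading a downloaded Google Trends file and drawing the time plot is below. It assumes the export is a CSV with two preamble lines (a category line and a blank line) followed by a header row whose columns look like "golf: (United States)"; check your own download, since the layout may differ. The small file written here is a made-up stand-in so the code runs on its own, and its numbers are not real Google Trends values.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for a Google Trends export (values are illustrative only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Category: All categories",
  "",
  "Week,golf: (United States),football: (United States)",
  "2023-10-01,40,95",
  "2023-10-08,38,100",
  "2023-10-15,42,90"
), csv_path)

# Skip the two preamble lines; keep the column names exactly as written.
trends_wide <- read.csv(csv_path, skip = 2, check.names = FALSE)
names(trends_wide)[1] <- "date"  # first column may be Week, Month, or Day

# One row per date and topic, with the region suffix removed from the topic.
trends_long <- trends_wide %>%
  pivot_longer(-date, names_to = "keyword", values_to = "hits") %>%
  mutate(date = as.Date(date),
         keyword = sub(": \\(.*\\)$", "", keyword))

p <- ggplot(trends_long, aes(x = date, y = hits, color = keyword)) +
  geom_line() +
  labs(x = "Date", y = "Google Trends Index", color = "Topic")

p
```

For your own work, replace csv_path with the path to the file you downloaded.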

Download the Shiny App that plots the Google Trend Index for five topics. Here is the GitHub link: Google Trend Index App. Get the code to work. Change the topics to other related topics of interest to you. Provide the links to your two shinyapps.io apps in your .html or .pdf file.

Hint: Try out Google Trends US if you are not familiar with it.

Hint: To get the Google Trends Index App to work you should start an R Project by clicking on Projects > New Project on the RIGHT and select single file App. You will need to go to the author's GitHub and download the two data files. Save the two files into a subdirectory /data in the R Project directory.

Here are the direct links.

Currently NOT working: Now that you have the App running you need to figure out how to replace the data in the two files with different data. You can change the trend_description.csv directly. I would suggest listing two topics. Then replace the data. There is an R package, gtrendsR. Install it and try some code like what follows. Run the code in your R Project so the trend_data.csv file is replaced in the subdirectory /data. Here is what it should look like: Google-Trends-Shiny-App

To get the gtrendsR package to work correctly I needed to install the development version of the package.

 > if (!require("devtools")) install.packages("devtools")
 > devtools::install_github("PMassicotte/gtrendsR")

 > library(tidyverse)
 > library(gtrendsR)
 > data("categories")
 > head(categories)
 > res <- gtrends(c("golf","football"), geo = c("US"), time = "all")
 > plot(res)
 > res <- res$interest_over_time
 > head(res)
 > res %>% select(keyword, date, hits) %>%
 > rename(type = keyword, close = hits) %>%
 > write_csv("data/trend_data.csv")
  3. Extra Credit: Make a flexdashboard for the most recent month of the Lyft BayWheels data.

To get the updated Lyft BayWheels data, here is the link to the data. Use the same code you used in Stat. 650. Here is the link to the website with the code we used: The East Bay R Language Beginners Group.

To start use the data from the most recent month available.

You should use the code in the fordgobike01.Rmd, as we used before, to download the updated data. Or your own code.

(Optional, not recommended) If you are interested in directly connecting to the gbfs API, try out the code in fordgobike02.Rmd, which pulls information about the stations, not individual rides. (That is my understanding so far.)

I would like you to make a dashboard using the flexdashboard R package. To do this you can use the code you developed before, make some visualizations similar to the ones I have included, and put them into a dashboard.

Including visualizations of monthly users and yearly users would be good.

Explain: Write a statement about what your dashboard is trying to communicate.


Quiz01: (due November 18, 2023) A video is now available in Canvas > Zoom > Cloud Recordings with how to make the visualizations in Tableau.


Homework 4: (due Monday December 8, 2023)

Using a Quarto Document, produce your solutions to the following questions. Start by making a Quarto Document with file name Lastname_Firstname_Stat651_hw4.qmd. Then Render the .qmd file to either Lastname_Firstname_Stat651_hw4.docx or a .pdf file. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your Quarto Documents should include

title: "Stat. 651 Homework 4"

author: "Your name"

date: "November 28, 2022"

Upload your .docx or .pdf file and your .qmd file to Canvas.

Read: Chapter 20

Problems:


Homework 3: (due Monday November 27, 2023) In Canvas, due Dec. 8, 2023.

Using a Quarto Document, produce your solutions to the following questions. Start by making a Quarto Document with file name Lastname_Firstname_Stat651_hw3.qmd. Then Render the .qmd file to either Lastname_Firstname_Stat651_hw3.html or a .pdf file. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your Quarto Documents should include

title: "Stat. 651 Homework 3"

author: "Your name"

date: "November 27, 2023"

Upload your .html or .pdf file and your .qmd file to Canvas.

Read: Chapter 17, Chapter 18

Problems:

Hint: For Chapter 14 Exercise Problem 6, read this blog post: Got a Scatter Plot? Learn How to Add Marginal Histograms. See Week 5 for the data file HELPrct.csv.

Hint: For Chapter 18 Exercises Problem 2, here is the link to some maps: MacLeish Field Station maps.

Hint: For Chapter 18 Exercises Problem 1

Start by geocoding the distinct (or unique) addresses and zipcodes. Use the tidygeocoder R package. Think about how many popups might be reasonable to display before geocoding all of the addresses. Maybe start with 10 and increase to a reasonable number. It takes some time to download the addresses when geocoding. Then remove all NAs using the drop_na() function, so that we remove the restaurants that are missing the dba variable or the address. Take a sample of 50 and make the plot. Plot points only in NYC, so filter out anything that has been geocoded incorrectly. See Week 5 for the data file Violations_loc.csv. This file contains my geocoded location data.

And you might want to use the glue R package.
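The steps in this hint might be sketched as follows. This is only an outline under several assumptions: the helper names (geocode_some, nyc_sample, map_restaurants), the single address column, and the rough NYC bounding box are all hypothetical choices made for illustration, and the toy data frame is made up. tidygeocoder's geocode() adds lat and long columns by default; it needs internet access, so the geocoding and mapping helpers are defined but not called here.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: geocode a small number of distinct addresses first.
# Needs internet access and the tidygeocoder package; not called below.
geocode_some <- function(violations, n_max = 10) {
  violations %>%
    distinct(dba, address) %>%
    slice_head(n = n_max) %>%
    tidygeocoder::geocode(address = address, method = "osm")
}

# Drop incomplete rows, keep points inside a rough NYC bounding box
# (an approximation chosen for this sketch), then sample up to n rows.
nyc_sample <- function(geocoded, n = 50) {
  geocoded %>%
    drop_na(dba, address, lat, long) %>%
    filter(between(lat, 40.4, 41.0), between(long, -74.3, -73.6)) %>%
    slice_sample(n = n)
}

# Hypothetical helper: leaflet map with a glue-built popup per restaurant.
# Needs the leaflet and glue packages; not called below.
map_restaurants <- function(points) {
  points %>%
    mutate(popup = as.character(glue::glue("{dba}<br>{address}"))) %>%
    leaflet::leaflet() %>%
    leaflet::addTiles() %>%
    leaflet::addMarkers(lng = ~long, lat = ~lat, popup = ~popup)
}

# Made-up rows: one good NYC point, one far from NYC, two incomplete.
toy <- tibble(
  dba     = c("A PIZZA", "B CAFE", NA, "D DINER"),
  address = c("1 Main St", "2 Elm St", "3 Oak St", "4 Pine St"),
  lat     = c(40.7, 34.0, 40.7, NA),
  long    = c(-74.0, -118.2, -73.9, -73.9)
)

nyc_points <- nyc_sample(toy)
nyc_points
```

Only the first toy row survives: the second is outside the bounding box and the other two have missing values.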


Homework02: (due Monday November 6, 2023)

Using a Quarto Document, produce your solutions to the following questions. Start by making a Quarto Document with file name Lastname_Firstname_Stat651_hw2.qmd. Then Render the .qmd file to either Lastname_Firstname_Stat651_hw2.html or a .pdf file. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your Quarto Documents should include

title: "Stat. 651 Homework 2"

author: "Your name"

date: "November 6, 2023"

Upload your .html or .pdf file and your .qmd file to Canvas.

Problems:

Hint: For Problem 8, see: The basic part of a Shiny app

Hint:

> mergedViolations %>% select(dba, boro, cuisine_description) %>% 
>   group_by(cuisine_description) %>% 
>   summarize(n = n_distinct(dba)) %>% 
>   filter(cuisine_description == "Pizza")

> mergedViolations %>% select(dba, boro, cuisine_description) %>% 
>   filter(boro == "BROOKLYN") %>% 
>   group_by(cuisine_description) %>% 
>   summarize(n = n_distinct(dba)) %>% 
>   filter(cuisine_description == "Caribbean")

Homework01: (due Monday October 23, 2023)

Using a Quarto Document, produce your solutions to the following questions. Start by making a Quarto Document with file name Lastname_Firstname_Stat651_hw1.qmd. Then Render the .qmd file to either Lastname_Firstname_Stat651_hw1.html or a .pdf file. Use your own last name and first name in the filename. At the top of your first page you should include Name, Class, Section, and homework assignment.

The header of your Quarto Documents should include

title: "Stat. 651 Homework 1"

author: "Your name"

date: "October 23, 2023"

Upload your .html or .pdf file and your .qmd file to Canvas.

Problems: