Stat 650 Advanced R for Data Science

Department of Statistics and Biostatistics

California State University, East Bay

Fall 2020


Week 8: Finals week


Week 7:


Week 6:


Midterm Information:

I have received a number of emails about not being able to knit the midterm when fordgobike01.Rmd is run separately to create the data frames and a second .Rmd file then accesses those data frames in the Global Environment to answer the Midterm questions. RStudio does not allow this through the Preview/Knit button: Knit runs the document in a fresh R session that cannot see objects in your Global Environment, and relying on them is considered non-reproducible. Here is a link to a Stack Overflow post about this topic. To fix this problem, I suggest you move the code you used to download the data into an R script and then source() that script at the start of your R Notebook to load the data. You can also save the data as an .Rds file and load the .Rds file, using the readr R package functions write_rds() and read_rds().
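
As a rough illustration of the fix described above, the sketch below saves a data frame with write_rds() and reads it back with read_rds(). The file names load_data.R and fordgobike.rds, the object names, and the tiny stand-in data frame are hypothetical; the sketch writes to a temporary directory only so that it runs on its own, whereas in practice you would save the file in your project folder.

```r
library(readr)

# --- In the data-wrangling script (hypothetical name: load_data.R) ---
# Tiny stand-in data frame; in the midterm this would be the result of
# your own download and wrangling code.
fordgobike <- data.frame(
  duration_sec     = c(598, 943, NA),
  start_station_id = c(284, 6, 93)
)

# Save the data frame to disk once (hypothetical file name).
rds_path <- file.path(tempdir(), "fordgobike.rds")
write_rds(fordgobike, rds_path)

# --- In the first chunk of the R Notebook that answers the questions ---
# Either re-run the wrangling script:
#   source("load_data.R")
# or read the saved object back, which avoids downloading the data again:
fordgobike_loaded <- read_rds(rds_path)

str(fordgobike_loaded)
```

Because the notebook now creates or loads everything it needs, it no longer depends on what happens to be in the Global Environment when you press Knit.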

Suggestions:

  1. Make sure you name your file correctly. Lastname and Firstname are the names you have in the University computer systems.
  2. If you are having problems visualizing the missing data, try running your code on subsets of your data. I would suggest chunks of approximately 25,000 observations.
  3. DO NOT submit your data to Blackboard.
  4. DO NOT submit a .zip file. At this point I would suggest submitting an .Rmd file containing the solutions to the questions, an .R script containing your data wrangling steps, and a .docx or .pdf file that I will read.
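
The subsetting idea in suggestion 2 could be sketched as follows. The data frame here is simulated stand-in data, and the per-column proportion of missing values is only a placeholder for whatever missing-data plot you are using; the point of the sketch is how to cut the rows into chunks of 25,000.

```r
# Hypothetical stand-in data: 60,000 rows with some missing values.
set.seed(1)
n  <- 60000
df <- data.frame(
  x = ifelse(runif(n) < 0.1, NA, rnorm(n)),
  y = sample(c("a", "b", NA), n, replace = TRUE)
)

chunk_size <- 25000

# Assign each row to a chunk: rows 1-25000 -> 1, 25001-50000 -> 2, and so on.
chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
chunks   <- split(df, chunk_id)

# Placeholder summary: proportion of missing values per column in each chunk.
# Replace the body of this function with your own plotting call to draw
# one missing-data plot per chunk.
missing_by_chunk <- sapply(chunks, function(chunk) colMeans(is.na(chunk)))
print(missing_by_chunk)
```

The last chunk is simply shorter than the others when the number of rows is not a multiple of the chunk size.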

Week 5:


Week 4:


Week 3:


Week 2:


Week 1:


References:

Learning R:

Learning Python:

Learning SQL:

Other classes. What is the difference between Statistical Learning, Machine Learning, Data Science, Data Mining, KDD, etc.?

Excellent References:

Data Science:

Reading related to the Digital Economy:

More Big Picture: