Stat 450 Advanced R for Data Science
Department of Statistics and Biostatistics
California State University, East Bay
Fall 2019
Finals Week: Week of Dec. 9
I will hold office hours on Monday as usual, from 2-3pm in my office. I will not hold office hours on Wednesday. In place of that office hour, I will be available on Monday from 7-8pm, before the final exam. On Wednesday I will be available by email.
Next Semester:
Now that you are familiar with R and RStudio, the next step is Stat. 452 Statistical Learning. In Stat. 452 you will learn about modern machine learning algorithms for classification, prediction, and clustering.
Week 15:
- Statistics Department Scholarship: scholarship
- Please complete your class evaluation online. Thank you.
- Resumes: Feel free to turn in your resume in class and I will take a look at it before the end of the Final.
- Spotlight Blog: 50% match is enough
- Today we will start with questions about the Project.
- Today we will start by going over the last Quiz. Sorry, I had posted the solution to the Practice Quiz as the solution to the Quiz. The solution to Quiz 2 is now posted in Blackboard.
- RNotebook:
- Final: The Final will be next week on Monday. The Final will cover topics from Chapters 8 through 16.
Stat. 450 Final
The final will be on Monday December 9 at 8:00pm.
The Final will cover topics from Chapters 8 through 16.
While the Final will focus on the Chapters since the Midterm, you should review Chapters 1 through 7. You will need to know how to tidy your data and compute using your data. You will also need to know how to make plots using ggplot. Review Chapter 3.
- Tibbles
- Importing data
- Tidy data
- Relational data
- Strings
- Factors
- Dates and times
- Review the chapters we have covered. Most importantly Chapters 12 and 13.
- Review the homework solutions.
- Review the quizzes.
- Study the RStudio Cheatsheets: ggplot, dplyr
- Come prepared to work in an R Notebook and knit to a Word .docx file. Put your class information on your R Notebook, along with your name and date.
Question 1: Wrangle your data into a form where you can summarize the data and plot the data. Merge data. Know how to convert date-time data.
Question 2: Be able to make bar graphs from raw data and from summarized data. Be able to make side-by-side boxplots.
Question 3: Be able to tidy a data set.
Question 4: Be able to work with strings and factors. Review
Question 5: Be able to perform a two-sample t-test.
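As a rough illustration of the kind of work Questions 1 and 5 describe, here is a minimal sketch that merges two tables, converts a character column to date-time, and runs a two-sample t-test. It is not the exam: the tables `visits` and `groups`, their columns, and all values are invented for this example.

```r
library(dplyr)
library(lubridate)

# Hypothetical data, invented for illustration only
visits <- tibble(
  id = 1:6,
  visit_time = c("2019-12-02 08:15:00", "2019-12-02 09:30:00",
                 "2019-12-03 10:45:00", "2019-12-03 13:00:00",
                 "2019-12-04 14:20:00", "2019-12-04 16:05:00"),
  score = c(71, 68, 75, 82, 79, 88)
)
groups <- tibble(
  id = 1:6,
  group = c("a", "a", "a", "b", "b", "b")
)

# Merge the two tables, then convert the character column to date-time
combined <- visits %>%
  left_join(groups, by = "id") %>%
  mutate(
    visit_time = ymd_hms(visit_time),
    visit_hour = hour(visit_time)
  )

# Two-sample t-test comparing mean score between the two groups
# (t.test() uses the Welch version by default)
result <- t.test(score ~ group, data = combined)
result
```

Printing `result` shows the test statistic, degrees of freedom, p-value, and confidence interval for the difference in means.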
Week 14:
- Missed Homework: For the holiday I will be accepting any and all missing or late homework for reconsideration. You can email me with your resubmissions before the end of the day, Dec. 1, 2019. Subject line should be RE: Stat 450 late homework
- Project: The Project due date is the end of the last week of classes. Some hints have been posted.
- Two datasets worth considering for the Project, if you have not found a dataset yet: 1. 2. General Social Survey
- Homework: Homework 12 has been posted.
- Solution: The solution to Quiz 2 from last Wednesday has been posted on Blackboard. I am still grading the Quiz; it should be done by Wednesday this week.
- Holiday: There is no class November 25 - 29; the University is closed for the Thanksgiving holiday. Class will resume on Monday December 2.
- RNotebook:
- Spotlight book: Quick-R
- Spotlight book: ModernDive
- RNotebook:
- RNotebook:
Week 13:
- Quiz: We will start class on Wednesday with Quiz 2.
- Solution: This is the solution to the practice quiz from last Wednesday.
- Presentation:
- RNotebook:
Week 12:
- Holiday: Next Monday is a University Holiday, so there will be no class on Monday.
- Quiz 2: Next week on Wednesday we will have Quiz 2 in class.
- Practice Quiz 2: Wednesday. It will cover the topics since the Midterm, Chapters 9 - 15.
- Homework Solutions: All of the homework solutions have been posted.
- RNotebook:
- Solution: This is the solution to the practice quiz.
Week 11:
- Spotlight book: Yarrr! This is a nice online book to learn how to do basic statistics with R.
- Spotlight book: ModernDive
- Conference: 2019 Electronic Undergraduate Statistics Research Conference
- RNotebook:
- This week we will discuss basic Regular Expressions using str_view() and str_detect(), covering the metacharacters . ^ $ [ ] |
- r4ds Ch. 14 We are going to work with words and sentences.
- Presentation:
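The regular expression metacharacters mentioned this week can be tried out with a short sketch like the one below. The vector `x` is invented for illustration; the same patterns work on `stringr::words` and `stringr::sentences`.

```r
library(stringr)

# Hypothetical example vector, invented for illustration
x <- c("apple", "banana", "pear", "grape")

starts_with_a <- str_detect(x, "^a")    # ^ anchors the match to the start
ends_with_e   <- str_detect(x, "e$")    # $ anchors the match to the end
a_in_middle   <- str_detect(x, ".a.")   # . matches any single character
b_or_p_then_e <- str_detect(x, "[bp]e") # [ ] matches one character from a set
a_start_r_end <- str_detect(x, "^a|r$") # | means "either pattern"

# Keep only the strings that match a pattern
x[ends_with_e]

# str_view() highlights the matches; shown only in an interactive session
if (interactive()) str_view(x, "[bp]e")
```

`str_detect()` returns one TRUE or FALSE per string, which makes it convenient inside `filter()` or for counting matches with `sum()`.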
Week 10:
- Cheatsheet:
- Text Mining is valuable
- Presentation:
- RNotebook: (This solution for Homework 6.)
- Presentation:
- We will review some examples of spread() and gather() in the book, and take a look at the new functions in the newly released tidyr package, version 1.0: pivot_longer() and pivot_wider().
- RNotebook: spread() and gather()
- RNotebook: pivot_wider() and pivot_longer()
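To compare the older and newer reshaping functions side by side, here is a minimal sketch. The table `wide` and its values are made up for illustration, and the code assumes tidyr version 1.0 or later is installed.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one column per year (values invented)
wide <- tibble(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25)
)

# New style: pivot_longer() replaces gather()
long <- wide %>%
  pivot_longer(cols = c(`1999`, `2000`),
               names_to = "year", values_to = "cases")

# Old style: gather() gives the same rows, in a different order
long_old <- wide %>%
  gather(key = "year", value = "cases", `1999`, `2000`)

# New style: pivot_wider() replaces spread() and undoes pivot_longer()
wide_again <- long %>%
  pivot_wider(names_from = year, values_from = cases)
```

The argument names `names_to`, `values_to`, `names_from`, and `values_from` spell out which direction the data is moving, which is the main readability gain over `key` and `value`.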
Week 9:
- Midterm: We will go over the Midterm solutions on Monday. The Midterm solution has been posted in Blackboard.
- Homework: Homework 7 and 8 have been posted.
- This week we will continue our discussion of Tidy Data. spread and gather. unite and separate.
- Presentation:
- RNotebook:
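For the unite and separate part of this week's Tidy Data discussion, a minimal sketch might look like the following. Both small tables and their values are invented for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical table with two values packed into one column
rates <- tibble(
  country = c("A", "B"),
  rate = c("745/19987071", "2666/20595360")
)

# separate() splits one column into several; convert = TRUE turns
# the resulting character pieces into numbers
separated <- rates %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# Hypothetical table with one value split across two columns
dates <- tibble(
  century = c("19", "20"),
  year = c("99", "00")
)

# unite() pastes several columns into one; sep = "" avoids the default "_"
united <- dates %>%
  unite(full_year, century, year, sep = "")
```

Without `convert = TRUE`, `cases` and `population` would stay character columns and could not be used in arithmetic.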
Week 8:
- I am planning to return the Midterm on Wednesday in class or next week on Monday.
- Homework: Homework 7 has been posted.
- This week we will start Section II of the book, Wrangle.
- Presentation:
- RNotebook:
- Presentation:
- RNotebook: spread() and gather()
Week 7:
- Homework: Homework 6 has been posted.
- Quiz 1: The solution to Quiz 1 is posted in Blackboard.
- Monday we will go over Quiz 1 and review for the Midterm.
- Midterm: The Midterm will be this week on Wednesday in class. The Midterm will cover topics from Chapters 1 through 7. Topics to review:
- ggplot: all of the geoms, facet_wrap, violin, lvplot, beeswarm (?)
- dplyr: filter, select, mutate, arrange, summarize, group_by
- datasets: mpg, nycflights, diamonds
- Review the chapters we have covered. Most importantly Chapters 3, 5, and 7
- Review the homework solutions.
- Study the RStudio Cheatsheets: ggplot, dplyr
- Come prepared to work in an R Notebook and knit to a Word .docx file or .pdf file.
- RNotebook: (This solution is for the rest of the problems in Homework 4.)
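As a quick review of the dplyr verbs listed above, here is a minimal sketch that chains all six on the `mpg` dataset and then draws a faceted plot. The particular filter, the new column `avg_mpg`, and the plot are arbitrary choices for illustration, not Midterm questions.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

by_class <- mpg %>%
  filter(year == 2008) %>%                       # keep rows
  select(manufacturer, class, cty, hwy) %>%      # keep columns
  mutate(avg_mpg = (cty + hwy) / 2) %>%          # add a column
  group_by(class) %>%                            # group rows
  summarize(n = n(), mean_mpg = mean(avg_mpg)) %>%  # one row per group
  arrange(desc(mean_mpg))                        # sort rows

by_class

# Side-by-side boxplots of highway mileage, one panel per model year
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  facet_wrap(~ year)
```

Printing `p` (or calling `print(p)`) draws the plot.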
Week 6:
- Quiz 1: This week Wednesday September 25, 2019
- Midterm: Next week Wednesday October 2, 2019 in class.
- RNotebook:
- RNotebook: (This solution is for the rest of the problems in Homework 3.)
- Practice Quiz 1 Solution:
- Today we will look at an example of JSON formatted data and Exploratory Data Analysis. See the analytics.bart.gov website. This is a good example of dynamically updated data. Download the JSON files and read them into Excel. You will need to be on Windows and have the newest Excel. Figure out how to replicate the device visualizations using ggplot().
- RNotebook:
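JSON data can also be read directly into R. The sketch below assumes the jsonlite package is installed and uses a small JSON string invented for illustration; the field names `device` and `visits` are hypothetical, and the actual files on the website may be structured differently.

```r
library(jsonlite)
library(dplyr)
library(ggplot2)

# Hypothetical JSON text, invented for illustration; a downloaded file
# could be read the same way by passing its path to fromJSON()
json_text <- '[
  {"device": "desktop", "visits": 120},
  {"device": "mobile",  "visits": 300},
  {"device": "tablet",  "visits": 45}
]'

# An array of objects is simplified to a data frame
devices <- fromJSON(json_text) %>% as_tibble()

# Bar chart from already-summarized data: geom_col() uses the y values as given
p <- ggplot(devices, aes(x = device, y = visits)) +
  geom_col()
```

Because the counts are already summarized, `geom_col()` is used here; `geom_bar()` would instead count rows of raw data.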
Week 5:
- Homework: Homework 5 has been posted.
- RNotebook:
- Quiz 1: Next week Wednesday September 25, 2019
- Midterm: Wednesday October 2, 2019 in class.
- Presentation:
- RNotebook:
- Practice Quiz 1: Wednesday September 18, 2019
Week 4:
- Homework: Homework 4 has been posted.
- Monday we will discuss more about Chapter 5.
- RNotebook:
- RNotebook: with the use of pipes
- Presentation:
- RNotebook:
Week 3:
- Microsoft Office for Students
- Homework Solution: The solution to Homework 1 has been posted in Blackboard. Please review the solution and the .Rmd file. Your Homework should have a cover page, and each problem should start at the top of a new page in your document.
- Homework: Homework 3 has been posted.
How is Homework 2 going?
- Presentation:
- RNotebook:
- R Cheatsheets:
- Presentation:
Week 2:
- Today: Today, at the beginning of class, we will be uninstalling the older version of R and RStudio and installing the new versions.
- Step 1: Uninstall the old version of RStudio.
- Step 2: Uninstall the old version of R.
- Step 3: Download the new version of R for Windows from r-project.org CRAN and install.
- Step 4: Download the new version of RStudio for Windows from RStudio Desktop Download and install.
- Step 5: Find RStudio in the Start Menu and start RStudio, check to see that it works and install the tidyverse package.
- Please sign up for an RStudio Cloud account before Wednesday's class. We may need to use it while the software update in the computer lab is being completed.
- The book for the class: r4ds
- The solutions for the book: r4ds-solutions Note that the numbering of the problems has changed and some of the problems have been changed, so read the solutions carefully before assuming they are correct. I have posted the link because I think it is fair that everyone has the link, not just the people who have found it. For best learning, you should consult this when you are done with your homework to check your answers.
- Homework: Homework 2 has been posted.
- Presentation:
- RNotebook:
- RNotebook: Suggestions for completing your homework for the class. It should be easy to read.
- R Cheatsheets:
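For Step 5 of the installation above, the check in the RStudio console can be sketched as follows. The helper `missing_packages()` is a hypothetical convenience function written for this example, not part of R or the tidyverse.

```r
# Confirm which version of R is running
R.version.string

# Hypothetical helper: returns the packages in pkgs that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages("tidyverse")

# Install only what is missing, and only in an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After installation, running `library(tidyverse)` should load the packages without errors.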
Week 1:
- The book for the class: r4ds
- Or you can read the book through the CSUEB library > Databases > Safari Books Online
- Presentation:
- Homework: Homework 1 has been posted.
- Spotlight Software:
- Spotlight Blog post: Prime Hints For Running A Data Project In R
- Spotlight Blog post: Modern Data Science
- Spotlight Book: What you need to know about R
Learning R:
Excellent References:
Data Science: