Stat 694: Applied Research in Statistics and Biostatistics
Department of Statistics and Biostatistics, CSU East Bay
Fall 2022:
Week 16:
- Class today, Friday December 2, 2022 will be online from noon to 2pm.
- Class presentations will be online today. To present, you will be made a co-host so you can share your slides or code.
- If you would rather present in person, please let me know and we can do that on Friday, December 16, in the classroom.
Week 15:
- Class tomorrow, Friday November 4, 2022 will be online from noon to 2pm.
- Class Presentations: After Thanksgiving week we will start project presentations. Your presentation can be a PowerPoint presentation, a well-organized demo of your code and notebook, or a discussion of the findings of your data analysis; I leave the format up to you. The best presentations leave the audience with a small number of memorable take-away ideas. Finish by saying a few words about further work you plan to pursue.
- Presentation: kNN
- Presentation: k-NN: Diagnosing Breast Cancer
- Presentation: Clustering
Excellent References:
Machine Learning with R:
- Machine Learning with R (read online from the University Library: Databases A-Z > O > O’Reilly Online Learning E-Books)
- Introduction to Statistical Learning
- Elements of Statistical Learning
- Computer Age Statistical Inference
- Hands-On Machine Learning with R
- tidymodels
- mlr3 book
- Introduction to Machine Learning (I2ML)
- Interpretable Machine Learning
Machine Learning with Python:
- w3schools Machine Learning in Python
- Real Python Machine Learning Tutorials
- Introduction to Machine Learning with Python
- scikit-learn
- pycaret
- pycaret book
- Python Machine Learning (read online from the University Library: Databases A-Z > O > O’Reilly Online Learning E-Books)
Big Picture:
- Fourth Paradigm of Science: Data-Intensive Scientific Discovery
- McKinsey Global Institute Big Data: The next frontier for innovation, competition, and productivity
- Data Scientist: The Sexiest Job of the 21st Century
- Is Data Scientist Still the Sexiest Job of the 21st Century?
- Data Driven
- Data Jujitsu
- Building Data Science Teams
- Ethics and Data Science
Week 14:
- No class. Holiday.
Week 13:
- Class tomorrow, Friday November 4, 2022 will be online from noon to 2pm.
- Topics: We will continue last week's discussion and introduce the bootstrap.
- Spotlight book: ModernDive. See Chapter 7 for a discussion of sampling and Chapter 8 for an introduction to bootstrapping.
- Spotlight book: ISLR
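A minimal sketch of the bootstrap idea: resample the observed data with replacement many times, compute the statistic (here the mean) for each resample, and read a 95% percentile interval off the sorted bootstrap means. It is written in Python with only the standard library, while the course works these examples in R; the sample values are made up for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_boot=2000, seed=None):
    """Resample `data` with replacement n_boot times; return the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original sample.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_ci(values, level=0.95):
    """Percentile confidence interval taken from a list of bootstrap statistics."""
    ordered = sorted(values)
    alpha = 1 - level
    lo_idx = int((alpha / 2) * len(ordered))
    hi_idx = int((1 - alpha / 2) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    # Hypothetical sample, made up for illustration only.
    sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]
    boots = bootstrap_means(sample, n_boot=5000, seed=694)
    low, high = percentile_ci(boots, level=0.95)
    print(f"sample mean = {statistics.fmean(sample):.3f}")
    print(f"95% bootstrap percentile CI = ({low:.3f}, {high:.3f})")
```

Fixing the seed makes the resamples reproducible; changing it (or n_boot) moves the interval endpoints slightly, which is itself a useful illustration of Monte Carlo variability.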
Week 12:
- Class tomorrow, Friday October 28, 2022 will be online from noon to 2pm.
- Topics: This week we will begin discussing Machine Learning. We will discuss training/testing datasets, classification and prediction algorithms, and accuracy. We will introduce kNN and clustering.
- Statistical Machine Learning:
- Presentation: Welcome
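The training/testing split, kNN classification, and accuracy ideas from this week fit in a few lines of code. A minimal sketch in plain Python (standard library only; the course itself uses R), assuming Euclidean distance and a simple majority vote among the k nearest training points; the two-group data is generated at random for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_frac=0.3, seed=None):
    """Shuffle a copy of `rows` and split it into (training, testing) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_frac * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k=3):
    """Predict the label of point x by majority vote among its k nearest training points."""
    # `train` is a list of (features, label) pairs; distance is Euclidean.
    neighbors = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label equals the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    rng = random.Random(694)
    # Hypothetical data: two well-separated groups of 2-D points.
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(50)]
    data += [((rng.gauss(5, 1), rng.gauss(5, 1)), "B") for _ in range(50)]
    train, test = train_test_split(data, test_frac=0.3, seed=1)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test, k=5):.2f}")
```

Accuracy is computed only on the held-out test points, which is the point of the split: a model scored on its own training data looks better than it really is. An odd k avoids ties in a two-class vote.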
Week 11:
- Class tomorrow, Friday October 21, 2022 will be online from noon to 2pm.
- Progress Reports.
Week 10:
- Class tomorrow, Friday October 21, 2022 will be online from noon to 2pm.
- Today: We will begin with progress reports. We will also start looking at how to use the DataSF website's API, using the Parking Meters data as an example.
- Quarto Document
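DataSF publishes its datasets through the Socrata Open Data API (SODA), where a dataset is queried at a URL of the form https://data.sfgov.org/resource/<dataset-id>.json and query parameters such as $limit begin with a dollar sign. A minimal Python sketch, assuming that URL pattern; "xxxx-xxxx" is a placeholder, not a real dataset ID, and the fetch step needs network access.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://data.sfgov.org/resource"


def soda_url(dataset_id, **params):
    """Build a SODA query URL; a keyword such as limit=5 becomes $limit=5."""
    # safe="$" keeps the dollar sign readable instead of encoding it as %24.
    query = urllib.parse.urlencode({f"${k}": v for k, v in params.items()}, safe="$")
    return f"{BASE}/{dataset_id}.json" + (f"?{query}" if query else "")


def fetch_json(url, timeout=30):
    """Download a URL and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "xxxx-xxxx" is a placeholder; copy the real ID from the dataset's page on DataSF.
    url = soda_url("xxxx-xxxx", limit=5)
    print(url)
    # rows = fetch_json(url)  # uncomment once a real dataset ID is filled in
```

Asking for a small $limit first is a cheap way to inspect the field names before pulling a large table.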
Week 9:
- Class tomorrow, Friday October 14, 2022 will be online from noon to 2pm.
- Today: We will continue to look at the BART Analytics website (bart.gov) and build a dashboard similar to the one online using the JSON files. We will make all of the plots with ggplot2 and then use the R packages patchwork and plotly to combine the subplots into a single figure.
- Quarto Document
Week 8:
- Class tomorrow, Friday October 7, 2022 will be online from noon to 2pm.
- Spotlight YouTube:
- Spotlight Podcasts:
- Spotlight blogs:
- Spotlight books:
- Data Visualization using ggplot
- Today: We will start by testing a few ggplot2 plots. Then we will look at the BART Analytics website to see how data is commonly stored in JSON files on the internet. Finally, we will discuss what an API is and try an example of using one.
- We will look at an example of JSON-formatted data on the BART Analytics website (bart.gov), which is a good example of dynamically updated data. Download the JSON files and read them into Excel; you will need Windows and the newest version of Excel.
- RNotebook:
- BART.Rmd
- BART.nb.html
- BART.docx
- BART.pdf
- BART.zip (updated)
- Last, we will take a look at the Parking Meter data from DataSF. Before next class, create a DataSF account.
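The JSON-to-table step done in Excel this week can also be done in code. A minimal Python sketch using only the standard json and csv modules; the field names and values in RAW are hypothetical and do not reflect the structure of the actual BART Analytics files.

```python
import csv
import io
import json

# Hypothetical JSON text; the real BART Analytics files have their own structure.
RAW = """
{
  "report": "daily_exits",
  "records": [
    {"date": "2022-10-03", "station": "STATION_A", "exits": 1200},
    {"date": "2022-10-03", "station": "STATION_B", "exits": 950},
    {"date": "2022-10-04", "station": "STATION_A", "exits": 1315}
  ]
}
"""


def records_to_rows(text, key="records"):
    """Parse JSON text and return the list of records stored under `key`."""
    doc = json.loads(text)
    return doc[key]


def rows_to_csv(rows):
    """Write a list of flat dicts as CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = records_to_rows(RAW)
    total = sum(r["exits"] for r in rows)
    print(f"{len(rows)} records, {total} total exits")
    print(rows_to_csv(rows))
```

Nested JSON usually has to be flattened like this, one record per row, before it fits the rectangular shape that spreadsheets and data frames expect.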
Week 7:
- Next Week: Visualization using ggplot2.
- Spotlight Website: TED The best Hans Rosling talks you’ve ever seen
- Spotlight Website: Hans Rosling’s 200 Countries, 200 Years, 4 Minutes - The Joy of Stats - BBC Four
- Today: we will experiment with the ggplot2 commands for Data Visualization.
- Books:
- Stat. 651 Presentation:
- RNotebook:
- RNotebook:
- RNotebook:
- RNotebook: Maps, storms on a map
Week 6:
- Class today, Friday September 22, 2022 will be online from noon to 2pm.
- Today: Project stand-up. Everyone gave a summary of their project and progress and received suggestions for next steps.
- The main suggestion was to focus on getting started on a small reasonable idea that is possible to complete in the time allowed.
- A main observation is that most publicly available data is not raw data (measurements on individuals) but summarized counts and means, which makes it very difficult to produce predictions or forecasts.
Week 5:
- Class today, Friday September 16, 2022 will be online from noon to 2pm.
- Today: We will experiment with the dplyr commands for Data Wrangling. We will discuss the pipe %>% (RStudio keyboard shortcut: Ctrl+Shift+M).
- RNotebook: Updated
- Homework05 has been posted.
- Gantt charts: ganttrify
Week 4:
- Class today, Friday September 8, 2022, will be in class and online from noon to 2pm.
- Today: Begin to look at Data Wrangling with the Tidyverse.
- Presentation:
- RNotebook:
- RNotebook: with answers to the questions
- Update: There is a new R package called starwarsdb. Update the presentation and R Markdown documents above to use it.
- r4ds
- Homework04 has been posted.
Week 3:
- Class today, Friday September 2, 2022, will be online from noon to 2pm.
- Today: We will discuss the use of Git and GitHub. Please create a GitHub account if you do not already have one.
- We will create our first GitHub repository.
- We will create an RProject with Git.
- Next week we will create an RProject with version control on GitHub.
- We will generate SSH credentials in RStudio and give them access to GitHub. In RStudio: Tools > Global Options > Git/SVN > Create SSH Key… > Create > Copy Key. In GitHub: profile menu (upper right) > Settings > SSH and GPG keys > New SSH key > paste your key > Add SSH key.
- Finally, we will download some files into our version-controlled RProject, commit the changes, and push the update to GitHub. Download the ob_prob7_suess files into your RProject, commit them, and push them to your GitHub repository.
- Happy Git and GitHub for R users
- Today: We will install and try the DataExplorer R package for AutoEDA.
Week 2:
Sorry, Sorry, Sorry: It turns out that there are two versions of this course in Canvas. I have been posting things in the wrong Canvas class. That is why my announcements and homework assignments have not been visible. I am working on moving everything over to the correct Canvas class.
- See the Assignment link for the Homework assignments.
- Start today by logging into Safari Books from the library website (A-Z Databases). Note that there are books from Apress, Packt, Manning, O’Reilly, CRC, etc.
- How does using the O’Reilly Online Learning platform differ from using similar platforms, such as Packt or LinkedIn Learning?
- Demo the differences between R scripts and R Projects (always use R Projects). Introduce Quarto.
- Install the following R packages that contain datasets: palmerpenguins, fueleconomy, nycflights13
- Install the following R package for AutoEDA: DataExplorer
- Download the following files onto your computer and create an R Project containing these files.
Week 1:
- Welcome to the Data Science Workgroup. book
- We will be meeting on Fridays at noon.
- Discuss topics for the semester.
- Guest speakers?
- What will be your research project for the semester? A first draft will be due next week.
- I have begun working on the COVID19 Data Hub. I am looking for people who can help monitor the GitHub comments and test the software on a regular basis.
Week 0:
- Welcome to Statistics 694.