---
title: 'Stat. 450 Section 1 or 2: Homework 8'
output:
  word_document: default
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First, type all of your information about the problems you do in the text part of your R Notebook.
- Second, type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include "message = FALSE" in the {r}, so {r message = FALSE}.
- Last, spell check your R Notebook: Edit > Check Spelling... or hit the F7 key.

Homework 8:

- Read: Chapter 12
- Do 12.2.1 Exercises 1, 2
- Do 12.3.3 Exercise 4
- Do 12.4.3 Exercise 1

```{r message=FALSE}
library(tidyverse)
```

# 12.2.1

## 1. Using prose, describe how the variables and observations are organised in each of the sample tables.

**Answer:**

In table1, each row is a (country, year) combination, and the variables cases and population each have their own column.

```{r}
table1
```

In table2, each row is a (country, year, type) combination, where type is either "cases" or "population", and the count column holds the numeric value for that combination.

```{r}
table2
```

In table3, each row is a (country, year) combination, and the rate column holds cases and population together as a character string in the format "cases/population".

```{r}
table3
```

Table 4 is split into two tables, one for each variable: table4a holds cases, while table4b holds population. Within each table, each row is a country, each column is a year, and the cells hold the value of that table's variable.

```{r}
table4a
table4b
```

## 2. Compute the rate for table2, and table4a + table4b.

You will need to perform four operations: extract the number of TB cases per country per year; extract the matching population per country per year; divide cases by population, and multiply by 10000; and store the result back in the appropriate place.
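Which representation is easiest to work with? Which is hardest? Why?

**Answer:**

Using some code from Chapter 13, Relational data:

```{r}
table2

# Keep only the "cases" rows and rename count to cases
table2_cases <- table2 %>%
  filter(type == "cases") %>%
  rename(cases = count) %>%
  select(-type) %>%
  arrange(country, year)
table2_cases

# Keep only the "population" rows and rename count to pop
table2_pop <- table2 %>%
  filter(type == "population") %>%
  rename(pop = count) %>%
  select(-type) %>%
  arrange(country, year)
table2_pop

# Match each country-year's cases with its population
table2_new <- table2_cases %>%
  inner_join(table2_pop, by = c("country", "year"))
table2_new

# Rate per 10,000 people, spread to one column per year
table2_new %>%
  mutate(rate = (cases / pop) * 10000) %>%
  select(country, year, rate) %>%
  arrange(year) %>%
  spread(year, rate)
```

Using table4a and table4b:

```{r}
table4a
table4b

# Join by country: the .x columns are cases, the .y columns are population
table_new2 <- table4a %>%
  inner_join(table4b, by = c("country"))
table_new2

# Rate per 10,000 people, computed separately for each year column
table_new2a <- table_new2 %>%
  mutate(
    rate.1999 = (`1999.x` / `1999.y`) * 10000,
    rate.2000 = (`2000.x` / `2000.y`) * 10000
  ) %>%
  select(country, rate.1999, rate.2000)
table_new2a
```

table1 is the easiest to work with, because cases and population already sit side by side in each row, so the rate is a single mutate(). table2 and table4a + table4b both take more work: table2 stores cases and population in separate rows, so they have to be filtered apart and joined back together, while table4a + table4b store them in separate tables with the years spread across columns, so the rate has to be computed once for every year column. That makes table4a + table4b arguably the hardest.

# 12.3.3

## 4. Tidy the simple tibble below. Do you need to spread or gather it? What are the variables?

**Answer:** The variables are pregnant, sex, and count. The column names male and female are values of the variable sex rather than variables themselves, so we need to gather them into two new columns, sex and count.

```{r}
preg <- tribble(
  ~pregnant, ~male, ~female,
  "yes",     NA,    10,
  "no",      20,    12
)
preg
```

```{r}
# Gather male and female into a key column (sex) and a value column (count)
preg %>%
  gather(male, female, key = "sex", value = "count")
```

# 12.4.3

## 1. What do the extra and fill arguments do in separate()? Experiment with the various options for the following two toy datasets.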
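```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"))

tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"))
```

**Answer:** The extra argument controls what happens when a string splits into more pieces than there are names in into. The default, "warn", gives a warning and drops the extra pieces; "drop" drops them without a warning; and "merge" splits at most length(into) times, so the leftover text stays together in the last column. The fill argument controls what happens when a string splits into too few pieces. The default, "warn", gives a warning and fills with NA on the right; "right" fills with NA on the right without a warning; and "left" fills with NA on the left.

Examples:

```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "drop")
```

```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "merge")
```

```{r}
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "right")
```

```{r}
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "left")
```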
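For comparison, here is a minimal sketch that assumes tidyr 1.3.0 or later, where separate() is superseded by separate_wider_delim(). In that function the too_many argument plays the role of extra and too_few plays the role of fill. The pairing shown in the comments ("drop" and "merge" for too_many, "align_start" and "align_end" for too_few) is our reading of the tidyr documentation, not part of the Chapter 12 exercise, and the object names are just for illustration.

```{r message=FALSE}
library(tidyverse)  # assumes tidyr >= 1.3.0 for separate_wider_delim()

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
too_few  <- tibble(x = c("a,b,c", "d,e", "f,g,i"))

# Similar to extra = "drop": pieces beyond the third are discarded
dropped <- too_many %>%
  separate_wider_delim(x, delim = ",", names = c("one", "two", "three"),
                       too_many = "drop")
dropped

# Similar to extra = "merge": the remainder stays together in the last column
merged <- too_many %>%
  separate_wider_delim(x, delim = ",", names = c("one", "two", "three"),
                       too_many = "merge")
merged

# Similar to fill = "right": missing pieces become NA at the end
padded_right <- too_few %>%
  separate_wider_delim(x, delim = ",", names = c("one", "two", "three"),
                       too_few = "align_start")
padded_right

# Similar to fill = "left": missing pieces become NA at the start
padded_left <- too_few %>%
  separate_wider_delim(x, delim = ",", names = c("one", "two", "three"),
                       too_few = "align_end")
padded_left
```

One design difference worth noting: in separate_wider_delim(), too_many and too_few default to "error" rather than "warn", so strings with an unexpected number of pieces stop the code instead of being silently dropped or padded.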