---
title: 'Stat. 450 Section 1 or 2: Homework 8'
output:
  word_document: default
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
---

**Prof. Eric A. Suess**

So how should you complete your homework for this class?

- First thing to do is type all of your information about the problems you do in the text part of your R Notebook.
- Second thing to do is type all of your R code into R chunks that can be run.
- If you load the tidyverse in an R Notebook chunk, be sure to include `message = FALSE` in the chunk header, so `{r message = FALSE}`.
- Last thing is to spell check your R Notebook.  Edit > Check Spelling... or hit the F7 key.

Homework 8:

     Read: Chapter 12

     Do 12.2.1 Exercises 1, 2
     Do 12.3.3 Exercise 4
     Do 12.4.3 Exercise 1


```{r message=FALSE}
library(tidyverse)
```

# 12.2.1

## 1.

Using prose, describe how the variables and observations are organised in each of the sample tables.

**Answer:**

In table1, each row is a (country, year) combination, and the variables cases and population each have their own column.

```{r}
table1
```

In table2, each row is a (country, year, type) combination, where type is either "cases" or "population", and the count column holds the numeric value for that combination.


```{r}
table2
```

In table3, each row is a (country, year) combination, and the rate column holds the ratio of cases to population as a character string in the format "cases/population".

```{r}
table3
```
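Because the rate column in table3 packs two numbers into one string, it cannot be used for arithmetic as it stands. The chunk below is a minimal sketch of one way to split it, assuming `separate()` from tidyr; the name `table3_sep` is only illustrative.

```{r message=FALSE}
library(tidyverse)

# Split the "cases/population" string at the slash.
# convert = TRUE turns the resulting character columns into integers.
table3_sep <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)
table3_sep
```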

Table 4 is split into two tables, one table for each variable: table4a is the table for cases, while table4b is the table for population. Within each table, each row is a country, each column is a year, and the cells are the value of the variable for the table.

```{r}
table4a
table4b
```
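Since the years are stored as column names in table4a and table4b, one possible way to see the same information in the layout of table1 is to gather each table and then join them. This is only a sketch; the names `tidy4a`, `tidy4b`, and `tidy4` are illustrative.

```{r message=FALSE}
library(tidyverse)

# Move the year columns into a single year variable in each table.
tidy4a <- table4a %>% gather(`1999`, `2000`, key = "year", value = "cases")
tidy4b <- table4b %>% gather(`1999`, `2000`, key = "year", value = "population")

# Combine cases and population for each (country, year) pair.
tidy4 <- left_join(tidy4a, tidy4b, by = c("country", "year"))
tidy4
```

Note that `gather()` stores the year as a character string here, because it came from column names.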


## 2.

Compute the rate for table2, and table4a + table4b. You will need to perform four operations:

1. Extract the number of TB cases per country per year.
2. Extract the matching population per country per year.
3. Divide cases by population, and multiply by 10000.
4. Store back in the appropriate place.

Which representation is easiest to work with? Which is hardest? Why?


**Answer:**

Using table2, with some join code from Chapter 13, Relational data.

```{r}
table2

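# Cases per country per year; drop `type` so the join does not duplicate it.
table2_cases <- table2 %>%
  filter(type == "cases") %>%
  select(-type) %>%
  rename(cases = count) %>%
  arrange(country, year)
table2_cases

# Matching population per country per year.
table2_pop <- table2 %>%
  filter(type == "population") %>%
  select(-type) %>%
  rename(pop = count) %>%
  arrange(country, year)
table2_pop

# Match cases to population by country and year.
table2_new <- table2_cases %>% inner_join(table2_pop, by = c("country", "year"))
table2_new

# Rate per 10,000 people, shown with one column per year.
table2_new %>%
  mutate(rate = (cases / pop) * 10000) %>%
  select(country, year, rate) %>%
  arrange(year) %>%
  spread(year, rate)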
```

Using table4a and table4b

```{r}
table4a
table4b

table_new2 <- table4a %>% inner_join(table4b, by = c("country"))
table_new2

table_new2a <- table_new2 %>% mutate(
  rate.1999 = (`1999.x`/`1999.y`)*10000, 
  rate.2000 = (`2000.x`/`2000.y`)*10000
  ) %>%
  select(country, rate.1999, rate.2000)
table_new2a
```
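For comparison, here is a sketch of the same rate computed from table1, where cases and population already sit side by side in each row. Neither a filter nor a join is needed, which suggests that the tidy layout is arguably the easiest of the representations to work with. The name `table1_rate` is illustrative.

```{r message=FALSE}
library(tidyverse)

# Cases and population are already columns, so one mutate() is enough.
table1_rate <- table1 %>%
  mutate(rate = (cases / population) * 10000)
table1_rate
```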

# 12.3.3

## 4.

Tidy the simple tibble below. Do you need to spread or gather it? What are the variables?

**Answer:**

We need to gather the male and female columns into two new columns, sex and count. The variables are then pregnant, sex, and count.

```{r}
preg <- tribble(
  ~pregnant, ~male, ~female,
  "yes",     NA,    10,
  "no",      20,    12
)

preg
```


```{r}
preg %>% gather(male, female, key = "sex", value = "count")
```
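In newer versions of tidyr (1.0.0 or later, which is an assumption about the installed version), the same reshaping can also be sketched with `pivot_longer()`. The `values_drop_na = TRUE` argument here is an optional choice that removes the pregnant-male row, which has no count; the name `preg_long` is illustrative.

```{r message=FALSE}
library(tidyverse)

preg <- tribble(
  ~pregnant, ~male, ~female,
  "yes",     NA,    10,
  "no",      20,    12
)

# Gather the male and female columns into sex and count,
# dropping the row whose count is missing.
preg_long <- preg %>%
  pivot_longer(c(male, female),
               names_to = "sex",
               values_to = "count",
               values_drop_na = TRUE)
preg_long
```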


# 12.4.3

## 1.

What do the extra and fill arguments do in separate()? Experiment with the various options for the following two toy datasets.


```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>% 
  separate(x, c("one", "two", "three"))

tibble(x = c("a,b,c", "d,e", "f,g,i")) %>% 
  separate(x, c("one", "two", "three"))
```

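**Answer:**

The `extra` argument controls what happens when a row has more pieces than there are new columns: `"warn"` (the default) drops the extra pieces with a warning, `"drop"` drops them silently, and `"merge"` splits at most `length(into)` times, so the leftover pieces stay together in the last column. The `fill` argument controls what happens when a row has too few pieces: `"warn"` (the default) fills with `NA` on the right with a warning, `"right"` fills with `NA` on the right silently, and `"left"` fills with `NA` on the left.

Examples: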

```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "drop")
```


```{r}
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
  separate(x, c("one", "two", "three"), extra = "merge")
```


```{r}
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "right")
```


```{r}
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
  separate(x, c("one", "two", "three"), fill = "left")
```
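To compare the options directly, the sketch below stores each result and pulls out the column where the difference shows up. The object names (`too_many`, `too_few`, and so on) are illustrative only.

```{r message=FALSE}
library(tidyverse)

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
too_few  <- tibble(x = c("a,b,c", "d,e", "f,g,i"))

# extra: the second row has four pieces but only three columns.
dropped <- too_many %>% separate(x, c("one", "two", "three"), extra = "drop")
merged  <- too_many %>% separate(x, c("one", "two", "three"), extra = "merge")
dropped$three  # the fourth piece "g" is discarded
merged$three   # the leftover pieces stay together as "f,g"

# fill: the second row has only two pieces.
right_filled <- too_few %>% separate(x, c("one", "two", "three"), fill = "right")
left_filled  <- too_few %>% separate(x, c("one", "two", "three"), fill = "left")
right_filled$three  # NA is placed in the last column
left_filled$one     # NA is placed in the first column
```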