---
title: "Factors"
output:
  word_document: default
  html_notebook: default
  pdf_document: default
---

Here are some examples from Chapter 15. The examples use data from the [General Social Survey](http://gss.norc.org/), run by NORC at the University of Chicago.

```{r message = FALSE}
library(tidyverse)
library(forcats)

gss_cat

gss_cat %>%
  count(race)
```

Factor variables are often shown as bar charts. *geom_bar()* counts the observations in each level of the factor.

```{r}
ggplot(gss_cat, aes(race)) +
  geom_bar()
```

Forcing empty levels to display. By default, ggplot2 drops levels that have no observations; `drop = FALSE` keeps them on the axis.

```{r}
ggplot(gss_cat, aes(race)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
```

Modifying the order of a factor. Examine average TV watch time by religion.

```{r}
relig_summary <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    age = mean(age, na.rm = TRUE),
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

relig_summary %>%
  ggplot(aes(tvhours, relig)) +
  geom_point()
```

Reordering the levels of *relig* by *tvhours* makes the pattern easier to read.

```{r}
relig_summary %>%
  ggplot(aes(tvhours, fct_reorder(relig, tvhours))) +
  geom_point()
```

The *fct_reorder()* call can be moved out of *aes()* and into a separate *mutate()* step, which is cleaner for more complicated transformations. This produces the same plot as the previous chunk.

```{r}
relig_summary %>%
  mutate(relig = fct_reorder(relig, tvhours)) %>%
  ggplot(aes(tvhours, relig)) +
  geom_point()
```

Now average age by reported income level.

```{r}
rincome_summary <- gss_cat %>%
  group_by(rincome) %>%
  summarise(
    age = mean(age, na.rm = TRUE),
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

rincome_summary %>%
  ggplot(aes(age, fct_reorder(rincome, age))) +
  geom_point()
```

Does this make sense? What is wrong with this plot? *rincome* already has a principled order, so reordering its levels by age scrambles the income brackets. *fct_relevel()* moves only "Not applicable" to the front and leaves the other levels in place.

```{r}
rincome_summary %>%
  ggplot(aes(age, fct_relevel(rincome, "Not applicable"))) +
  geom_point()
```

Using *mutate()* to order levels by frequency with *fct_infreq()*, and to reverse that order with *fct_rev()*.

```{r}
# Default level order
gss_cat %>%
  ggplot(aes(marital)) +
  geom_bar()

# No-op mutate: same plot as above
gss_cat %>%
  mutate(marital = marital) %>%
  ggplot(aes(marital)) +
  geom_bar()

# Levels ordered by decreasing frequency
gss_cat %>%
  mutate(marital = marital %>% fct_infreq()) %>%
  ggplot(aes(marital)) +
  geom_bar()

# Reversed: levels ordered by increasing frequency
gss_cat %>%
  mutate(marital = marital %>% fct_infreq() %>% fct_rev()) %>%
  ggplot(aes(marital)) +
  geom_bar()
```

Modifying factor levels.

```{r}
gss_cat %>%
  count(partyid)
```

Recoding levels with *fct_recode()*. The new name goes on the left and the existing level on the right; levels that are not mentioned are left unchanged.

```{r}
gss_cat %>%
  mutate(partyid = fct_recode(partyid,
    "Republican, strong"    = "Strong republican",
    "Republican, weak"      = "Not str republican",
    "Independent, near rep" = "Ind,near rep",
    "Independent, near dem" = "Ind,near dem",
    "Democrat, weak"        = "Not str democrat",
    "Democrat, strong"      = "Strong democrat"
  )) %>%
  count(partyid)
```

Creating an "Other" category. Assigning several existing levels to the same new name merges them into one level.

```{r}
gss_cat %>%
  mutate(partyid = fct_recode(partyid,
    "Republican, strong"    = "Strong republican",
    "Republican, weak"      = "Not str republican",
    "Independent, near rep" = "Ind,near rep",
    "Independent, near dem" = "Ind,near dem",
    "Democrat, weak"        = "Not str democrat",
    "Democrat, strong"      = "Strong democrat",
    "Other"                 = "No answer",
    "Other"                 = "Don't know",
    "Other"                 = "Other party"
  )) %>%
  count(partyid)
```

Collapsing a factor. *fct_collapse()* takes a vector of existing levels for each new level, which is more convenient when many levels are merged.

```{r}
gss_cat %>%
  mutate(partyid = fct_collapse(partyid,
    other = c("No answer", "Don't know", "Other party"),
    rep = c("Strong republican", "Not str republican"),
    ind = c("Ind,near rep", "Independent", "Ind,near dem"),
    dem = c("Not str democrat", "Strong democrat")
  )) %>%
  count(partyid)
```
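
To see the difference between *fct_recode()* and *fct_collapse()* without the full survey data, here is a minimal sketch on a small made-up factor. The `party` vector and its values are hypothetical and are not taken from `gss_cat`; only the level names are borrowed from the examples above.

```{r}
library(forcats)

# Hypothetical toy factor (not part of gss_cat), with an explicit level order
party <- factor(
  c("Strong democrat", "Not str democrat", "Independent",
    "Don't know", "Strong democrat"),
  levels = c("Don't know", "Strong democrat", "Not str democrat", "Independent")
)

# fct_recode(): rename levels one at a time (new name = old level);
# unmentioned levels and the level positions are kept
recoded <- fct_recode(party,
  "Democrat, strong" = "Strong democrat",
  "Democrat, weak"   = "Not str democrat"
)
levels(recoded)

# fct_collapse(): merge a vector of old levels into each new level
collapsed <- fct_collapse(party,
  dem   = c("Strong democrat", "Not str democrat"),
  other = "Don't know"
)
table(collapsed)
```

In this sketch `recoded` still has four levels, while `collapsed` has three, because the two Democrat levels were merged into `dem`.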