---
title: "Logistic Regression"
author: "Prof. Eric A. Suess"
date: "February 15, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Introduction

Lantz Chapter 6 is about Regression Methods for Predicting Numeric Data.

The author presents the basics of Simple Linear Regression and Multiple Linear Regression.

Regression Methods can also be used for Classification.

- Logistic Regression
- CART

## Why not linear regression?

The author applies linear regression to the launch data.

In this data set the dependent variable is **distress_ct**. This variable has only 3 categories, so predictions less than 0 or greater than 3 do not make much sense.

**Logistic Regression** or **Multinomial Regression** would make more sense.

These are **Generalized Linear Models** (GLMs).

## An excellent introduction to Logistic Regression

Here is a link to an R-bloggers post [How to perform a Logistic Regression in R](http://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/).

The author of the post creates training and test data sets and introduces the use of the **ROC** curve to evaluate the fitted model.

## Logistic Regression

A logistic regression model describes a **binary dependent variable**

$Y = 1$ or Yes

or

$Y = 0$ or No

where $P(Y = 1 | X)$ is modeled in terms of the predictors $X$.

## Logistic Regression

Try

$P(Y = 1 | X) = \beta_0 + \beta_1 X$

but a straight line is unbounded, and all probabilities need to be between 0 and 1.

What is used instead is the **logistic** function (the inverse of the **logit**), which keeps the values of the probabilities between 0 and 1.

$P(Y = 1 | X) = \frac{e^{\beta_0 + \beta_1 X}}{1+e^{\beta_0 + \beta_1 X}}$

## Logistic Regression

Solving for the linear part shows that the **log odds** (the **logit** of the probability) are linear in $X$:

$\log\left(\frac{P(Y=1 |X)}{1-P(Y=1|X)}\right) = \beta_0 + \beta_1 X$

This gives a model that is nonlinear in the probabilities. The coefficients are estimated by maximum likelihood (MLEs), computed by numerical methods.

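
## Logistic Regression: a sketch in R

A minimal sketch of the model above, fit with `glm()` on simulated data. The data, the coefficient values, and the variable names are illustrative assumptions; they are not taken from Lantz or from the launch data.

```r
# Simulated data: the true coefficients below are assumptions for illustration.
set.seed(1)
n <- 500
x <- rnorm(n)
beta0 <- -0.5
beta1 <- 1.5

# Logistic function: maps the linear predictor to a probability in (0, 1).
p <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))
y <- rbinom(n, size = 1, prob = p)

# Fit by maximum likelihood; glm() solves for the MLEs numerically.
fit <- glm(y ~ x, family = binomial)
coef(fit)

# Fitted probabilities stay between 0 and 1.
p_hat <- predict(fit, type = "response")
range(p_hat)

# The log odds of the fitted probabilities are linear in x.
log_odds <- log(p_hat / (1 - p_hat))
linear_part <- coef(fit)[1] + coef(fit)[2] * x
all.equal(unname(log_odds), unname(linear_part))
```

The last line checks numerically that the logit of the fitted probabilities reproduces $\hat\beta_0 + \hat\beta_1 X$.
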
## Multiple Logistic Regression

**Multiple Logistic Regression** can be used when there is more than one predictor variable.

Categorical or numeric variables can be used as predictors.

## Evaluations

The **AIC** is used to compare fitted models; a smaller AIC is better.

The **ROC** curve is also used to compare models.

The area under the ROC curve (AUC) is commonly used to evaluate and compare models.

## Logistic Regression

Try Logistic Regression with the **launch** data and the **credit** data.

## CART

Try CART with the **credit** data. This uses the "C" (classification) in CART.

## An excellent introduction to Generalized Linear Models (GLMs)

Here is a link to a Quick-R post [Generalized Linear Models](http://www.statmethods.net/advstats/glm.html).
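
## Evaluations: a sketch in R

The AIC and ROC comparisons from the Evaluations slide can be sketched in base R. The simulated data, the train/test split sizes, and the helper `auc()` are assumptions made for illustration; in practice a package such as ROCR or pROC could be used instead.

```r
# Simulated data: names and coefficients are illustrative assumptions.
set.seed(2)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1,
                prob = plogis(-0.5 + 1.2 * dat$x1 + 0.8 * dat$x2))

# Training and test sets.
train_idx <- sample(n, size = 300)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Two candidate models: one predictor versus two predictors.
fit1 <- glm(y ~ x1, family = binomial, data = train)
fit2 <- glm(y ~ x1 + x2, family = binomial, data = train)

# AIC on the training data: smaller is better.
AIC(fit1, fit2)

# Hypothetical helper: area under the ROC curve via the rank
# (Mann-Whitney) formula, so no extra package is needed.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# AUC on the held-out test data: closer to 1 is better.
auc1 <- auc(predict(fit1, newdata = test, type = "response"), test$y)
auc2 <- auc(predict(fit2, newdata = test, type = "response"), test$y)
c(auc1 = auc1, auc2 = auc2)
```

AIC is computed on the training data, while the AUC is computed on the test data, so the two summaries answer slightly different questions about the same models.
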