--- title: "Evaluation" author: "Prof. Eric A. Suess" date: "February 17, 2021" output: beamer_presentation: default ioslides_presentation: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ## Introduction We have primarily talked about **Classification methods**, such as, kNN, Naive Bayes, C5.0, RIPPER, CART, Logistic Regression, etc. In the Classification setting we have used **Accuracy/Success Rate** to Evaluate the "usefulness" of an algorithm. $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ So we have looked at the **Confussion Matrix**. > acc <- mean( pred == testy ) ## Introduction We have started to look at **Prediction methods**, such as, Linear Regression, Multiple Linear Regression, etc. So we looked at "Accuracy" as the **correlation** between the test values of the **response** and the **predicted/fitted** values from the model. When using Prediction methods a quantitative response is predicted. ## Question: But Logistic Regression is used for Classification, right? ## Answer: Yes, but it uses the predicted probabilties. In R we can classify using the **ifelse()** function to convert the probabilities into 0 and 1. > ifelse(prob < 0.5, 0, 1) ## Beyond Accuracy There are a number of values that can be calculated to evaluate accuracy using Classification algorithms. ## Beyond Accuracy - **Kappa** - adjusts accuracy by accounting for the possibility of a correct prediction by chance alone. So should be a bit smaller than what we have discussed as Accuracy. ## Beyond Accuracy - **Sensitivity** $Sensitivity = \frac{TP}{TP + FN} \approx P(+|D)$ - **Specificity** $Specificity = \frac{TN}{TN + FP} \approx P(-|D^c)$ ## Beyond Accuracy - **Precision** $Precision = \frac{TP}{TP + FP}$ - **Recall** $Recall = \frac{TP}{TP + FN}$ ## Beyond Accuracy - **F-measure** or F1 or F-score $F measure = \frac{2 \times Precision \times Recall}{Precision + Recall} =\frac{2 \times TP}{2 \times TP + FP + FN}$ The F-measure assumes equal weight for the Precision and Recall. This may not always be the case. ## Visualizing Performance Tradeoffs - ROC Visualizations can be very helpful for understanding how the performance of learning algorithms differ. Useful for comparing two or more learners side-by-side. The [Receiver Operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) (ROC) is commonly used. To use the ROC we need: 1. the class values/labels 2. the predicted probabilities of the **positive class** ## ROC - Sensitivity/Specificity plot See page 312/332 for an example. The ROC plots the **Sensitivity** versus **1 - Specificity**. For the MS Statistics students this is: - **True Positive Rate** versus **False Positive Rate** or - **Power** versus **$\alpha$**. ## ROC - Sensitivity/Specificity plot **No predictive value**, 45 degree line **Perfect predictive value**, up and across. 100% true positives with no errors. ## ROC - AUC The **Area Under the Curve** (AUC) is commonly used to compare Classifiers. ## Holdout Method - Training - Validation - Testing Repeated Holdout ## Cross-Validation k-fold cross validation 10-fold cross validation Train on 9 of the folds and test on the last. Average the accuracy measure. ## Bootstrap sampling Random sample with replacement. Train on the sample and test on the remaining examples. $$error = 0.632 \times error_{test} + 0.368 \times error_{train}$$