---
title: "Evaluating Machine Learning Models"
author: "Prof. Eric A. Suess"
format: revealjs
---

## Evaluating Machine Learning Models

Chapter 5 of our books discusses how to evaluate Machine Learning Models in general.

## Beyond Classification and Prediction/Regression

Types of Machine Learning Tasks

- Supervised Learning
- Unsupervised Learning
- Self-Supervised Learning
- Reinforcement Learning

## Classification and regression glossary

Read the [glossary](https://ml-cheatsheet.readthedocs.io/en/latest/glossary.html) and note any differences from the terms you know.

## Evaluation, Holdout Method

- Training
- Validation, used to tune your model and deal with over-fitting
- Test

## Simple Holdout Method

1. Shuffle the data (usually appropriate).
2. Define the validation set.
3. Define the training set.
4. Train a model on the training data and evaluate it on the validation data.
5. Once you’ve tuned your hyperparameters, it’s common to train your final model from scratch on all non-test data available.

*A code sketch of these steps appears at the end of the deck.*

## k-fold cross-validation

1. Select the validation-data partition.
2. Use the remainder of the data as training data.
3. Create a brand-new instance of the model (untrained).
4. Validation score: the average of the validation scores over the k folds.
5. Train the final model on all non-test data available.

*A code sketch of the k-fold loop appears at the end of the deck.*

## Data Preprocessing

- Normalization
- Missing data: it can be OK to use 0, as long as 0 is not already a meaningful value. This is new.

*A preprocessing sketch appears at the end of the deck.*

## Feature Engineering

- "Modern deep learning *removes the need for most feature engineering*, because neural networks are *capable of automatically extracting useful features* from raw data. Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks? No, for two reasons: ..."
- Go read the reasons. What are the reasons?

## Overfitting and Underfitting

- Overfitting happens when your neural network has too much capacity (it is too big).
- Underfitting happens when your neural network has too little capacity (it is too small).
- The goal is to find the right-sized neural network.
- **Regularization**
- "You may be familiar with the principle of Occam’s razor: given two explanations for something, the explanation most likely to be correct is the **simplest one** — the one that makes fewer assumptions."
- **Dropout**

*A regularization and dropout sketch appears at the end of the deck.*

## Universal Workflow for Machine Learning

Chapter 6 of our books discusses the universal workflow of machine learning.

## Universal Workflow for Machine Learning

1. Defining the problem and assembling a dataset
2. Choosing a measure of success
3. Deciding on an evaluation protocol
4. Preparing your data
5. Developing a model that does better than a baseline (a Null Model); see the baseline sketch at the end of the deck
6. Scaling up: developing a model that overfits
7. Regularizing your model and tuning your hyperparameters

## Summary I

- Define the problem at hand and the data on which you’ll train. Collect this data, or annotate it with labels if need be.
- Choose how you’ll measure success on your problem. Which metrics will you monitor on your validation data?
- Determine your evaluation protocol: Hold-out validation? K-fold validation? Which portion of the data should you use for validation?

## Summary II

- Develop a first model that does better than a basic baseline: a model with statistical power.
- Develop a model that overfits.
- Regularize your model and tune its hyperparameters, based on performance on the validation data. A lot of machine-learning research tends to focus only on this step, but keep the big picture in mind.
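## Holdout split: a sketch

A minimal sketch of the holdout steps; the stand-in dataset and split size below are illustrative assumptions, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(size=(1000, 20))        # stand-in dataset

num_validation_samples = 200              # illustrative split size

shuffled = data[rng.permutation(len(data))]           # 1. shuffle
validation_data = shuffled[:num_validation_samples]   # 2. validation set
training_data = shuffled[num_validation_samples:]     # 3. training set

# 4. train a model on training_data, evaluate it on validation_data.
# 5. after tuning hyperparameters, retrain from scratch on all
#    non-test data before the one final test-set evaluation.
```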
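## k-fold cross-validation: a sketch

One way to write the k-fold loop by hand; the data, k, and the placeholder score are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(size=(1000, 20))   # stand-in dataset
k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    # 1. select this fold's validation partition
    validation_data = data[fold * fold_size:(fold + 1) * fold_size]
    # 2. use the remainder of the data as training data
    training_data = np.concatenate(
        [data[:fold * fold_size], data[(fold + 1) * fold_size:]])
    # 3. create a brand-new, untrained model here; fit it on
    #    training_data and score it on validation_data.
    score = 0.0  # placeholder for this fold's validation score
    validation_scores.append(score)

# 4. the overall validation score is the average over the k folds
validation_score = float(np.mean(validation_scores))
# 5. finally, train the final model on all non-test data.
```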
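## Preprocessing: a sketch

A sketch of normalization plus zero-filling of missing values; the simulated features and missing-data rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.normal(loc=5.0, scale=2.0, size=(1000, 20))   # stand-in features
x[rng.random(x.shape) < 0.05] = np.nan                # simulate missing entries

# Normalize using statistics computed from the training data only.
mean = np.nanmean(x, axis=0)
std = np.nanstd(x, axis=0)
x = (x - mean) / std

# After normalization the features are centered near 0, so filling
# missing entries with 0 inserts a roughly neutral value that the
# network can learn to treat as "missing".
x = np.nan_to_num(x, nan=0.0)
```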
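## Regularization and dropout: a sketch

One way to apply the two techniques named above in Keras; the layer sizes, L2 strength, and dropout rate are illustrative choices, not values from the slides.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Hypothetical binary classifier on 20 input features.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),  # L2 weight penalty
    layers.Dropout(0.5),   # zero out 50% of activations during training
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```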
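## Baseline (null) model: a sketch

A null model for a binary classifier simply predicts the majority class; its accuracy is the floor your first real model must beat. The labels here are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
y = rng.integers(0, 2, size=1000)     # stand-in binary labels

# Always predict the most common class.
majority_class = np.bincount(y).argmax()
baseline_accuracy = np.mean(y == majority_class)
print(f"Null-model accuracy: {baseline_accuracy:.3f}")
```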