Evaluating Machine Learning Models

Prof. Eric A. Suess

Evaluating Machine Learning Models

Chapter 5 of our books discusses how to evaluate machine learning models in general.

Beyond Classification and Prediction/Regression

Types of Machine Learning Tasks

  • Supervised Learning
  • Unsupervised Learning
  • Self-Supervised Learning
  • Reinforcement Learning

Classification and regression glossary

Read the glossary and note any differences from the terms you already know.

Evaluation, Holdout Method

  • Training
  • Validation, used to tune your model and deal with overfitting
  • Test

Simple Holdout Method

  1. Shuffle the data (usually appropriate, but not when predicting the future from the past).
  2. Define the validation set.
  3. Define the training set.
  4. Train a model on the training data and evaluate it on the validation data.
  5. Once you’ve tuned your hyperparameters, it’s common to train your final model from scratch on all non-test data available (see the sketch after this list).
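
A minimal sketch of these five steps in Keras, using synthetic stand-in data; get_model() is a hypothetical helper that returns a fresh, untrained model:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    """Hypothetical helper: returns a fresh, untrained binary classifier."""
    model = keras.Sequential([
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Synthetic stand-in data: 1,000 samples with 20 features each.
data = np.random.rand(1000, 20).astype("float32")
labels = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

num_validation_samples = 200

# 1. Shuffle the data.
indices = np.random.permutation(len(data))
data, labels = data[indices], labels[indices]

# 2. Define the validation set.
val_data = data[:num_validation_samples]
val_labels = labels[:num_validation_samples]

# 3. Define the training set.
train_data = data[num_validation_samples:]
train_labels = labels[num_validation_samples:]

# 4. Train on the training data; evaluate on the validation data.
model = get_model()
model.fit(train_data, train_labels, epochs=5, batch_size=32, verbose=0)
val_loss, val_acc = model.evaluate(val_data, val_labels, verbose=0)

# 5. After tuning hyperparameters, retrain from scratch on all non-test data.
final_model = get_model()
final_model.fit(data, labels, epochs=5, batch_size=32, verbose=0)
```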

k-fold cross-validation

  1. Select the validation-data partition for the current fold.
  2. Use the remainder of the data as training data.
  3. Create a brand-new (untrained) instance of the model; repeat steps 1–3 for each of the K folds.
  4. Validation score: the average of the validation scores across the K folds.
  5. Train the final model on all non-test data available (see the sketch after this list).
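
A minimal sketch of k-fold cross-validation with K = 4; it reuses the hypothetical get_model() helper and the data and labels arrays from the hold-out sketch above:

```python
import numpy as np

k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    start, stop = fold * fold_size, (fold + 1) * fold_size
    # 1. Select the validation-data partition for this fold.
    val_data, val_labels = data[start:stop], labels[start:stop]
    # 2. Use the remainder of the data as training data.
    train_data = np.concatenate([data[:start], data[stop:]])
    train_labels = np.concatenate([labels[:start], labels[stop:]])
    # 3. Create a brand-new (untrained) instance of the model.
    model = get_model()
    model.fit(train_data, train_labels, epochs=5, batch_size=32, verbose=0)
    _, val_acc = model.evaluate(val_data, val_labels, verbose=0)
    validation_scores.append(val_acc)

# 4. Validation score: the average across the K folds.
validation_score = float(np.mean(validation_scores))

# 5. Train the final model on all non-test data available.
final_model = get_model()
final_model.fit(data, labels, epochs=5, batch_size=32, verbose=0)
```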

Data Preprocessing

  • Normalization
  • Missing data: with neural networks it can be OK to substitute 0 for missing values (as long as 0 isn’t already a meaningful value), since the network can learn that 0 means missing. This is new! (See the sketch below.)
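
A minimal sketch of both steps with NumPy, assuming missing entries are coded as np.nan and that feature-wise statistics come from the training data only:

```python
import numpy as np

# Synthetic stand-in data; missing entries are coded as np.nan.
x_train = np.random.rand(100, 5).astype("float32")
x_train[0, 2] = np.nan
x_test = np.random.rand(20, 5).astype("float32")

# Missing data: substitute 0, on the assumption that 0 is not already
# a meaningful value, so the network can learn that 0 means "missing."
x_train = np.nan_to_num(x_train, nan=0.0)

# Normalization: give each feature mean 0 and standard deviation 1,
# using statistics computed on the training data only (never the test data).
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
```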

Feature Engineering

  • “Modern deep learning removes the need for most feature engineering, because neural networks are capable of automatically extracting useful features from raw data. Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks? No, for two reasons: …”
  • Go read the reasons. What are the reasons?

Overfitting and Underfitting

  • Overfitting happens when your neural network is too big for the problem: it has enough capacity to memorize the training data, noise and all.

  • Underfitting happens when your neural network is too small: it lacks the capacity to learn the structure in the training data.

  • The goal is to find the right-sized network: big enough to fit the signal, not so big that it fits the noise.

  • Regularization, e.g., adding an L1 or L2 weight penalty to the loss

  • “You may be familiar with the principle of Occam’s razor: given two explanations for something, the explanation most likely to be correct is the simplest one — the one that makes fewer assumptions.”

  • Dropout, which randomly zeroes a fraction of layer outputs during training (see the sketch below, which uses both techniques)
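
A minimal sketch of L2 weight regularization and dropout in Keras; the layer sizes, the 0.001 penalty, and the 0.5 dropout rate are illustrative choices, not recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # L2 regularization adds 0.001 * sum(weight ** 2) to the loss,
    # nudging the network toward smaller, simpler weight values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    # Dropout randomly zeroes 50% of the previous layer's outputs
    # during training, breaking up chance co-adaptations of units.
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```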

Universal Workflow for Machine Learning

Chapter 6 of our books discusses the universal workflow of machine learning.

Universal Workflow for Machine Learning

  1. Defining the problem and assembling a dataset
  2. Choosing a measure of success
  3. Deciding on an evaluation protocol
  4. Preparing your data
  5. Developing a model that does better than a baseline (a null model; see the sketch after this list)
  6. Scaling up: developing a model that overfits
  7. Regularizing your model and tuning your hyperparameters
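
For step 5, a minimal sketch of a null-model baseline for binary classification, assuming 0/1 labels in NumPy arrays (synthetic stand-ins here); a first model has statistical power only if it beats this score on the validation data:

```python
import numpy as np

# Synthetic stand-in 0/1 labels.
train_labels = np.random.randint(0, 2, size=1000)
val_labels = np.random.randint(0, 2, size=200)

# Null model: always predict the most frequent class in the training data.
majority_class = np.bincount(train_labels).argmax()
baseline_accuracy = np.mean(val_labels == majority_class)
print(f"Null-model baseline accuracy: {baseline_accuracy:.3f}")
```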

Summary I

  • Define the problem at hand and the data on which you’ll train. Collect this data, or annotate it with labels if need be.
  • Choose how you’ll measure success on your problem. Which metrics will you monitor on your validation data?
  • Determine your evaluation protocol: Hold-out validation? K-fold validation? Which portion of the data should you use for validation?

Summary II

  • Develop a first model that does better than a basic baseline: a model with statistical power.
  • Develop a model that overfits.
  • Regularize your model and tune its hyperparameters, based on performance on the validation data. A lot of machine-learning research tends to focus only on this step—but keep the big picture in mind.