Evaluating Machine Learning Models

Prof. Eric A. Suess

Evaluating Machine Learning Models

Chapter 5 of our books discusses how to evaluate machine learning models in general.

Beyond Classification and Prediction/Regression

Types of Machine Learning Tasks

  • Supervised Learning
  • Unsupervised Learning
  • Self-Supervised Learning
  • Reinforcement Learning

Classification and regression glossary

Read the glossary and note any differences from the terms you already know.

Evaluation, Holdout Method

  • Training
  • Validation, used to tune your model and deal with overfitting
  • Test

Simple Holdout Method

  1. Shuffle the data (usually appropriate, but not when predicting the future from the past).
  2. Define the validation set.
  3. Define the training set.
  4. Train a model on the training data and evaluate it on the validation data.
  5. Once you’ve tuned your hyperparameters, it’s common to train your final model from scratch on all non-test data available (see the sketch after this list).
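
A minimal sketch of these five steps in Keras, using synthetic stand-in data; get_model() is a hypothetical helper that returns a fresh, untrained model:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    """Hypothetical helper: returns a fresh, untrained binary classifier."""
    model = keras.Sequential([
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Synthetic stand-in data: 1,000 samples with 20 features each.
data = np.random.rand(1000, 20).astype("float32")
labels = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

num_validation_samples = 200

# 1. Shuffle the data.
indices = np.random.permutation(len(data))
data, labels = data[indices], labels[indices]

# 2. Define the validation set.
val_data = data[:num_validation_samples]
val_labels = labels[:num_validation_samples]

# 3. Define the training set.
train_data = data[num_validation_samples:]
train_labels = labels[num_validation_samples:]

# 4. Train on the training data; evaluate on the validation data.
model = get_model()
model.fit(train_data, train_labels, epochs=5, batch_size=32, verbose=0)
val_loss, val_acc = model.evaluate(val_data, val_labels, verbose=0)

# 5. After tuning hyperparameters, retrain from scratch on all non-test data.
final_model = get_model()
final_model.fit(data, labels, epochs=5, batch_size=32, verbose=0)
```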

k-fold cross-validation

  1. Select the validation-data partition for the current fold.
  2. Use the remainder of the data as training data.
  3. Create a brand-new (untrained) instance of the model; repeat steps 1–3 for each of the K folds.
  4. Validation score: the average of the validation scores across the K folds.
  5. Train the final model on all non-test data available (see the sketch after this list).
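
A minimal sketch of k-fold cross-validation with K = 4; it reuses the hypothetical get_model() helper and the data and labels arrays from the hold-out sketch above:

```python
import numpy as np

k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    start, stop = fold * fold_size, (fold + 1) * fold_size
    # 1. Select the validation-data partition for this fold.
    val_data, val_labels = data[start:stop], labels[start:stop]
    # 2. Use the remainder of the data as training data.
    train_data = np.concatenate([data[:start], data[stop:]])
    train_labels = np.concatenate([labels[:start], labels[stop:]])
    # 3. Create a brand-new (untrained) instance of the model.
    model = get_model()
    model.fit(train_data, train_labels, epochs=5, batch_size=32, verbose=0)
    _, val_acc = model.evaluate(val_data, val_labels, verbose=0)
    validation_scores.append(val_acc)

# 4. Validation score: the average across the K folds.
validation_score = float(np.mean(validation_scores))

# 5. Train the final model on all non-test data available.
final_model = get_model()
final_model.fit(data, labels, epochs=5, batch_size=32, verbose=0)
```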

Data Preprocessing

  • Normalization
  • Missing data: with neural networks it can be OK to substitute 0 for missing values (as long as 0 isn’t already a meaningful value), since the network can learn that 0 means missing. This is new! (See the sketch below.)
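
A minimal sketch of both steps with NumPy, assuming missing entries are coded as np.nan and that feature-wise statistics come from the training data only:

```python
import numpy as np

# Synthetic stand-in data; missing entries are coded as np.nan.
x_train = np.random.rand(100, 5).astype("float32")
x_train[0, 2] = np.nan
x_test = np.random.rand(20, 5).astype("float32")

# Missing data: substitute 0, on the assumption that 0 is not already
# a meaningful value, so the network can learn that 0 means "missing."
x_train = np.nan_to_num(x_train, nan=0.0)

# Normalization: give each feature mean 0 and standard deviation 1,
# using statistics computed on the training data only (never the test data).
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
```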

Feature Engineering

  • “Modern deep learning removes the need for most feature engineering, because neural networks are capable of automatically extracting useful features from raw data. Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks? No, for two reasons: …”
  • Go read the reasons. What are the reasons?

Overfitting and Underfitting

  • Overfitting happens when your neural network is too big for the problem: it has enough capacity to memorize the training data, noise and all.

  • Underfitting happens when your neural network is too small: it lacks the capacity to learn the structure in the training data.

  • The goal is to find the right-sized network: big enough to fit the signal, not so big that it fits the noise.

  • Regularization, e.g., adding an L1 or L2 weight penalty to the loss

  • “You may be familiar with the principle of Occam’s razor: given two explanations for something, the explanation most likely to be correct is the simplest one — the one that makes fewer assumptions.”

  • Dropout, which randomly zeroes a fraction of layer outputs during training (see the sketch below, which uses both techniques)
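
A minimal sketch of L2 weight regularization and dropout in Keras; the layer sizes, the 0.001 penalty, and the 0.5 dropout rate are illustrative choices, not recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # L2 regularization adds 0.001 * sum(weight ** 2) to the loss,
    # nudging the network toward smaller, simpler weight values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    # Dropout randomly zeroes 50% of the previous layer's outputs
    # during training, breaking up chance co-adaptations of units.
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```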

Universal Workflow for Machine Learning

Chapter 6 of our books discusses the universal workflow of machine learning.

Universal Workflow for Machine Learning

  1. Defining the problem and assembling a dataset
  2. Choosing a measure of success
  3. Deciding on an evaluation protocol
  4. Preparing your data
  5. Developing a model that does better than a baseline (a null model; see the sketch after this list)
  6. Scaling up: developing a model that overfits
  7. Regularizing your model and tuning your hyperparameters
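
For step 5, a minimal sketch of a null-model baseline for binary classification, assuming 0/1 labels in NumPy arrays (synthetic stand-ins here); a first model has statistical power only if it beats this score on the validation data:

```python
import numpy as np

# Synthetic stand-in 0/1 labels.
train_labels = np.random.randint(0, 2, size=1000)
val_labels = np.random.randint(0, 2, size=200)

# Null model: always predict the most frequent class in the training data.
majority_class = np.bincount(train_labels).argmax()
baseline_accuracy = np.mean(val_labels == majority_class)
print(f"Null-model baseline accuracy: {baseline_accuracy:.3f}")
```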

Summary I

  • Define the problem at hand and the data on which you’ll train. Collect this data, or annotate it with labels if need be.
  • Choose how you’ll measure success on your problem. Which metrics will you monitor on your validation data?
  • Determine your evaluation protocol: Hold-out validation? K-fold validation? Which portion of the data should you use for validation?

Summary II

  • Develop a first model that does better than a basic baseline: a model with statistical power.
  • Develop a model that overfits.
  • Regularize your model and tune its hyperparameters, based on performance on the validation data. A lot of machine-learning research tends to focus only on this step—but keep the big picture in mind.