---
title: "Evaluating Machine Learning Models"
author: "Prof. Eric A. Suess"
format: revealjs
---

## Evaluating Machine Learning Models

Chapter 5 of our books discusses how to evaluate Machine Learning Models in general.

## Beyond Classification and Prediction/Regression

Types of Machine Learning Tasks

- Supervised Learning
- Unsupervised Learning
- Self-Supervised Learning
- Reinforcement Learning

## Classification and regression glossary

Read the [glossary](https://ml-cheatsheet.readthedocs.io/en/latest/glossary.html) and note any differences from the terms you know.

## Evaluation, Holdout Method

- Training
- Validation, used to tune your model and deal with over-fitting
- Test

## Simple Holdout Method

1. Shuffle the data (usually appropriate).
2. Define the validation set.
3. Define the training set.
4. Train a model on the training data and evaluate it on the validation data.
5. Once you’ve tuned your hyperparameters, it’s common to train your final model from scratch on all non-test data available.

*A code sketch of these steps appears at the end of the deck.*

## k-fold cross-validation

1. Select the validation-data partition.
2. Use the remainder of the data as training data.
3. Create a brand-new instance of the model (untrained).
4. Validation score: the average of the validation scores over the k folds.
5. Train the final model on all non-test data available.

*A code sketch of the k-fold loop appears at the end of the deck.*

## Data Preprocessing

- Normalization
- Missing data: it can be OK to use 0, as long as 0 is not already a meaningful value. This is new.

*A preprocessing sketch appears at the end of the deck.*

## Feature Engineering

- "Modern deep learning *removes the need for most feature engineering*, because neural networks are *capable of automatically extracting useful features* from raw data. Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks? No, for two reasons: ..."
- Go read the reasons. What are the reasons?

## Overfitting and Underfitting

- Overfitting happens when your neural network has too much capacity (it is too big).
- Underfitting happens when your neural network has too little capacity (it is too small).
- The goal is to find the right-sized neural network.
- **Regularization**
- "You may be familiar with the principle of Occam’s razor: given two explanations for something, the explanation most likely to be correct is the **simplest one** — the one that makes fewer assumptions."
- **Dropout**

*A regularization and dropout sketch appears at the end of the deck.*

## Universal Workflow for Machine Learning

Chapter 6 of our books discusses the universal workflow of machine learning.

## Universal Workflow for Machine Learning

1. Defining the problem and assembling a dataset
2. Choosing a measure of success
3. Deciding on an evaluation protocol
4. Preparing your data
5. Developing a model that does better than a baseline (a Null Model); see the baseline sketch at the end of the deck
6. Scaling up: developing a model that overfits
7. Regularizing your model and tuning your hyperparameters

## Summary I

- Define the problem at hand and the data on which you’ll train. Collect this data, or annotate it with labels if need be.
- Choose how you’ll measure success on your problem. Which metrics will you monitor on your validation data?
- Determine your evaluation protocol: Hold-out validation? K-fold validation? Which portion of the data should you use for validation?

## Summary II

- Develop a first model that does better than a basic baseline: a model with statistical power.
- Develop a model that overfits.
- Regularize your model and tune its hyperparameters, based on performance on the validation data. A lot of machine-learning research tends to focus only on this step, but keep the big picture in mind.
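## Holdout split: a sketch

A minimal sketch of the holdout steps; the stand-in dataset and split size below are illustrative assumptions, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(size=(1000, 20))        # stand-in dataset

num_validation_samples = 200              # illustrative split size

shuffled = data[rng.permutation(len(data))]           # 1. shuffle
validation_data = shuffled[:num_validation_samples]   # 2. validation set
training_data = shuffled[num_validation_samples:]     # 3. training set

# 4. train a model on training_data, evaluate it on validation_data.
# 5. after tuning hyperparameters, retrain from scratch on all
#    non-test data before the one final test-set evaluation.
```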
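## k-fold cross-validation: a sketch

One way to write the k-fold loop by hand; the data, k, and the placeholder score are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(size=(1000, 20))   # stand-in dataset
k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    # 1. select this fold's validation partition
    validation_data = data[fold * fold_size:(fold + 1) * fold_size]
    # 2. use the remainder of the data as training data
    training_data = np.concatenate(
        [data[:fold * fold_size], data[(fold + 1) * fold_size:]])
    # 3. create a brand-new, untrained model here; fit it on
    #    training_data and score it on validation_data.
    score = 0.0  # placeholder for this fold's validation score
    validation_scores.append(score)

# 4. the overall validation score is the average over the k folds
validation_score = float(np.mean(validation_scores))
# 5. finally, train the final model on all non-test data.
```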
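## Preprocessing: a sketch

A sketch of normalization plus zero-filling of missing values; the simulated features and missing-data rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.normal(loc=5.0, scale=2.0, size=(1000, 20))   # stand-in features
x[rng.random(x.shape) < 0.05] = np.nan                # simulate missing entries

# Normalize using statistics computed from the training data only.
mean = np.nanmean(x, axis=0)
std = np.nanstd(x, axis=0)
x = (x - mean) / std

# After normalization the features are centered near 0, so filling
# missing entries with 0 inserts a roughly neutral value that the
# network can learn to treat as "missing".
x = np.nan_to_num(x, nan=0.0)
```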
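## Regularization and dropout: a sketch

One way to apply the two techniques named above in Keras; the layer sizes, L2 strength, and dropout rate are illustrative choices, not values from the slides.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Hypothetical binary classifier on 20 input features.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),  # L2 weight penalty
    layers.Dropout(0.5),   # zero out 50% of activations during training
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```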
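## Baseline (null) model: a sketch

A null model for a binary classifier simply predicts the majority class; its accuracy is the floor your first real model must beat. The labels here are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
y = rng.integers(0, 2, size=1000)     # stand-in binary labels

# Always predict the most common class.
majority_class = np.bincount(y).argmax()
baseline_accuracy = np.mean(y == majority_class)
print(f"Null-model accuracy: {baseline_accuracy:.3f}")
```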