---
title: "Tuning"
author: "Prof. Eric A. Suess"
date: "February 22, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Introduction

In Lantz Chapter 11 the author discusses improving model performance.

The idea of **model performance** is discusses in terms of

- Tuning parameters
- Ensembles

**Tuning parameters** can be used to improve the performance of a single model.

**Ensembles** can be used to build of team of learners that may have better performance than a single model.

## Tuning Parameters

We know a little about **tuning parameters** from the C5.0 algorithm where we introducted **Boosting**.

And from kNN where tried different values of k.

Suppose we set a range of values for a tuning parameter and then fit the model for each of the values of the tuning parameter, keeping a measure of performance.  Then we can pick the **best value** of the tuning parameter and the model produced.

## caret package

The **caret** package in R gives functions that make tuning a model easy.  

The functions train(), trainControl() and expand.grid()

The **kappa** value can be used to **optimize**.

## tidymodels packages

In the [tidymodels](https://www.tidymodels.org/) set of packages there is [tune](https://tune.tidymodels.org/) package that can be used to set up search grids and beyond.

## Running experiments

In Machine Learning tuning over a grid is called running an **experiment**.

## Ensembles

The author discusses **meta-learners**.

The technique of combinding and managing the predictions of multiple models falls within a wider set of **meta-learning** methods that broadly encompass any technique that involves learning how to learn.

These may include:

- gradually improved performance by automatically iterating over design decisions
- self-modifying and adapting to learning tests

## Ensembles

All **ensemble methods** are based on the idea that by combining multiple weaker learners, a stronger learner is created.

Use a **team of models** or a **committee of models**.

- Bagging
- Boosting
- Random Forests

## Bagging

**Boostrap aggregating** or Bagging

A number of training datasets are generated by boostrap sampling the original training data.  Boostrap sampling is sampling the same number of rows as there are in the training data, **with replacement**.

These datasets are used to generate a set of models using a single learning algorithm.

The models' predictions are combined using **voting** (for classification) or **averaging** (for prediction).

Bagging needs **unstable** learners.  So bagging is often used with decision trees.

## Boosting

Boosting uses ensembles of models trained on resampled data (**re-weighted datasets**) and a vote to determine the final classification or average for a prediction.  

The resampled datasets in boosting are constructed specifically to generate **complementary learners**, and the vote is weighted based on each model's performance rather than giving each an equal vote.

## AdaBoost

**AdaBoost** or adaptive boosting.

The algorithm is based on the idea that generating weak learners that iteratively learn a larger portion of the **difficult-to-classify** examples in the training data by paying more attention (that is giving more weight) to the often misclassified examples.

## Other Boosting Algorithms

- **Gradient Boosting Machines**
- **XGBoost**
- **LightGBM** from Microsoft
- **CatBoost** from Yandex

See [A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning](http://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/)

See 

1. [XGBoost](http://xgboost.readthedocs.io/en/latest/) website.
2. [Gradient Boosting, Decision Trees and XGBoost with CUDA](https://devblogs.nvidia.com/gradient-boosting-decision-trees-xgboost-cuda/)
3. [LightGBM](https://lightgbm.readthedocs.io/en/latest/)
4. [CatBoost](https://catboost.ai/)
5. To use with tidymodels see [treesnip](https://github.com/curso-r/treesnip).

## Random Forests

This methods combines the base principles of Bagging with **random feature selection** to add additional diversity to the decision tree models.

After the ensemble of trees is generated, the model uses a vote to combine the trees' predictions.

Because random forests use only a small, random portion of the full feature set, it can handle **extremely large datasets**.

## Machine Learning Competitions

The author gives an example of picking the most accurate model for submission to a **machine learning competition**. 

See the end of Lantz Chapter 11.