May 6, 2020

Introduction

Today we will be discussing:

  • meta-learning
  • ensembles
  • bagging - bootstrap aggregating
  • boosting - boosts the performance of weak learners
  • random forests - decision tree forests

caret

The caret package gives a unified interface to most of the R packages used for data mining, machine learning, and statistical learning.

It also provides a very nice function for splitting data.

It provides functions for evaluating the results.

It also helps with building ensembles.

And it supports parallel processing when appropriate.
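
A minimal sketch of that workflow (the iris data, the 80/20 split, and the rpart model are illustrative assumptions, not from the notes):

  library(caret)

  set.seed(123)
  # createDataPartition() makes a stratified train/test split
  in_train  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
  train_set <- iris[in_train, ]
  test_set  <- iris[-in_train, ]

  # train() is the unified interface; "rpart" is one of many supported methods
  fit <- train(Species ~ ., data = train_set, method = "rpart")

  # confusionMatrix() evaluates the predictions on the held-out data
  preds <- predict(fit, newdata = test_set)
  confusionMatrix(preds, test_set$Species)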

zelig

The zelig package plays a similar role for statistical modelling, wrapping many modelling packages behind a single unified interface.

caret

The caret package is very useful for tuning models.

When tuning models, many different similar models need to be fit. This is a perfect situation for parallel processing.

The caret package can run model fitting in parallel using a backend built on R's parallel package (for example, doParallel).
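
A sketch of what that looks like (the doParallel backend, the random forest method, and the mtry grid are illustrative assumptions; method = "rf" also needs the randomForest package):

  library(caret)
  library(doParallel)

  cl <- makeCluster(2)        # worker processes from the parallel package
  registerDoParallel(cl)      # register them as caret's parallel backend

  grid <- expand.grid(mtry = c(2, 3, 4))   # candidate tuning values

  set.seed(123)
  fit <- train(Species ~ ., data = iris,
               method    = "rf",
               tuneGrid  = grid,
               trControl = trainControl(method = "cv", number = 5,
                                        allowParallel = TRUE))
  stopCluster(cl)

  fit$bestTune   # the mtry value chosen by cross-validation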

Meta-learning

The idea is to combine several models to form a powerful team.

Build a strong team of weak learners.

Ensembles

All ensemble methods are based on the idea that by combining multiple weaker learners, a stronger learner is created.

Bagging

bootstrap aggregating

Here sampling with replacement (the bootstrap) is used to produce each training set; the examples not drawn into a given sample are said to be out-of-bag, and they are used for validation/testing.
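
A sketch of bagging with the ipred package (which is also what caret's "treebag" method wraps); the iris data and the 25 bootstrap replicates are illustrative assumptions:

  library(ipred)

  set.seed(123)
  # Each of the 25 trees is fit to a bootstrap sample drawn with replacement;
  # rows left out of a given sample are that tree's out-of-bag cases.
  bag_fit <- bagging(Species ~ ., data = iris, nbagg = 25, coob = TRUE)
  bag_fit$err   # out-of-bag estimate of the misclassification error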

Boosting

Here an ensemble of models is trained sequentially on reweighted (or resampled) versions of the data, so that each new model concentrates on the examples the earlier models got wrong; a weighted vote then determines the final prediction.

We implemented boosting earlier with C5.0.
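
A sketch of boosted trees with the C50 package (the iris data and trials = 10 are illustrative assumptions):

  library(C50)

  set.seed(123)
  # trials = 10 requests a boosted ensemble of up to 10 trees; each new tree
  # concentrates on the cases the earlier trees misclassified, and their
  # (weighted) votes are combined for the final prediction.
  boost_fit <- C5.0(Species ~ ., data = iris, trials = 10)
  summary(boost_fit)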

Random forests

Ensembles of decision trees are produced, and voting is used again.

Here not only are the training examples resampled, but at each split only a random subset of the features is considered.

Decision tree forests.
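
A sketch with the randomForest package (the iris data, ntree = 500, and mtry = 2 are illustrative assumptions):

  library(randomForest)

  set.seed(123)
  # Each tree is grown on a bootstrap sample of the rows, and at each split
  # only a random subset of mtry features is considered.
  rf_fit <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
  rf_fit   # prints the out-of-bag error estimate and a confusion matrix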

Example

The author uses random forests to look at the credit data again.
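
A hedged sketch of that kind of analysis, assuming a data frame named credit with a factor outcome column default (the actual data and settings come from the author's example, not these notes):

  library(caret)

  set.seed(123)
  rf_credit <- train(default ~ ., data = credit,
                     method    = "rf",
                     trControl = trainControl(method = "cv", number = 10))
  rf_credit   # resampled accuracy for each mtry value tried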