Today we will be discussing:
- meta-learning
- ensembles
- bagging - bootstrap aggregating
- boosting - boosts the performance of weak learners
- random forests - decision tree forests
May 6, 2020
The caret package in R gives a unified interface to most of the packages in R that are used for data mining, machine learning and statistical learning.
It also provides a very nice function for splitting data.
It provides functions for evaluating the results.
Helps with working with ensembles.
And it gives parallel processing when appropriate.
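As a minimal sketch of that unified interface (assuming caret and rpart are installed; the built-in iris data stands in for whatever data set you are mining):

```r
library(caret)

set.seed(123)

# Split the data: createDataPartition keeps class proportions balanced
idx <- createDataPartition(iris$Species, p = 0.75, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# train() gives the same interface no matter which underlying
# package does the fitting -- here "rpart" for a decision tree
fit <- train(Species ~ ., data = train_set, method = "rpart")

# Evaluate on the held-out data
preds <- predict(fit, newdata = test_set)
confusionMatrix(preds, test_set$Species)
```

Swapping in a different model is just a matter of changing the `method` argument.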
On an unrelated note, there is the zelig package for basic statistics.
This is another package that tries to produce a unified interface, but for statistics in general.
(I have enjoyed working with this package in the past. I also like the naming reference.)
The caret package is very useful for tuning models.
When tuning models, many different similar models need to be fit. This is a perfect situation for parallel processing.
The caret package can parallel process with the parallel package in R.
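A sketch of parallel tuning (assuming doParallel and randomForest are installed; caret parallelizes through a foreach backend, with the parallel package supplying the worker cluster underneath; the worker count and tuning settings are illustrative):

```r
library(caret)
library(doParallel)

# Register a parallel backend; train() will then fit the many
# similar resampled/tuned models across the workers automatically
cl <- makePSOCKcluster(4)   # 4 worker processes; adjust to your machine
registerDoParallel(cl)

set.seed(123)
fit <- train(Species ~ ., data = iris,
             method = "rf",                  # random forest
             tuneLength = 5,                 # try several mtry values
             trControl = trainControl(method = "cv", number = 10))

stopCluster(cl)
print(fit)   # shows the tuning grid and the chosen model
```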
The idea is to combine several models to form a powerful team.
Build a strong team of weak learners.
All ensemble methods are based on the idea that by combining multiple weaker learners, a stronger learner is created.
bootstrap aggregating
Here sampling with replacement is used to produce the training data; the examples never drawn are said to be out-of-bag, and they are used for validation/testing.
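A hand-rolled bagging sketch (assuming rpart is installed; the ensemble size B and the iris data are stand-ins):

```r
library(rpart)

set.seed(42)
B <- 25                # number of bagged trees (illustrative)
n <- nrow(iris)

# Fit each tree on its own bootstrap sample (sampling WITH replacement);
# the rows never drawn for a given tree are its out-of-bag examples
trees <- lapply(seq_len(B), function(b) {
  in_bag <- sample(n, n, replace = TRUE)
  rpart(Species ~ ., data = iris[in_bag, ])
})

# Majority vote across the ensemble for the final prediction
votes <- sapply(trees, function(t)
  as.character(predict(t, iris, type = "class")))
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(bagged_pred == iris$Species)   # ensemble accuracy on the full data
```

In practice caret wraps this up for you (e.g. `method = "treebag"`), but the loop above is the whole idea.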
Ensembles of models are trained on randomly resampled data, and (weighted) voting is used to determine the final prediction.
We implemented boosting earlier with C5.0.
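As a reminder of how that looks (assuming the C50 package is installed; the trial count and iris data are illustrative):

```r
library(C50)

# Setting trials > 1 turns on boosting: each successive tree
# concentrates on the examples the earlier trees got wrong
set.seed(42)
boosted <- C5.0(Species ~ ., data = iris, trials = 10)

summary(boosted)            # per-trial trees and error rates
predict(boosted, iris[1:5, ])
```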
Ensembles of decision trees are produced and voting is used again.
Here not only are the data rows resampled, but a random subset of the features is also sampled for each tree.
Decision tree forests.
The author uses random forests to look at the credit data again.
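The credit data itself isn't reproduced here, so a sketch with a stand-in data set (assuming the randomForest package is installed; ntree and mtry values are illustrative):

```r
library(randomForest)

# Each tree sees a bootstrap sample of the rows AND considers only
# a random subset of mtry features at each split
set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)

print(rf)          # includes the out-of-bag (OOB) error estimate
importance(rf)     # which features mattered most
```

The out-of-bag error printed by the model is the free validation estimate that bagging's leftover examples provide.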