2024-02-19
Today we will begin to discuss Classification algorithms using Decision Trees and Rules.
We will learn about the C5.0, 1R, and RIPPER algorithms.
Decision tree learners build a model in the form of a tree structure, similar to a flowchart. Decision nodes indicate a decision to be made on an attribute/feature/variable. These split the tree into branches, which end in leaf nodes.
The tree begins with the data in the root node.
The path that each example/observation/record takes through the tree funnels it to a leaf node which assigns it to a predicted class.
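To make the root-to-leaf path concrete, here is a minimal sketch of a tree stored as nested dictionaries and a function that walks an example through it. The feature names (`outlook`, `windy`), their values, and the classes are hypothetical and chosen only for illustration.

```python
# A toy decision tree as nested dicts (hypothetical features and classes).
tree = {
    "feature": "outlook",              # root decision node
    "branches": {
        "sunny": {"leaf": "play"},     # leaf node: predicted class
        "rainy": {
            "feature": "windy",        # second decision node
            "branches": {
                "yes": {"leaf": "stay home"},
                "no": {"leaf": "play"},
            },
        },
    },
}

def predict(node, example):
    """Follow decision nodes from the root until a leaf is reached."""
    while "leaf" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    return node["leaf"]

print(predict(tree, {"outlook": "rainy", "windy": "yes"}))  # stay home
```

Each pass through the loop corresponds to one decision node; the example is funneled down one branch at a time until it lands in a leaf.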
Decision trees are very transparent: the path from the root to a leaf can be read directly as the reasoning behind a prediction.
Decision trees can be used with almost any kind of data.
However, if there are a large number of categorical features with a large number of categories or if there are a large number of numerical features, then decision trees may lead to very complex trees, which may not be so useful.
Decision trees are built using recursive partitioning, also known as divide and conquer, which splits the data into smaller and smaller subsets of similar classes.
The first step is to find the most predictive feature of the target class. The next steps proceed with the next most predictive feature. The process continues until a stopping criterion is met.
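The divide and conquer idea can be sketched in a few lines. This is not C5.0 itself: as a simplifying assumption, "most predictive" is scored here by how many examples the majority class in each subset would get wrong, and the stopping criteria (pure subset, no features left, or a depth limit) are illustrative choices. The function and variable names are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most common class label in a subset."""
    return Counter(labels).most_common(1)[0][0]

def errors_after_split(rows, labels, feature):
    """Examples misclassified if each subset predicted its majority class."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

def build(rows, labels, features, max_depth=3):
    # Stopping criteria: pure subset, no features left, or depth limit reached.
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return {"leaf": majority(labels)}
    # Pick the feature whose split leaves the fewest errors.
    best = min(features, key=lambda f: errors_after_split(rows, labels, f))
    remaining = [f for f in features if f != best]
    node = {"feature": best, "branches": {}}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        # Recurse on the smaller subset.
        node["branches"][value] = build(sub_rows, sub_labels, remaining, max_depth - 1)
    return node

rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]
tree = build(rows, labels, ["a", "b"])
print(tree)
```

In this toy data, feature `a` separates the classes perfectly, so it is chosen at the root and both branches stop immediately because their subsets contain a single class.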
Movie scripts.
Predict if
Use two features of past data
The splits of the data
See page 123/131.
C5.0 was developed by J. Ross Quinlan (RuleQuest Research).
A single-threaded version of C5.0 has been released as open source software; it is included in R and other software, such as Weka.
If a segment of data is a single class, it is considered pure.
C5.0 uses entropy for measuring purity.
The entropy of a sample of data indicates how mixed the class values are.
The definition of entropy is:
\(Entropy(S) = \sum_{i=1}^{c} -p_i \log_2(p_i)\)
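where \(S\) is the segment of data, \(c\) is the number of class levels, and \(p_i\) is the proportion of examples in \(S\) that fall into class \(i\).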
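As a quick check of the definition, the following sketch computes entropy directly from a list of class labels. The function name and the sample labels are illustrative assumptions, not part of C5.0.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: sum over classes of -p_i * log2(p_i)."""
    n = len(labels)
    # Counter only holds classes that occur, so p_i is never zero here.
    return sum(-(k / n) * math.log2(k / n) for k in Counter(labels).values())

pure = ["red"] * 10                   # a single class
even = ["red"] * 5 + ["white"] * 5    # 50/50 mix of two classes
mixed = ["red"] * 6 + ["white"] * 4   # 60/40 mix of two classes

print(entropy(pure))
print(entropy(even))
print(round(entropy(mixed), 3))
```

A pure segment has entropy 0, an even two-class mix has the two-class maximum of 1 bit, and the 60/40 mix comes out at about 0.971.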
To choose the feature to split on, information gain is used.
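\(InfoGain(F) = Entropy(S_1) - Entropy(S_2)\)
where \(S_1\) is the segment before the split and \(S_2\) is the set of partitions resulting from splitting on feature \(F\). Since a split yields more than one partition, \(Entropy(S_2)\) is the sum of each partition's entropy weighted by the proportion of examples falling in that partition.
See the paper "An Empirical Comparison of Selection Measures for Decision-Tree Induction" for a discussion of other criteria that can be used.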
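A minimal sketch of the information gain calculation for a categorical feature, assuming the entropy after the split is the weighted sum of the entropies of the partitions. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(feature_values, labels):
    """Entropy before the split minus weighted entropy after the split."""
    before = entropy(labels)
    # Group the class labels by the value of the feature.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    n = len(labels)
    after = sum(len(p) / n * entropy(p) for p in partitions.values())
    return before - after

labels = ["yes", "yes", "no", "no"]
separating = ["a", "a", "b", "b"]   # lines up exactly with the classes
useless = ["a", "b", "a", "b"]      # each value holds one "yes" and one "no"

print(info_gain(separating, labels))
print(info_gain(useless, labels))
```

The separating feature removes all of the 1 bit of entropy (gain 1.0), while the useless feature leaves every partition as mixed as the original data (gain 0.0), so the first would be preferred for the split.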
We do not want overly complicated trees. Smaller trees may be better for understanding and generalizing.
Next time we will work with the bank loans data example to develop a decision tree.