2024-02-21
Today we will discuss Classification algorithms using Rules.
We will learn about the following Rule-based algorithms.
Note that the C5.0 Algorithm can be used for Rule Learners with the option rules = TRUE.
To see where we are in the class and to see what is to come, take a look at this flow chart.
Question: What is the difference between a flowchart and a tree diagram?
Earlier in the chapter there was the example of classifying movies as Box Office Bust, Mainstream Hit, or Critical Success. See page 123/131.
Now Rules.
You can think of a rule as a path through a tree to a decision.
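To make the "path through a tree" idea concrete, here is a small sketch in Python. The tree below is a made-up toy (the feature names `celebrities` and `budget` and their values are assumptions for illustration, not the tree from the book). The function walks every root-to-leaf path and writes each path out as one IF/THEN rule.

```python
# Hypothetical toy tree, for illustration only.
# An internal node is (feature, {value: subtree}); a leaf is a class label string.
tree = ("celebrities", {
    "many": ("budget", {"high": "Mainstream Hit", "low": "Critical Success"}),
    "few": "Box Office Bust",
})

def tree_to_rules(node, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(node, str):
        # Reached a leaf: the conditions collected so far form one rule.
        return [(list(conditions), node)]
    feature, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((feature, value),)))
    return rules

rules = tree_to_rules(tree)
for conds, label in rules:
    test = " and ".join(f"{f} = {v}" for f, v in conds)
    print(f"IF {test} THEN {label}")
```

Each leaf gives exactly one rule, so a tree with three leaves gives three rules.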
See page 149/157 for rules that can be used to classify movies.
There are other ways to come up with Rules.
Decision Trees are built using the approach known as
Divide and Conquer.
Feature values are used to split the data into smaller and smaller subsets of similar cases.
Classification Rules use the approach called
Separate and Conquer.
According to the author…
The process involves identifying a rule that covers a subset of the examples in the training data, and then separating this partition from the remaining data. As rules are added, additional subsets of data are separated until the entire dataset has been covered or no more examples remain.
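The loop described in that quote can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by any particular package: it assumes nominal features, only considers single-condition rules (`feature == value`), and only accepts a rule if every example it covers has the same class. The animal data and the helper names are made up for the example.

```python
from collections import Counter

def best_pure_rule(rows, label_key):
    """Find the single-condition rule that covers the most rows, all of one class."""
    best = None
    features = [k for k in rows[0] if k != label_key]
    for f in features:
        for v in sorted({r[f] for r in rows}):
            covered = [r for r in rows if r[f] == v]
            labels = {r[label_key] for r in covered}
            if len(labels) == 1 and (best is None or len(covered) > best[3]):
                best = (f, v, labels.pop(), len(covered))
    return best

def separate_and_conquer(rows, label_key="label"):
    rules = []
    remaining = list(rows)
    while remaining:
        rule = best_pure_rule(remaining, label_key)
        if rule is None:
            # No pure rule left: fall back to a default rule (majority class).
            majority = Counter(r[label_key] for r in remaining).most_common(1)[0][0]
            rules.append((None, None, majority))
            break
        f, v, label, _ = rule
        rules.append((f, v, label))
        # "Separate": drop the covered examples, then "conquer" what is left.
        remaining = [r for r in remaining if r[f] != v]
    return rules

# Hypothetical toy data.
animals = [
    {"travels_by": "land", "has_fur": "yes", "label": "mammal"},
    {"travels_by": "air", "has_fur": "no", "label": "non-mammal"},
    {"travels_by": "land", "has_fur": "yes", "label": "mammal"},
    {"travels_by": "sea", "has_fur": "no", "label": "non-mammal"},
    {"travels_by": "land", "has_fur": "no", "label": "non-mammal"},
    {"travels_by": "air", "has_fur": "yes", "label": "mammal"},
]

learned = separate_and_conquer(animals)
for f, v, label in learned:
    print(f"IF {f} = {v} THEN {label}" if f else f"ELSE {label}")
```

Notice that once a rule has claimed its examples, those examples are never looked at again.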
Rule-based learners usually use
nominal features
Both divide and conquer and separate and conquer algorithms are known as greedy learners because data is used on a first-come, first-served basis.
ZeroR ignores all features and simply predicts the most common outcome.
OneR develops a rule for each feature, then uses the one rule that has the best performance (the fewest errors).
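Both of these are small enough to write by hand. The Python sketch below is one plain way to do it, assuming nominal features stored in dictionaries; the tiny mushroom table and the function names `zero_r` and `one_r` are made up for illustration and are not taken from any library.

```python
from collections import Counter, defaultdict

def zero_r(rows, label_key="label"):
    """ZeroR: ignore the features and return the most common class."""
    return Counter(r[label_key] for r in rows).most_common(1)[0][0]

def one_r(rows, label_key="label"):
    """OneR: build one rule per feature, keep the feature with the fewest errors."""
    best_feature, best_rule, best_errors = None, None, None
    for f in [k for k in rows[0] if k != label_key]:
        # Count the classes seen for each value of this feature.
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[f]][r[label_key]] += 1
        # The rule maps each value to its most common class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(1 for r in rows if rule[r[f]] != r[label_key])
        if best_errors is None or errors < best_errors:
            best_feature, best_rule, best_errors = f, rule, errors
    return best_feature, best_rule, best_errors

# Hypothetical toy data.
mushrooms = [
    {"odor": "foul", "cap_color": "brown", "label": "poisonous"},
    {"odor": "none", "cap_color": "brown", "label": "edible"},
    {"odor": "none", "cap_color": "white", "label": "edible"},
    {"odor": "foul", "cap_color": "white", "label": "poisonous"},
    {"odor": "almond", "cap_color": "brown", "label": "edible"},
]

baseline = zero_r(mushrooms)
feature, rule, errors = one_r(mushrooms)
print("ZeroR always predicts:", baseline)
print("OneR picks feature:", feature, "with rule", rule, "and", errors, "errors")
```

ZeroR is mostly useful as a baseline: any other learner should at least beat it.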
RIPPER
Repeated Incremental Pruning to Produce Error Reduction
The information gain criterion is used to identify the next splitting attribute. When increasing a rule's specificity no longer reduces entropy, the rule is immediately pruned.
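As a reminder of what entropy and information gain measure, here is a short Python sketch of the usual entropy-based definition. It is meant only to illustrate the idea of "does this split reduce entropy?"; it is an assumption that this matches the notes' intent, and it is not claimed to be the exact criterion inside any RIPPER implementation. The four-row table is made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, label_key="label"):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[label_key] for r in rows])
    after = 0.0
    for v in {r[feature] for r in rows}:
        subset = [r[label_key] for r in rows if r[feature] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Hypothetical toy data: odor separates the classes perfectly, cap_color not at all.
toy = [
    {"odor": "foul", "cap_color": "brown", "label": "poisonous"},
    {"odor": "foul", "cap_color": "white", "label": "poisonous"},
    {"odor": "none", "cap_color": "brown", "label": "edible"},
    {"odor": "none", "cap_color": "white", "label": "edible"},
]

gain_odor = information_gain(toy, "odor")
gain_cap = information_gain(toy, "cap_color")
print("gain for odor:", gain_odor)
print("gain for cap_color:", gain_cap)
```

A condition with zero gain adds specificity without reducing entropy, which is the situation where a rule would stop growing.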
Today we will try the example of identifying poisonous mushrooms.
Check out some of my photos of mushrooms on flickr.