March 18, 2020

Introduction

Today we will discuss Classification algorithms using Rules.

We will learn about the following rule-based algorithms.

  • ZeroR
  • 1R
  • RIPPER

Note that the C5.0 Algorithm can be used for Rule Learners with the option rules = TRUE.

Data Mining Map

To see where we are in the class and to see what is to come, take a look at this flow chart.

Question: What is the difference between a flowchart and a tree diagram?

Rules From Decision Trees

Earlier in the chapter there was the example of classifying movies as Box Office Bust, Mainstream Hit, or Critical Success. See page 123/131.

Now Rules.

You can think of a rule as a path through a tree to a decision.

See page 149/157 for rules that can be used to classify movies.
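To make the "rule = path through a tree" idea concrete, here is a minimal sketch in Python. The tree below is a hypothetical toy tree invented for illustration (the feature names `celebrities` and `budget` and their values are assumptions, not the tree from the book). Each root-to-leaf path becomes one IF ... THEN rule.

```python
# Hypothetical toy tree: an internal node is (feature, {value: subtree}),
# a leaf is a class label. Names and values are illustrative only.
tree = ("celebrities", {
    "many": ("budget", {"high": "Mainstream Hit", "low": "Critical Success"}),
    "few": "Box Office Bust",
})

def tree_to_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if isinstance(node, str):
        # Reached a leaf: the conditions collected so far form the rule.
        return [(list(conditions), node)]
    feature, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, conditions + ((feature, value),)))
    return rules

rules = tree_to_rules(tree)
for conds, label in rules:
    lhs = " AND ".join(f"{f} = {v}" for f, v in conds)
    print(f"IF {lhs} THEN {label}")
```

This toy tree has three leaves, so it yields three rules.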

There are other ways to come up with Rules.

Decision Trees

Decision Trees are built using the approach known as

Divide and Conquer.

Feature values are used to split the data into smaller and smaller subsets of similar cases.
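A rough sketch of the splitting idea, assuming toy data and a fixed feature order. A real tree learner would choose the best feature at each step; this simplified version just takes the features in the order given, and the helper names `partition` and `divide` are made up for this example.

```python
from collections import defaultdict

def partition(rows, feature):
    """Group rows (dicts) by their value for one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return dict(groups)

def divide(rows, features, target="label"):
    """Recursively split until a subset is pure or features run out."""
    labels = {row[target] for row in rows}
    if len(labels) == 1 or not features:
        return rows  # leaf: a subset of similar cases
    first, rest = features[0], features[1:]
    return {value: divide(subset, rest, target)
            for value, subset in partition(rows, first).items()}

# Toy data, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "stay"},
    {"outlook": "rainy", "windy": "no", "label": "stay"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
]
nested = divide(rows, ["outlook", "windy"])
```

The `rainy` subset is already pure, so it is not split further. The `sunny` subset is mixed, so it is divided again on `windy`.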

Rules

Classification Rules use the approach called

Separate and Conquer.

According to the author…

The process involves identifying a rule that covers a subset of the examples in the training data, and then separating this partition from the remaining data. As rules are added, additional subsets of data are separated until the entire dataset has been covered and no more examples remain.
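The loop described above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular package: rules have a single condition (feature = value), the "best" rule is assumed to be the one with the highest purity (ties broken by coverage), and the animal data is invented.

```python
from collections import Counter

def best_rule(rows, features, target="label"):
    """Pick the single-condition rule with the highest purity, then coverage."""
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            covered = [r for r in rows if r[f] == v]
            label, hits = Counter(r[target] for r in covered).most_common(1)[0]
            score = (hits / len(covered), len(covered))
            if best is None or score > best[0]:
                best = (score, (f, v, label))
    return best[1]

def separate_and_conquer(rows, features, target="label"):
    """Add rules one at a time, removing the examples each rule covers."""
    rules, remaining = [], list(rows)
    while remaining:
        f, v, label = best_rule(remaining, features, target)
        rules.append((f, v, label))
        # Separate: drop the covered examples, keep the rest for later rules.
        remaining = [r for r in remaining if r[f] != v]
    return rules

# Toy data, invented for illustration.
animals = [
    {"travels": "air", "fur": "no", "label": "not mammal"},
    {"travels": "air", "fur": "yes", "label": "mammal"},
    {"travels": "land", "fur": "yes", "label": "mammal"},
    {"travels": "land", "fur": "yes", "label": "mammal"},
    {"travels": "sea", "fur": "no", "label": "not mammal"},
    {"travels": "land", "fur": "no", "label": "not mammal"},
]
learned = separate_and_conquer(animals, ["travels", "fur"])
```

Each pass removes at least one example, so the loop always stops once every example is covered.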

Rules

Rule-based learners usually use

nominal features

Greedy learners

Both

  • Divide-and-conquer
  • Separate-and-conquer

algorithms are known as greedy learners because data is used on a first-come, first-served basis: each step commits to the best choice available at that moment and does not revisit it.

One Rule algorithm

ZeroR ignores every feature and always predicts the most common class.

OneR builds one rule from each feature and keeps the single rule with the lowest error rate.
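A minimal sketch of both ideas in Python, assuming nominal features stored as dicts. The function names `zero_r` and `one_r` and the animal data are made up for this example.

```python
from collections import Counter, defaultdict

def zero_r(rows, target="label"):
    """Predict the most common class, ignoring all features."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def one_r(rows, features, target="label"):
    """Build one rule per feature and keep the one with the fewest errors."""
    best = None
    for f in features:
        by_value = defaultdict(Counter)
        for r in rows:
            by_value[r[f]][r[target]] += 1
        # For each feature value, predict the majority class seen with it.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    return best

# Toy data, invented for illustration.
rows = [
    {"travels": "air", "fur": "no", "label": "not mammal"},
    {"travels": "air", "fur": "yes", "label": "mammal"},
    {"travels": "land", "fur": "yes", "label": "mammal"},
    {"travels": "land", "fur": "yes", "label": "mammal"},
    {"travels": "land", "fur": "no", "label": "not mammal"},
    {"travels": "sea", "fur": "no", "label": "not mammal"},
    {"travels": "sea", "fur": "no", "label": "not mammal"},
]
baseline = zero_r(rows)
feature, rule, errors = one_r(rows, ["travels", "fur"])
```

On this toy data the `travels` rule misclassifies two animals while the `fur` rule misclassifies none, so OneR keeps the `fur` rule.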

RIPPER algorithm

RIPPER

Repeated Incremental Pruning to Produce Error Reduction

  1. Grow
  2. Prune
  3. Optimize

The information gain criterion is used to identify the next splitting attribute. When increasing a rule's specificity no longer reduces entropy, the rule is immediately pruned.
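As a reminder of what "reduces entropy" means, here is a small sketch of plain entropy-based information gain. It is an illustration under stated assumptions only: actual RIPPER implementations may use a variant of this criterion, and the feature names and data below are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, feature, target="label"):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for v in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Toy data, invented for illustration.
rows = [
    {"useful": "a", "useless": "x", "label": "yes"},
    {"useful": "a", "useless": "y", "label": "yes"},
    {"useful": "b", "useless": "x", "label": "no"},
    {"useful": "b", "useless": "y", "label": "no"},
]
gain_useful = information_gain(rows, "useful")
gain_useless = information_gain(rows, "useless")
```

A condition on `useful` separates the classes perfectly, so it removes all of the entropy. A condition on `useless` leaves each subset as mixed as before, so its gain is zero and adding it would not make the rule any better.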

Today