---
title: "Rules"
author: "Prof. Eric A. Suess"
date: "February 24, 2021"
output:
  beamer_presentation: default
  ioslides_presentation: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Introduction

Today we will discuss Classification algorithms using Rules.

We will learn about the following Rule bases algorithms.

- ZeroR 
- 1R
- RIPPER

Note that the C5.0 Algorithm can be used for Rule Learners with the option **rules = TRUE**.

## Data Mining Map

To see where we are in the class and to see what is to come,
take a look at this flow chart.

- [Data Mining Map](http://www.saedsayad.com/data_mining_map.htm)

**Question:** What is the difference between and flowchart and a
tree diagram?  

## Rules From Decision Trees

Earlier in chapter there was the example of classifying movies as 
Box Office Bust, Mainstream Hit, and Critical Success.  See page 123/131.

**Now Rules.**  

You can think of a rule as a path through a tree to a decision.  

See page 149/157 for rules that can be used to classify movies.  

There are other ways to come up with Rules.

## Decision Trees

Decision Trees are built using the approach known as 

**Divide and Conquer**.

Feature values are used to split the data into smaller and smaller subsets of similar cases.

## Rules

Classification Rules use the approach called 

**Separate and Conquer**.

According to the author...

The process involves identifying a rule that covers a subset of the examples in the training data, and then separates this partition from the remaining data.  As rules are added, additional subsets of data are separated until the entire dataset has been covered or no more examples remain.

## Rules

Rules based learners usually use 

**nominal features**

## Greedy learners

Both 

- **Divide-and-conquer** 
- **Separate-and-conquer** 

algorithms are known as **greedy learners** because data is used on a first-come, first serve basis.

- from Wikipedia [greedy algorithms](https://en.wikipedia.org/wiki/Greedy_algorithm)

## One Rule algorithm

**ZeroR** decide to pick the highest probability outcome.

**OneR** develop a rule with each feature, use the one rule that has the best performance.

## RIPPER algorithm

**RIPPER** 

Repeated Incremental Pruning to Produce Error Reduction

1. Grow
2. Prune
3. Optimize

The *information gain* criterion is used to identify the next splitting attribute.  When increasing rule's *specificity* no longer *reduces entropy*, then rule is immediately *pruned*.

## RIPPER algorithm

For further details about the RIPPER Algorithm, see Cohen's paper
and the following presentation.

- [Fast Effective Rule Induction](http://www.cs.utsa.edu/~bylander/cs6243/cohen95ripper.pdf)
- [A Ripper presentation](http://www.csee.usf.edu/~hall/dm/ripper.pdf)


## Today

Today we will try the identifying poisonous mushrooms example.

Check out some of my photos of mushrooms on flickr.

[mushroom](https://flic.kr/p/DWZmZU)

[mushroom](https://flic.kr/p/Db2sb2)

[mushroom](https://flic.kr/p/DFiwP1)

[mushroom](https://flic.kr/p/DyVPCa)