Contents

Slide 2

Recap

Decision Trees (in class)
For classification
Using categorical predictors
Using classification error as our metric
Decision Trees (in lab)
For regression
Using continuous predictors
Using entropy, Gini, and information gain
Slide 3

Impurity Measures: Covered in Lab Last Week

Node impurity measures for two-class classification, as a function of the proportion p in class 2. Cross-entropy has been scaled to pass through (0.5, 0.5).
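A minimal NumPy sketch of these three two-class impurity curves as functions of p (the function names and the grid of p values are illustrative, not from the slide):

```python
import numpy as np

def misclassification_error(p):
    # Classification error of a two-class node with proportion p in class 2
    return 1.0 - np.maximum(p, 1.0 - p)

def gini(p):
    # Gini index for two classes: 2 p (1 - p)
    return 2.0 * p * (1.0 - p)

def cross_entropy(p, scaled=True):
    # Binary cross-entropy; dividing by 2 ln 2 makes it pass through (0.5, 0.5)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    h = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return h / (2.0 * np.log(2.0)) if scaled else h

p = np.linspace(0.0, 1.0, 101)
curves = {"error": misclassification_error(p), "gini": gini(p), "entropy": cross_entropy(p)}
```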
Slide 4

Practice Yourself

For each criterion, work out which split it will favor.
Slide 5

Today’s Objectives

Overfitting in Decision Trees (Tree Pruning)
Ensemble Learning (combine the power of multiple models in a single model while overcoming their weaknesses)
Bagging (overcoming variance)
Boosting (overcoming bias)
Slide 6

Overfitting in Decision Trees

Slide 7

Decision Boundaries at Different Depths

Slide 8

Generally Speaking

Slide 9

Decision Tree Overfitting on Real Data

Slide 10

Simple is Better

When two trees have the same classification error on the validation set, choose the simpler one.
Slide 11

Modified Tree Learning Problem

Slide 12

Finding Simple Trees

Early Stopping: Stop learning before the tree becomes too complex
Pruning: Simplify the tree after the learning algorithm terminates
Slide 13

Criterion 1 for Early Stopping

Limit the depth: stop splitting after max_depth is reached
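A small scikit-learn sketch of this criterion (the dataset and the particular max_depth value are placeholder choices, not from the slide):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Early stopping criterion 1: cap the depth of the tree
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

print("depth-3 validation accuracy:", shallow.score(X_val, y_val))
print("unlimited-depth validation accuracy:", deep.score(X_val, y_val))
```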
Slide 14

Criterion 2 for Early Stopping

Slide 15

Criterion 3 for Early Stopping

Slide 16

Early Stopping: Summary

Slide 17

Pruning

To simplify a tree, we need to define what we mean by the simplicity of a tree.
Slide 18

Which Tree is Simpler?

Slide 19

Which Tree is Simpler?

Slide 20

Thus, Our Measure of Complexity

Slide 21

New Optimization Goal

Total Cost = Measure of Fit + Measure of Complexity
Measure of Fit = Classification Error (large means bad fit to the data)
Measure of Complexity = Number of Leaves (large means likely to overfit)
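Written as a single objective, with a trade-off parameter lambda that is an assumption added here (the slide itself only names the two terms):

```latex
C(T) \;=\; \underbrace{\mathrm{Error}(T)}_{\text{measure of fit}} \;+\; \lambda \cdot \underbrace{L(T)}_{\text{number of leaves}}
```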
Slide 22

Tree Pruning Algorithm

Let T be the final tree
Start at the bottom of T and traverse up, applying prune_split at each decision node M
Slide 23

prune_split
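A minimal sketch of one common way prune_split can be implemented for this total-cost criterion; the dict-based tree format, the helper names, and the lam parameter are illustrative assumptions, not the lecture's code:

```python
def is_leaf(node):
    return "prediction" in node

def predict(node, x):
    while not is_leaf(node):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["prediction"]

def num_leaves(node):
    return 1 if is_leaf(node) else num_leaves(node["left"]) + num_leaves(node["right"])

def errors(node, data):
    # Number of misclassified points among `data` (a list of (x, y) pairs)
    return sum(predict(node, x) != y for x, y in data)

def majority_leaf(data):
    labels = [y for _, y in data]
    return {"prediction": max(set(labels), key=labels.count)}

def prune_split(node, data, lam, n_total):
    """Replace decision node M by a majority leaf if that does not raise total cost."""
    leaf = majority_leaf(data)
    cost_keep  = errors(node, data) / n_total + lam * num_leaves(node)
    cost_prune = errors(leaf, data) / n_total + lam * 1
    return leaf if cost_prune <= cost_keep else node

def prune(node, data, lam, n_total):
    """Start at the bottom of the tree and traverse up, applying prune_split."""
    if is_leaf(node) or not data:
        return node
    left  = [(x, y) for x, y in data if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in data if x[node["feature"]] >  node["threshold"]]
    node = {**node,
            "left":  prune(node["left"],  left,  lam, n_total),
            "right": prune(node["right"], right, lam, n_total)}
    return prune_split(node, data, lam, n_total)
```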

Slide 24

Ensemble Learning

Slide 25

Bias and Variance

A complex model could exhibit high variance
A simple model could exhibit high bias

We can solve each case with ensemble learning.
Let’s first see what ensemble learning is.

Slide 26

Ensemble Classifier in General

Slide 27

Ensemble Classifier in General

Slide 28

Ensemble Classifier in General
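In general, an ensemble classifier combines the predictions of several base models, for example by a weighted vote. A minimal sketch of that idea (the ±1 label convention, the classifier list, and the weights are assumptions):

```python
import numpy as np

def ensemble_predict(classifiers, weights, X):
    # Each base classifier votes with labels in {-1, +1}; the ensemble takes
    # the sign of the weighted sum of the votes.
    votes = np.array([w * clf.predict(X) for clf, w in zip(classifiers, weights)])
    return np.sign(votes.sum(axis=0))
```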

Slide 29

Important

A necessary and sufficient condition for an ensemble of classifiers to be more accurate than any of its individual members is that the members are accurate and diverse (Hansen & Salamon, 1990)
Slide 30

Bagging: Reducing Variance Using an Ensemble of Classifiers from Bootstrap Samples

Slide 31

Aside: Bootstrapping

Creating new datasets by sampling from the training data with replacement
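A small NumPy sketch of drawing one bootstrap sample (the data shapes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # placeholder training data
y = rng.integers(0, 2, size=100)

# Draw indices with replacement: some rows repeat, others are left out
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]
```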

Slide 32

Bagging (diagram): Training Set → Bootstrap Samples → Classifiers → Predictions on New Data → Voting → Final Prediction
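A minimal sketch of the whole bagging pipeline in the diagram, using decision trees as the base classifiers (the dataset, the number of bootstrap samples, and the 0/1 majority-vote rule are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_models = 25

# Train one classifier per bootstrap sample
classifiers = []
for _ in range(n_models):
    idx = rng.choice(len(X), size=len(X), replace=True)
    classifiers.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Final prediction on new data: majority vote over the individual predictions
X_new = X[:5]
votes = np.array([clf.predict(X_new) for clf in classifiers])   # shape (n_models, 5)
final_prediction = (votes.mean(axis=0) > 0.5).astype(int)       # majority for 0/1 labels
```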

Slide 33

Why Does Bagging Work?
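The usual explanation is variance reduction. As a standard reference result (not taken from the slide): averaging B identically distributed models with a common variance and pairwise correlation rho gives

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right)
  \;=\; \rho\,\sigma^{2} \;+\; \frac{1-\rho}{B}\,\sigma^{2}
```

which shrinks toward rho·sigma² as B grows, so averaging many low-correlation, high-variance models reduces variance without changing bias.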

Slide 34

Bagging Summary

Bagging was first proposed by Leo Breiman in a technical report in 1994.
He also showed that bagging can improve the accuracy of unstable models and decrease the degree of overfitting.
I highly recommend you read about his research in L. Breiman. Bagging Predictors. Machine Learning, 24(2):123–140, 1996.
Slide 35

Random Forests – Example of Bagging
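A minimal scikit-learn sketch of a random forest (the dataset and hyperparameters are placeholders, not from the slide):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagged decision trees plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```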

Slide 36

Making a Prediction

Slide 37

Boosting: Converting Weak Learners to Strong Learners through Ensemble Learning

Slide 38

Boosting and Bagging

Boosting works in a similar way to bagging.
Except:
Models are built sequentially: each model is built using information from previously built models.
Boosting does not involve bootstrap sampling; instead, each tree is fit on a modified version of the original data set.
Slide 39

Boosting: (1) Train A Classifier

Slide 40

Boosting: (2) Train Next Classifier by Focusing More on the Hard Points
Slide 41

What does it mean to focus more?

Slide 42

Example (Unweighted): Learning a Simple Decision Stump

Slide 43

Example (Weighted): Learning a Decision Stump on Weighted Data
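A small scikit-learn sketch of fitting a decision stump on weighted data (the data and the particular weights are placeholders; max_depth=1 makes the tree a stump):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Give the "hard" points larger weights (placeholder: pretend the first 50
# points were misclassified by an earlier stump)
weights = np.ones(len(X))
weights[:50] *= 4.0

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=weights)   # the chosen split now favors the up-weighted points
```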

Slide 44

Boosting

Slide 45

AdaBoost (Example of Boosting)

Weight of the model
New weights of the data points
Slide 46

Slide 47

Weighted Classification Error
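For reference, the standard weighted classification error used by AdaBoost, with alpha_i denoting the data-point weights (the slide's own notation may differ):

```latex
\text{weighted error}(f_t)
  \;=\; \frac{\sum_{i=1}^{N} \alpha_i \,\mathbf{1}\!\left[ f_t(x_i) \neq y_i \right]}
             {\sum_{i=1}^{N} \alpha_i}
```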

Slide 48

AdaBoost: Computing Classifier’s Weights
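For reference, the standard AdaBoost model weight (the slide's own notation may differ):

```latex
\hat{w}_t \;=\; \frac{1}{2}\,\ln\!\left(\frac{1 - \text{weighted error}(f_t)}{\text{weighted error}(f_t)}\right)
```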

Slide 49

AdaBoost

Slide 50

Slide 51

AdaBoost: Recomputing A Sample’s Weight

Increase, Decrease, or Keep the Same
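For reference, the standard AdaBoost reweighting rule behind this slide's title: correctly classified points are decreased, mistakes are increased, and nothing changes when the model weight is zero, i.e. when the weighted error is exactly 0.5:

```latex
\alpha_i \;\leftarrow\;
\begin{cases}
\alpha_i \, e^{-\hat{w}_t}, & \text{if } f_t(x_i) = y_i \quad (\text{decrease})\\[4pt]
\alpha_i \, e^{\hat{w}_t},  & \text{if } f_t(x_i) \neq y_i \quad (\text{increase})
\end{cases}
```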

Slide 52

AdaBoost: Recomputing A Sample’s Weight

Slide 53

AdaBoost

Slide 54

AdaBoost: Normalizing Sample Weights
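For reference, the standard normalization step, so the data-point weights always sum to 1:

```latex
\alpha_i \;\leftarrow\; \frac{\alpha_i}{\sum_{j=1}^{N} \alpha_j}
```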

Slide 55

AdaBoost
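Putting the previous steps together, a compact sketch of AdaBoost with decision stumps; labels are assumed to be in {-1, +1}, and the dataset, number of rounds, and the clipping guard are illustrative choices, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=30):
    """y must be in {-1, +1}. Returns a list of (model_weight, stump) pairs."""
    n = len(X)
    alpha = np.full(n, 1.0 / n)                   # data-point weights, start uniform
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=alpha)
        pred = stump.predict(X)
        # Weighted classification error
        err = np.sum(alpha * (pred != y)) / np.sum(alpha)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guard against log(0) / division by zero
        # Weight of the model
        w = 0.5 * np.log((1.0 - err) / err)
        # New weights of the data points: decrease if correct, increase if wrong
        alpha = alpha * np.exp(-w * y * pred)
        # Normalize so the weights sum to 1
        alpha = alpha / alpha.sum()
        ensemble.append((w, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # Weighted vote of all stumps
    scores = sum(w * stump.predict(X) for w, stump in ensemble)
    return np.sign(scores)
```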

Slide 56

Self Study

What is the effect of:
Increasing the number of classifiers in bagging
vs.
Increasing the number of classifiers in boosting
Slide 57

Boosting Summary