Contents

Slide 2

Recap

Decision Trees (in class)
For classification
Using categorical predictors
Using classification error as our metric
Decision Trees (in lab)
For regression
Using continuous predictors
Using entropy, Gini, and information gain
Slide 3

Impurity Measures: Covered in Lab Last Week

Node impurity measures for two-class classification, as a function of the proportion p in class 2. Cross-entropy has been scaled to pass through (0.5, 0.5).
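A minimal NumPy sketch of these three two-class impurity curves as functions of p (the function names and the grid of p values are illustrative, not from the slide):

```python
import numpy as np

def misclassification_error(p):
    # Classification error of a two-class node with proportion p in class 2
    return 1.0 - np.maximum(p, 1.0 - p)

def gini(p):
    # Gini index for two classes: 2 p (1 - p)
    return 2.0 * p * (1.0 - p)

def cross_entropy(p, scaled=True):
    # Binary cross-entropy; dividing by 2 ln 2 makes it pass through (0.5, 0.5)
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    h = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return h / (2.0 * np.log(2.0)) if scaled else h

p = np.linspace(0.0, 1.0, 101)
curves = {"error": misclassification_error(p), "gini": gini(p), "entropy": cross_entropy(p)}
```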
Slide 4

Practice Yourself

For each criterion, work out which split it will favor.
Slide 5

Today’s Objectives

Overfitting in Decision Trees (Tree Pruning)
Ensemble Learning (combine the power of multiple models in a single model while overcoming their weaknesses)
Bagging (overcoming variance)
Boosting (overcoming bias)
Slide 6

Overfitting in Decision Trees

Slide 7

Decision Boundaries at Different Depths

Slide 8

Generally Speaking

Slide 9

Decision Tree Overfitting on Real Data

Slide 10

Simple is Better

When two trees have the same classification error on the validation set, choose the simpler one.
Slide 11

Modified Tree Learning Problem

Slide 12

Finding Simple Trees

Early Stopping: Stop learning before the tree becomes too complex
Pruning: Simplify the tree after the learning algorithm terminates
Slide 13

Criterion 1 for Early Stopping

Limit the depth: stop splitting after max_depth is reached
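A small scikit-learn sketch of this criterion (the dataset and the particular max_depth value are placeholder choices, not from the slide):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Early stopping criterion 1: cap the depth of the tree
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

print("depth-3 validation accuracy:", shallow.score(X_val, y_val))
print("unlimited-depth validation accuracy:", deep.score(X_val, y_val))
```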
Slide 14

Criterion 2 for Early Stopping

Slide 15

Criterion 3 for Early Stopping

Slide 16

Early Stopping: Summary

Slide 17

Pruning

To simplify a tree, we need to define what we mean by the simplicity of a tree.
Slide 18

Which Tree is Simpler?

Slide 19

Which Tree is Simpler?

Slide 20

Thus, Our Measure of Complexity

Slide 21

New Optimization Goal

Total Cost = Measure of Fit + Measure of Complexity
Measure of Fit = Classification Error (large means bad fit to the data)
Measure of Complexity = Number of Leaves (large means likely to overfit)
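Written as a single objective, with a trade-off parameter lambda that is an assumption added here (the slide itself only names the two terms):

```latex
C(T) \;=\; \underbrace{\mathrm{Error}(T)}_{\text{measure of fit}} \;+\; \lambda \cdot \underbrace{L(T)}_{\text{number of leaves}}
```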
Slide 22

Tree Pruning Algorithm

Let T be the final tree
Start at the bottom of T and traverse up, applying prune_split at each decision node M
Slide 23

prune_split
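A minimal sketch of one common way prune_split can be implemented for this total-cost criterion; the dict-based tree format, the helper names, and the lam parameter are illustrative assumptions, not the lecture's code:

```python
def is_leaf(node):
    return "prediction" in node

def predict(node, x):
    while not is_leaf(node):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["prediction"]

def num_leaves(node):
    return 1 if is_leaf(node) else num_leaves(node["left"]) + num_leaves(node["right"])

def errors(node, data):
    # Number of misclassified points among `data` (a list of (x, y) pairs)
    return sum(predict(node, x) != y for x, y in data)

def majority_leaf(data):
    labels = [y for _, y in data]
    return {"prediction": max(set(labels), key=labels.count)}

def prune_split(node, data, lam, n_total):
    """Replace decision node M by a majority leaf if that does not raise total cost."""
    leaf = majority_leaf(data)
    cost_keep  = errors(node, data) / n_total + lam * num_leaves(node)
    cost_prune = errors(leaf, data) / n_total + lam * 1
    return leaf if cost_prune <= cost_keep else node

def prune(node, data, lam, n_total):
    """Start at the bottom of the tree and traverse up, applying prune_split."""
    if is_leaf(node) or not data:
        return node
    left  = [(x, y) for x, y in data if x[node["feature"]] <= node["threshold"]]
    right = [(x, y) for x, y in data if x[node["feature"]] >  node["threshold"]]
    node = {**node,
            "left":  prune(node["left"],  left,  lam, n_total),
            "right": prune(node["right"], right, lam, n_total)}
    return prune_split(node, data, lam, n_total)
```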

Slide 24

Ensemble Learning

Slide 25

Bias and Variance

A complex model could exhibit high variance
A simple model could exhibit high bias

We can solve each case with ensemble learning.
Let’s first see what ensemble learning is.

Slide 26

Ensemble Classifier in General

Slide 27

Ensemble Classifier in General

Slide 28

Ensemble Classifier in General
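In general, an ensemble classifier combines the predictions of several base models, for example by a weighted vote. A minimal sketch of that idea (the ±1 label convention, the classifier list, and the weights are assumptions):

```python
import numpy as np

def ensemble_predict(classifiers, weights, X):
    # Each base classifier votes with labels in {-1, +1}; the ensemble takes
    # the sign of the weighted sum of the votes.
    votes = np.array([w * clf.predict(X) for clf, w in zip(classifiers, weights)])
    return np.sign(votes.sum(axis=0))
```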

Slide 29

Important

A necessary and sufficient condition for an ensemble of classifiers to be more accurate than any of its individual members is that the members are accurate and diverse (Hansen & Salamon, 1990)
Slide 30

Bagging: Reducing Variance Using an Ensemble of Classifiers from Bootstrap Samples

Slide 31

Aside: Bootstrapping

Creating new datasets by sampling from the training data with replacement
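A small NumPy sketch of drawing one bootstrap sample (the data shapes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # placeholder training data
y = rng.integers(0, 2, size=100)

# Draw indices with replacement: some rows repeat, others are left out
idx = rng.choice(len(X), size=len(X), replace=True)
X_boot, y_boot = X[idx], y[idx]
```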

Slide 32

Bagging (diagram): Training Set → Bootstrap Samples → Classifiers → Predictions on New Data → Voting → Final Prediction
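A minimal sketch of the whole bagging pipeline in the diagram, using decision trees as the base classifiers (the dataset, the number of bootstrap samples, and the 0/1 majority-vote rule are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_models = 25

# Train one classifier per bootstrap sample
classifiers = []
for _ in range(n_models):
    idx = rng.choice(len(X), size=len(X), replace=True)
    classifiers.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Final prediction on new data: majority vote over the individual predictions
X_new = X[:5]
votes = np.array([clf.predict(X_new) for clf in classifiers])   # shape (n_models, 5)
final_prediction = (votes.mean(axis=0) > 0.5).astype(int)       # majority for 0/1 labels
```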

Slide 33

Why Does Bagging Work?
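The usual explanation is variance reduction. As a standard reference result (not taken from the slide): averaging B identically distributed models with a common variance and pairwise correlation rho gives

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right)
  \;=\; \rho\,\sigma^{2} \;+\; \frac{1-\rho}{B}\,\sigma^{2}
```

which shrinks toward rho·sigma² as B grows, so averaging many low-correlation, high-variance models reduces variance without changing bias.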

Slide 34

Bagging Summary

Bagging was first proposed by Leo Breiman in a technical report in 1994.
He also showed that bagging can improve the accuracy of unstable models and decrease the degree of overfitting.
I highly recommend you read about his research in L. Breiman. Bagging Predictors. Machine Learning, 24(2):123–140, 1996.
Slide 35

Random Forests – Example of Bagging
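A minimal scikit-learn sketch of a random forest (the dataset and hyperparameters are placeholders, not from the slide):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagged decision trees plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```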

Slide 36

Making a Prediction

Slide 37

Boosting: Converting Weak Learners to Strong Learners through Ensemble Learning

Slide 38

Boosting and Bagging

Boosting works in a similar way to bagging.
Except:
Models are built sequentially: each model is built using information from previously built models.
Boosting does not involve bootstrap sampling; instead, each tree is fit on a modified version of the original data set.
Slide 39

Boosting: (1) Train A Classifier

Slide 40

Boosting: (2) Train Next Classifier by Focusing More on the Hard Points
Slide 41

What does it mean to focus more?

Slide 42

Example (Unweighted): Learning a Simple Decision Stump

Slide 43

Example (Weighted): Learning a Decision Stump on Weighted Data
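A small scikit-learn sketch of fitting a decision stump on weighted data (the data and the particular weights are placeholders; max_depth=1 makes the tree a stump):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Give the "hard" points larger weights (placeholder: pretend the first 50
# points were misclassified by an earlier stump)
weights = np.ones(len(X))
weights[:50] *= 4.0

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=weights)   # the chosen split now favors the up-weighted points
```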

Slide 44

Boosting

Slide 45

AdaBoost (Example of Boosting)

Weight of the model
New weights of the data points
Slide 46

Slide 47

Weighted Classification Error
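For reference, the standard weighted classification error used by AdaBoost, with alpha_i denoting the data-point weights (the slide's own notation may differ):

```latex
\text{weighted error}(f_t)
  \;=\; \frac{\sum_{i=1}^{N} \alpha_i \,\mathbf{1}\!\left[ f_t(x_i) \neq y_i \right]}
             {\sum_{i=1}^{N} \alpha_i}
```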

Slide 48

AdaBoost: Computing Classifier’s Weights
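For reference, the standard AdaBoost model weight (the slide's own notation may differ):

```latex
\hat{w}_t \;=\; \frac{1}{2}\,\ln\!\left(\frac{1 - \text{weighted error}(f_t)}{\text{weighted error}(f_t)}\right)
```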

Slide 49

AdaBoost

Slide 50

Slide 51

AdaBoost: Recomputing A Sample’s Weight

Increase, Decrease, or Keep the Same
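For reference, the standard AdaBoost reweighting rule behind this slide's title: correctly classified points are decreased, mistakes are increased, and nothing changes when the model weight is zero, i.e. when the weighted error is exactly 0.5:

```latex
\alpha_i \;\leftarrow\;
\begin{cases}
\alpha_i \, e^{-\hat{w}_t}, & \text{if } f_t(x_i) = y_i \quad (\text{decrease})\\[4pt]
\alpha_i \, e^{\hat{w}_t},  & \text{if } f_t(x_i) \neq y_i \quad (\text{increase})
\end{cases}
```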

Slide 52

AdaBoost: Recomputing A Sample’s Weight

Slide 53

AdaBoost

Slide 54

AdaBoost: Normalizing Sample Weights
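For reference, the standard normalization step, so the data-point weights always sum to 1:

```latex
\alpha_i \;\leftarrow\; \frac{\alpha_i}{\sum_{j=1}^{N} \alpha_j}
```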

Slide 55

AdaBoost
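Putting the previous steps together, a compact sketch of AdaBoost with decision stumps; labels are assumed to be in {-1, +1}, and the dataset, number of rounds, and the clipping guard are illustrative choices, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=30):
    """y must be in {-1, +1}. Returns a list of (model_weight, stump) pairs."""
    n = len(X)
    alpha = np.full(n, 1.0 / n)                   # data-point weights, start uniform
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=alpha)
        pred = stump.predict(X)
        # Weighted classification error
        err = np.sum(alpha * (pred != y)) / np.sum(alpha)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guard against log(0) / division by zero
        # Weight of the model
        w = 0.5 * np.log((1.0 - err) / err)
        # New weights of the data points: decrease if correct, increase if wrong
        alpha = alpha * np.exp(-w * y * pred)
        # Normalize so the weights sum to 1
        alpha = alpha / alpha.sum()
        ensemble.append((w, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # Weighted vote of all stumps
    scores = sum(w * stump.predict(X) for w, stump in ensemble)
    return np.sign(scores)
```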

Slide 56

Self Study

What is the effect of:
Increasing the number of classifiers in bagging
vs.
Increasing the number of classifiers in boosting
Slide 57

Boosting Summary