Introduction to Machine Learning. Week 12

Slide 2

Lecture outline

Machine Learning Definition
Evaluation of the Logit Models:
Train and Test Datasets
Confusion Matrix, Accuracy
Sensitivity
Specificity
Precision

Slide 3

Machine Learning Definitions

Algorithm:
A Machine Learning algorithm is a set of rules and statistical techniques used to learn patterns from data and extract meaningful information from it. It is the logic behind a Machine Learning model.
Example: the Linear Regression or Logistic Regression algorithm.
Model:
A model is the main component of Machine Learning.
A model is trained by using a Machine Learning algorithm.
The algorithm determines the decisions the model makes for a given input, so that the model produces the correct output.
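To make the algorithm/model distinction concrete, here is a minimal plain-Python sketch. It assumes logistic regression fitted by batch gradient ascent on the log-likelihood (statistical packages usually use Newton-type optimizers instead, but should approach the same estimates); `fit_logistic` and `predict_proba` are hypothetical names, and the toy data are made up. The function `fit_logistic` plays the role of the algorithm; the tuple it returns is the model.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """The ALGORITHM: batch gradient ascent on the logistic log-likelihood.
    Returns the MODEL, i.e. the learned intercept and coefficients."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            # Residual between the observed label and the current predicted probability
            err = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        # Step along the average gradient
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(model, x):
    """Use the fitted MODEL to map a new input to a predicted probability."""
    b0, b = model
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


# Made-up toy data: one predictor, binary outcome
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
model = fit_logistic(X, y)
print(model, predict_proba(model, [0.0]), predict_proba(model, [5.0]))
```

The same algorithm applied to different training data would produce a different model.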
Slide 4

Data Partitioning:

Training Data:
The Machine Learning model is built using the training data. The training data helps the model identify the key trends and patterns needed to predict the output.
Testing Data:
After the model is trained, it must be tested to evaluate how accurately it predicts the outcome. This is done with the testing dataset.

[Figure: DATA is split into training data (to build the model) and test data (to evaluate the model)]
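The exercises in this lecture partition the data by position (for example, the first 700 Titanic observations for training and the rest for testing). A minimal sketch of that sequential split, using a hypothetical helper name `split_by_position` and placeholder row indices instead of real records; if the rows are ordered in some systematic way, a random split is a common alternative.

```python
def split_by_position(rows, n_train):
    """Sequential (non-random) split: the first n_train rows are the training
    data and the remaining rows are the test data."""
    if not 0 < n_train < len(rows):
        raise ValueError("n_train must leave at least one row in each part")
    return rows[:n_train], rows[n_train:]


# Example: 887 placeholder row indices split as in the Titanic exercise (700 / 187)
rows = list(range(887))
train, test = split_by_position(rows, 700)
print(len(train), len(test))  # 700 187
```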

Slide 5

Machine Learning Process

Slide 6

Model Evaluation: Titanic Dataset

Fit a model to predict survival based on Pclass, Gender, Age and Fare. Use the first 700 observations to fit your model (train dataset). Use the remaining 187 observations as the test dataset and evaluate your model using a Confusion Matrix*.

*Confusion Matrix:
A tabular display (2×2 in the binary case) of the record counts by their predicted and actual classification status.

[Figure: 2×2 confusion matrix layout, with classes 1 and 0 for both the actual and the predicted status]
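A minimal sketch of how the four cells of a binary confusion matrix are counted from actual and predicted 0/1 labels. The function name `confusion_matrix` here is a hypothetical plain-Python helper (not a library call), and the labels are made-up toy values, not Titanic results.

```python
def confusion_matrix(actual, predicted):
    """Count records by their (actual, predicted) status for a 0/1 outcome.
    Returns true positives, false negatives, false positives, true negatives."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # actual 1, predicted 1
        elif a == 1 and p == 0:
            counts["FN"] += 1  # actual 1, predicted 0
        elif a == 0 and p == 1:
            counts["FP"] += 1  # actual 0, predicted 1
        elif a == 0 and p == 0:
            counts["TN"] += 1  # actual 0, predicted 0
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_matrix(actual, predicted)
print(counts)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```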

Slide 7

Model Evaluation: Titanic Dataset

Confusion Matrix:
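For reference, the ratios listed in the lecture outline are the standard functions of the confusion-matrix counts, writing TP, FN, FP and TN for true positives, false negatives, false positives and true negatives (class 1 treated as "positive"):

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Precision}   &= \frac{TP}{TP + FP}
\end{aligned}
```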

Slide 8

Model Evaluation: Titanic Dataset

Confusion Matrix:
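The accuracy, sensitivity, specificity and precision ratios can be computed directly from the four confusion-matrix counts. A minimal sketch with a hypothetical helper `classification_ratios`; the counts in the example are made up for illustration.

```python
def classification_ratios(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and precision from confusion-matrix counts."""

    def ratio(num, den):
        # Return NaN instead of failing when a denominator is zero
        # (e.g. a model that never predicts class 1 has no precision).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),  # share of all records classified correctly
        "sensitivity": ratio(tp, tp + fn),              # share of actual 1s predicted as 1
        "specificity": ratio(tn, tn + fp),              # share of actual 0s predicted as 0
        "precision": ratio(tp, tp + fp),                # share of predicted 1s that are actual 1s
    }


r = classification_ratios(tp=40, fn=10, fp=20, tn=30)
print(r)  # accuracy 0.7, sensitivity 0.8, specificity 0.6, precision about 0.667
```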

Slide 9

Exercises

Assign the first 800 observations of the Global Findex data to the training data and the remaining 200 to the test data.
Fit a multiple logit model to predict whether the person saved in the past 12 months (1) or did not save (0) based on age, gender, education level, and employment status.
Compute the odds ratio and probability for a 30-year-old employed female with secondary education. Would you classify this person as saved or not saved? Use p > 0.5 as the cutoff point (if p > 0.5, then Saved = 1).
Evaluate your model's performance on the test data using the accuracy, sensitivity, specificity, and precision ratios. Use p > 0.5 as the cutoff point (if p > 0.5, then Saved = 1).
Fit another multiple logit model to predict whether the person saved (1) or did not save (0), using all other variables as predictors.
Evaluate your model's performance on the test data using the accuracy, sensitivity, specificity, and precision ratios. Use p > 0.5 as the cutoff point (if p > 0.5, then Saved = 1).
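For the classification step, a fitted logit model turns a person's predictor values into a linear predictor xb, then odds = exp(xb) and probability p = odds / (1 + odds); exponentiating a single coefficient gives the odds ratio for a one-unit change in that predictor. The sketch below assumes a hypothetical 0/1 dummy coding (female = 1, secondary education = 1, employed = 1) and made-up placeholder coefficients, not estimates from the Global Findex data; substitute the values from your own fit.

```python
import math

# HYPOTHETICAL coefficients for illustration only; replace with your fitted estimates.
COEFS = {
    "const": -1.2,
    "age": 0.02,
    "female": -0.1,
    "secondary_educ": 0.5,
    "employed": 0.8,
}


def logit_odds_and_prob(coefs, person):
    """Linear predictor xb -> odds = exp(xb) -> probability = odds / (1 + odds)."""
    xb = coefs["const"] + sum(coefs[name] * value for name, value in person.items())
    odds = math.exp(xb)
    prob = odds / (1.0 + odds)
    return odds, prob


def classify(prob, cutoff=0.5):
    """Saved = 1 if p > cutoff, else 0 (cutoff 0.5 as in the exercise)."""
    return 1 if prob > cutoff else 0


person = {"age": 30, "female": 1, "secondary_educ": 1, "employed": 1}
odds, prob = logit_odds_and_prob(COEFS, person)
saved = classify(prob)
print(round(odds, 3), round(prob, 3), saved)  # with these placeholder values: 1.822 0.646 1
```

Applying `classify` to every predicted probability in the test data yields the predicted labels needed for the confusion matrix.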
Slide 10

Exercise solutions

The fitted model is given here:

Confusion Matrix:

Slide 11

Exercise solutions

Confusion Matrix:

Slide 12

CONGRATS!!!
Thank you for your attention!
Congratulations!!!
You have mastered “Introduction to Statistics and Data Science”