Logistic Regression Model. Week 10

Slide 2

Lecture outline

Odds Ratio
Simple Logistic (Logit) Regression
Multiple Logistic (Logit) Regression

Slide 3

Slide 4

Linear Regression Model

Yi = β0 + β1Xi + εi

If the OLS (ordinary least squares) model is used when the dependent variable is binary, the following problems might arise:
The error terms are heteroskedastic.
The error term ε is not normally distributed because Y takes on only two values.
The predicted probabilities can be greater than 1 or less than 0.
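
The out-of-range predictions mentioned above can be seen on simulated data. The following is a minimal sketch and not part of the original lecture: the data-generating coefficients (0 and 2) and all variable names are assumptions chosen only for illustration.

```r
# Simulated binary outcome; the "true" coefficients (0 and 2) are made up.
set.seed(1)
x <- seq(-3, 3, length.out = 200)
y <- rbinom(length(x), size = 1, prob = plogis(0 + 2 * x))

# Linear probability model: OLS fitted to a 0/1 response.
lpm <- lm(y ~ x)
lpm_fitted <- fitted(lpm)

# Logit model fitted to the same data.
logit <- glm(y ~ x, family = binomial)
logit_fitted <- fitted(logit)

# OLS fitted values are not constrained to [0, 1]; logit fitted values are.
range(lpm_fitted)
range(logit_fitted)
```

Comparing the two ranges shows why a model that maps the linear predictor into the unit interval is preferred for a binary response.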
Slide 5

Linear vs Logit Model

Source: https://www.datacamp.com/

Slide 6

Simple Logistic Regression Model

Logit form: ln(p / (1 − p)) = β0 + β1X

Probability form: p = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Here p = P(Y = 1 | X).

Note: The logarithm is the natural log (aka “ln”).
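
The logit form and the probability form describe the same model, which a quick numerical check makes concrete. In the sketch below the coefficient values (b0 = -1.5, b1 = 0.8) and the predictor value x = 2 are hypothetical, chosen only for illustration.

```r
# Hypothetical coefficients and predictor value (not estimated from any data).
b0 <- -1.5
b1 <- 0.8
x  <- 2

# Logit form: the linear predictor is the log-odds.
log_odds <- b0 + b1 * x

# Probability form: invert the logit to get p = P(Y = 1 | X = x).
p <- exp(log_odds) / (1 + exp(log_odds))

# Going back from p to the log-odds recovers the linear predictor.
log_odds_back <- log(p / (1 - p))

# Base R provides both transformations directly.
p_builtin        <- plogis(log_odds)  # inverse logit
log_odds_builtin <- qlogis(p)         # logit

c(p = p, log_odds = log_odds, log_odds_back = log_odds_back)
```

Here `plogis()` and `qlogis()` are the logistic distribution function and its quantile function in base R, which coincide with the inverse logit and the logit.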
Slide 7

Slide 8

Estimation of coefficients

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the coefficients of a logistic model.
Likelihood function: L(β0, β1) = ∏i p(xi)^yi (1 − p(xi))^(1 − yi), where p(xi) = P(Yi = 1 | xi)
The estimates b0 and b1 are chosen to maximize this likelihood function.
Learn more about MLE:
https://medium.com/codex/logistic-regression-and-maximum-likelihood-estimation-function-5d8d998245f9
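
To make the idea of maximizing the likelihood concrete, the sketch below minimizes the negative log-likelihood numerically with `optim()` and compares the result with `glm()`. It is an illustration of the principle under assumptions: the data are simulated, the "true" coefficients (-0.5 and 1.2) are made up, and `glm()` itself uses iteratively reweighted least squares rather than `optim()`.

```r
# Simulated data; the coefficients -0.5 and 1.2 are hypothetical.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Negative log-likelihood of the simple logit model.
# log L = sum( y * eta - log(1 + exp(eta)) ), with eta = b0 + b1 * x.
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log1p(exp(eta)))
}

# Numerical maximization of the likelihood (minimization of its negative log).
fit_optim <- optim(par = c(0, 0), fn = neg_loglik, method = "BFGS")

# Reference fit with glm().
fit_glm <- glm(y ~ x, family = binomial)

# The two sets of estimates should agree closely.
rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```

Working with the log-likelihood instead of the likelihood itself avoids multiplying many small probabilities, and it has the same maximizer.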
Slide 9

Example: Titanic Dataset

R Code:
# Load the data
titanic <- read.csv("titanic.csv")
# Fit a simple logit model of survival on fare
logit.m <- glm(Survived ~ Fare, data = titanic, family = "binomial")
# Coefficient estimates, standard errors, and significance tests
summary(logit.m)
Slide 10

Fitted simple logit model
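
Once a simple logit model is fitted, its coefficients can be plugged into the probability form to obtain a predicted probability, and the exponentiated slope gives the odds ratio for a one-unit increase in the predictor. The sketch below uses simulated stand-in data, not the real Titanic file: the column names `Survived` and `Fare` mirror the R code in the lecture, but the values, the coefficients used to generate them, and the fare of 50 are assumptions.

```r
# Simulated stand-in data (NOT the real Titanic data set).
set.seed(10)
fare <- runif(300, min = 5, max = 100)
sim <- data.frame(
  Fare     = fare,
  Survived = rbinom(300, size = 1, prob = plogis(-1 + 0.02 * fare))
)

fit <- glm(Survived ~ Fare, data = sim, family = "binomial")
b <- coef(fit)

# Predicted probability at a hypothetical fare of 50, computed by hand ...
fare_new <- 50
p_manual <- unname(plogis(b["(Intercept)"] + b["Fare"] * fare_new))

# ... and with predict(); type = "response" returns probabilities.
p_predict <- unname(predict(fit, newdata = data.frame(Fare = fare_new),
                            type = "response"))

# Odds ratio for a one-unit increase in Fare.
odds_ratio_fare <- unname(exp(b["Fare"]))

c(p_manual = p_manual, p_predict = p_predict, odds_ratio = odds_ratio_fare)
```

An odds ratio above 1 means the odds of the outcome rise as the predictor increases; below 1 means they fall.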

 

 

Slide 11

Multiple Logistic Regression
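
A multiple logit model extends the simple model by adding further predictors to the linear part: ln(p / (1 − p)) = β0 + β1X1 + ... + βkXk. The sketch below fits such a model on simulated data and tabulates odds ratios with Wald confidence intervals. The variable names (`age`, `female`, `y`) and the generating coefficients are hypothetical and serve only as an illustration.

```r
# Simulated data with two predictors; all names and values are made up.
set.seed(7)
n <- 400
sim <- data.frame(
  age    = round(runif(n, min = 18, max = 70)),
  female = rbinom(n, size = 1, prob = 0.5)
)
sim$y <- rbinom(n, size = 1,
                prob = plogis(-2 + 0.03 * sim$age + 0.5 * sim$female))

# Multiple logit model: predictors are added with "+" in the formula.
fit_multi <- glm(y ~ age + female, data = sim, family = "binomial")
summary(fit_multi)

# Odds ratios with 95% Wald intervals (exponentiated coefficients).
or_table <- exp(cbind(OR = coef(fit_multi), confint.default(fit_multi)))
or_table
```

Each odds ratio is interpreted holding the other predictors fixed.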

 

Slide 12

R Output:

Slide 13

Exercises

The Global Findex database is the world’s most comprehensive data set on how adults save, borrow, make payments, and manage risk. The data are collected in partnership with Gallup, Inc., through nationally representative surveys of more than 150,000 adults in over 140 economies. “findex_uzb.csv” provides some of the variables from the Findex dataset for Uzbekistan for 2017. Responses are recorded as 1 if “yes”, 0 if “no”.
There are a total of 1000 observations in the dataset.
a) Fit a simple logit model to predict whether the person saved (1) or did not save (0) based on age. Comment on the significance of Age.
b) Predict the probability of saving at a financial institution for a person aged 45 years old.
c) Fit a multiple logit model to predict whether the person saved (1) or did not save (0) based on age, gender, education level, and employment status.
d) Compute the odds ratio and the probability of saving for an employed 30-year-old female with secondary education.
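
The workflow behind exercises of this kind can be sketched as follows. This is not a solution: the slides do not list the column names of findex_uzb.csv, so the data frame `findex_sim`, its columns (`saved`, `age`, `female`, `educ`, `employed`), and the generating coefficients are all hypothetical placeholders on simulated data.

```r
# Simulated placeholder data; column names are assumptions, not the real file's.
set.seed(2017)
n <- 1000
findex_sim <- data.frame(
  age      = sample(18:80, n, replace = TRUE),
  female   = rbinom(n, size = 1, prob = 0.5),
  educ     = factor(sample(c("primary", "secondary", "tertiary"), n,
                           replace = TRUE)),
  employed = rbinom(n, size = 1, prob = 0.6)
)
eta <- -2 + 0.01 * findex_sim$age +
  0.4 * (findex_sim$educ == "tertiary") + 0.5 * findex_sim$employed
findex_sim$saved <- rbinom(n, size = 1, prob = plogis(eta))

# Multiple logit model on the placeholder data.
fit <- glm(saved ~ age + female + educ + employed,
           data = findex_sim, family = "binomial")

# Profile of interest; factor levels must match those used in the fit.
profile <- data.frame(
  age      = 30,
  female   = 1,
  educ     = factor("secondary", levels = levels(findex_sim$educ)),
  employed = 1
)

# Predicted probability and the corresponding odds for that profile.
p_hat    <- unname(predict(fit, newdata = profile, type = "response"))
odds_hat <- p_hat / (1 - p_hat)

# Odds ratios for each predictor (exponentiated coefficients).
exp(coef(fit))
c(probability = p_hat, odds = odds_hat)
```

With the real data, the same steps would start from `read.csv("findex_uzb.csv")` and use the actual column names in the formula.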
Slide 14

Exercises

Slide 15

References

1. James G. et al. An Introduction to Statistical Learning (with Applications in R). ISBN: 978-1-4614-7137-0. pp. 156-160.
2. https://www.datacamp.com/community/tutorials/logistic-regression-R
3. https://machinelearningmastery.com/logistic-regression-for-machine-learning/