Linear Regression Models. Week 7

Contents

Slide 2


Learning Outcomes
Distinguish between deterministic and probabilistic relations.
Understand the concepts of correlation and regression.
Be able to fit linear models.
Understand the method of least squares.
Interpret regression coefficients.
Slide 3


Deterministic (Functional) Relationship

A functional relation between two variables is expressed by a mathematical formula. If X denotes the independent variable and Y the dependent variable, a functional relation is of the form: Y = f(X).
Slide 4


Probabilistic (Statistical) Relationship

A probabilistic relationship, unlike a deterministic one, is not a perfect one. In general, the observations for a probabilistic relation do not fall directly on the curve of relationship.
Slide 5


Correlation Coefficient

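This slide presumably shows the sample (Pearson) correlation coefficient:

r = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² · Σ(Yi − Ȳ)²]

r measures the strength and direction of the linear relationship between X and Y; it always lies between −1 and +1, with values near ±1 indicating a strong linear relationship and values near 0 indicating a weak one.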

Slide 6


Correlation is not causation

Slide 7


Historical Origins of Regression

Regression analysis was first developed by Sir Francis

Galton in the latter part of the 19th century. Galton had studied the relation between heights of parents and children and noted that the heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group. He considered this tendency to be a regression to "mediocrity." Galton developed a mathematical description of this regression tendency, the precursor of today's regression models.
The term regression persists to this day to describe statistical relations between variables.
Slide 8


Simple Linear Regression

Simple linear regression is a statistical method that allows us to summarize and study relationships between two quantitative variables:
One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
It is called “simple” because there is only one predictor variable.
It is called “linear” because no parameter appears as an exponent or is multiplied or divided by another parameter.
Slide 9


Simple Linear Regression

Basic simple linear model:
Yi = β0 + β1Xi + εi
 Yi is the value of the response variable in the ith trial.
β0 (intercept) and β1 (slope) are parameters of the regression.
Xi is a known constant, namely, the value of the predictor variable in the ith trial.
εi is a random error term with mean E{εi} = 0 and variance σ²{εi} = σ²; error terms are uncorrelated with each other.
Expected value of Y: E{Y} = β0 + β1X.
When the scope of the model includes X = 0, β0 gives the mean of Y at X = 0. When the scope of the model does not cover X = 0, β0 has no particular meaning as a separate term in the model.
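A minimal illustrative sketch (not from the slides): simulate data from this model and check that lm() approximately recovers β0 and β1.

R Syntax:
set.seed(1)                          # for reproducibility
X <- runif(100, 0, 10)               # predictor values
eps <- rnorm(100, mean = 0, sd = 2)  # error terms with E{eps} = 0
Y <- 3 + 1.5 * X + eps               # true beta0 = 3, beta1 = 1.5
coef(lm(Y ~ X))                      # estimates should be close to 3 and 1.5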
Slide 10


Method of Least Squares

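The least-squares criterion, which this slide presumably presents: choose the estimates b0 and b1 of β0 and β1 that minimize the sum of squared deviations of the observations from the fitted line:

Q = Σ (Yi − b0 − b1Xi)²

The fitted value is Ŷi = b0 + b1Xi, so Q is the sum of the squared residuals ei = Yi − Ŷi.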

Slide 11


Computing the least-squares regression line.

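The standard least-squares solutions, presumably what this slide shows:

b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
b0 = Ȳ − b1X̄

where X̄ and Ȳ are the sample means of the predictor and the response.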

Slide 12


Example


Slide 13


Results from R


R Syntax:
lm.model <- lm(price ~ size, data = housing)
summary(lm.model)
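In general, summary() for an lm fit reports the coefficient estimates with their standard errors, t statistics and p-values, the residual standard error, and R-squared. Its components can also be accessed directly (a sketch, reusing the lm.model object above):

summary(lm.model)$coefficients   # estimates, std. errors, t statistics, p-values
summary(lm.model)$r.squared      # proportion of variation in price explained by size
summary(lm.model)$sigma          # residual standard error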

Slide 14

Interpretation of b0 and b1

b0 (intercept) estimates the mean of the response when the predictor equals 0; this is meaningful only when x = 0 lies within the scope of the model.
b1 (slope) estimates the change in the mean of the response for a one-unit increase in the predictor.

Slide 15


Multiple Linear Regression

Multiple linear regression is an extension of simple linear regression used to predict an outcome variable (y) on the basis of multiple distinct predictor variables (x):

y = β0 + β1x1 + β2x2 + … + βkxk + ε

y – dependent (response, outcome) variable.
x1, x2, …, xk - independent (predictor, explanatory) variables.
β0 – intercept.
β1, β2, …, βk – partial slopes or partial regression coefficients.
ε – error term.

Slide 16


Example

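A sketch of the likely example, assuming the marketing data from the datarium package used in the sthda article listed under Literature (the coefficients interpreted on the next slide match this data set):

R Syntax:
library(datarium)                # provides the marketing data set
data("marketing")
mlr.model <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(mlr.model)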

Slide 17


Interpretations

For a given predictor variable, the coefficient (b) can be interpreted as the average effect on y of a one-unit increase in that predictor, holding all other predictors fixed.
For example, for a fixed amount of youtube and newspaper advertising budget, spending an additional 1,000 dollars on facebook advertising leads to an increase in sales by approximately 0.1885*1,000 = 189 sales units, on average.
The youtube coefficient suggests that for every 1,000 dollars increase in youtube advertising budget, holding all other predictors constant, we can expect an increase of 0.046*1,000 = 46 sales units, on average.
Slide 18


Qualitative (Categorical) Variables

Categorical independent variables can be incorporated into a regression model by converting them into 0/1 (“dummy”) variables.
For a categorical variable with k = 3 levels, e.g. shelving quality (Bad, Good, Medium) as in the example that follows, use k – 1 = 3 – 1 = 2 dummy variables to code this information like this:

Level     x1 (Good)   x2 (Medium)
Bad       0           0
Good      1           0
Medium    0           1

In R, categorical variables need to be converted into factors before running the lm function. R will automatically keep the first factor level as the reference group.
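A sketch, assuming the Carseats data from the ISLR package, which matches the shelving example on the following slides:

R Syntax:
library(ISLR)                  # Carseats: child car seat sales data
levels(Carseats$ShelveLoc)     # "Bad" "Good" "Medium"; "Bad" is the reference
# relevel() changes the reference group if a different baseline is wanted:
Carseats$ShelveLoc <- relevel(Carseats$ShelveLoc, ref = "Bad")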

Slide 19


Example

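The exact model behind the output interpreted on the next slide is not shown; a sketch assuming shelving location and competitor price as predictors of Sales in the Carseats data:

R Syntax:
library(ISLR)
lm.cat <- lm(Sales ~ ShelveLoc + CompPrice, data = Carseats)
summary(lm.cat)   # ShelveLocGood and ShelveLocMedium are the dummy coefficients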

Slide 20


Output and Interpretations

If the quality of shelving is Good, ceteris paribus*, average sales increase by 4.751*1000 = 4751 units compared to Bad shelving.

If the quality of shelving is Medium, while holding other variables constant, average sales increase by 1.862*1000 = 1862 units compared to Bad shelving.

*ceteris paribus = all other things being equal

If the competitor increases its price by $1, while holding other variables constant, average carseat sales increase by 0.010*1000 = 10 units.

Slide 21


Fitted Models

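In general, substituting the 0/1 dummy values yields one fitted equation per group. With Bad shelving as the reference level and the coefficients from the previous slide (the model form and the variable name CompPrice are assumptions carried over from the sketch on Slide 19; b0 denotes the estimated intercept):

Bad:    Ŷ = b0 + 0.010*CompPrice
Good:   Ŷ = (b0 + 4.751) + 0.010*CompPrice
Medium: Ŷ = (b0 + 1.862) + 0.010*CompPrice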

Slide 22


Literature

Lind et al. Basic Statistics for Business and Economics. Chapter 13.
Holmes et al. Introductory Business Statistics. Chapter 13.
McClave, Sincich. Statistics. Chapter 11 & 12.
http://www.sthda.com/english/articles/40-regression-analysis/168-multiple-linear-regression-in-r/
Slide 23


Practice Exercises

1. Refer to the Carseats data. Compute the correlation coefficient between the competitor’s price and the company’s price. Comment on their relationship.
2. Fit a simple linear model to predict the carseat price using the competitor’s price as an independent variable. What is the expected price if the competitor’s price is $120?
3. Fit a multiple linear model to predict the sales using Income, Advertising and Urban as predictor variables. Use the stores in urban locations as the reference group. What is the predicted sales if the community income level is $75,000, the local advertising budget is $10,000 and the store is located in a rural area?
4. Interpret your estimated regression coefficients in task 3.
5. (Homework) Fit a multiple linear model to predict the sales using Income, Age, Education level and US as predictor variables. Provide the fitted model for US and non-US locations.