Correlation Regression

Contents

Slide 2

Causation

Slide 3

Causation

Causation is any cause that produces an effect.
This means that when something happens (the cause), something else will always happen as well (the effect).
An example: when you run, you burn calories.
In this example, running is the cause and burning calories is the effect. This always holds, because that is how the human body works.
Slide 4

Correlation

Correlation measures the relationship between two things.
Positive correlations happen when one thing goes up and another thing goes up as well.
An example: when the demand for a product is high, the price may go up. High demand tends to go together with a high price.
Negative correlations occur when the opposite happens: one thing goes up and another goes down.
A correlation tells us that two variables are related, but we cannot say anything about whether one caused the other.
Slide 5

Correlation

Correlations happen when:
A causes B
B causes A
A and B are consequences of a common cause, but do not cause each other
There is no connection between A and B; the correlation is coincidental
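
The common-cause case can be sketched with a small simulation. This is an illustrative sketch only: the variable names, the noise levels, and the seed are assumptions, not part of the slides. A and B are both generated from a shared cause C, and neither appears in the other's formula.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = (sum((x - mx) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / (n - 1)) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical common cause C; A and B each depend on C plus their own noise.
c = [rng.gauss(0, 1) for _ in range(1000)]
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]

r_ab = pearson_r(a, b)
print(f"r(A, B) = {r_ab:.2f}")  # strongly positive, yet A never enters B's formula
```

A and B come out strongly correlated even though changing A by hand would leave B untouched, which is why a correlation alone cannot settle which of the four cases applies.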
Slide 6

Causation and Correlation

Causation and correlation can happen at the same time.
But having a correlation does not always mean you have a causation.
A good example of this:
There is a positive correlation between the number of firemen fighting a fire and the size of the fire. The number of firemen at a fire tends to reflect how big the fire is. However, this doesn’t mean that bringing more firemen will cause the size of the fire to increase.
Slide 7

Correlation or Causation?

As people’s happiness level increases, so does their helpfulness.

This would be a correlation.
Just because someone is happy does not always mean that they will become more helpful. This just usually tends to be the case.
Slide 8

Correlation or Causation?

Dogs pant to cool themselves down.

This would be a causation.
When a dog needs to cool itself down it will pant. This is not something that merely tends to happen; it is something that is always true.
Slide 9

Correlation or Causation?

Among babies, those who are held more tend to cry less.

This would be a correlation.
Just because a baby is held often does not mean that it will cry less. This just usually tends to be the case.

Slide 10

Let's think of our own

Correlation:

Causation:

Slide 11

Quick Review

Causation is any cause that produces an effect.

Correlation measures the relationship between two things.
Slide 12

Correlation

Slide 13

The Question

Are two variables related?
Does one increase as the other increases?
e.g. skills and income
Does one decrease as the other increases?
e.g. health problems and nutrition
How can we get a numerical measure of the degree of relationship?
Slide 14

Scatterplots

A scatterplot graphically depicts the relationship between two variables in two-dimensional space.

Slide 15

Direct Relationship

Slide 16

Inverse Relationship

Slide 17

An Example

Does smoking cigarettes increase systolic blood pressure?
Plotting number of cigarettes smoked per day against systolic blood pressure
Fairly moderate relationship
Relationship is positive
Slide 18

Trend?

Slide 19

Smoking and BP

Note relationship is moderate, but real.
Why do we care about the relationship?
What would we conclude if there were no relationship?
What if the relationship were near perfect?
What if the relationship were negative?
Slide 20

Heart Disease and Cigarettes

Data on heart disease and cigarette smoking in 21 developed countries
Data have been rounded for computational convenience.
The results were not affected.
Slide 21

The Data

Surprisingly, the U.S. is the first country on the list: the country with the highest consumption and highest mortality.
Slide 22

Scatterplot of Heart Disease

CHD Mortality goes on Y axis
Why?
Cigarette consumption on X axis
Why?
What does each dot represent?
Best fitting line included for clarity
Slide 23

{X = 6, Y = 11}

Slide 24

What Does the Scatterplot Show?

As smoking increases, so does coronary heart disease mortality.
Relationship looks strong
Not all data points are on the line.
This gives us “residuals” or “errors of prediction”
To be discussed later
Slide 25

Correlation

Co-relation
The relationship between two variables
Measured with a correlation coefficient
Most commonly seen correlation coefficient: Pearson Product-Moment Correlation
Slide 26

Types of Correlation

Positive correlation
High values of X tend to be associated with high values of Y.
As X increases, Y increases
Negative correlation
High values of X tend to be associated with low values of Y.
As X increases, Y decreases
No correlation
No consistent tendency for values on Y to increase or decrease as X increases
Slide 27

Correlation Coefficient

A measure of degree of relationship.
Between 1 and -1
Sign refers to direction.
Based on covariance
Measure of degree to which large scores on X go with large scores on Y, and small scores on X go with small scores on Y
Slide 28

Slide 29

Covariance

The formula for covariance is:
How does this work, and why?
When would covXY be large and positive? Large and negative?
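
A minimal sketch of the covariance calculation, assuming the usual sample form: the sum of the cross-products of deviations, Σ(X − X̄)(Y − Ȳ), divided by N − 1. The function name and the scores are made up for illustration and are not the heart disease data.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)

# Made-up illustrative scores (an assumption, not data from the slides).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # large X goes with large Y
falling = [10, 8, 6, 4, 2]   # large X goes with small Y

cov_rising = covariance(x, rising)    # positive: deviations share a sign
cov_falling = covariance(x, falling)  # negative: deviations have opposite signs
print(cov_rising, cov_falling)
```

When above-average X scores pair with above-average Y scores, each cross-product is positive and the covariance is large and positive; when they pair with below-average Y scores, the cross-products are negative.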
Slide 30

Example

Slide 31

Example

What the heck is a covariance?
I thought we were talking about correlation?
Slide 32

Correlation Coefficient

Pearson’s Product Moment Correlation
Symbolized by r
Covariance ÷ (product of the 2 SDs)
Correlation is a standardized covariance
Slide 33

Calculation for Example

CovXY = 11.12
sX = 2.33
sY = 6.69
r = CovXY / (sX × sY) = 11.12 / (2.33 × 6.69) = .713
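
The same arithmetic as a short sketch, using only the three summary values given on this slide. The function name is hypothetical.

```python
def correlation_from_summary(cov_xy, sd_x, sd_y):
    """Pearson r as a standardized covariance: covariance / (SD of X * SD of Y)."""
    return cov_xy / (sd_x * sd_y)

# Summary statistics from the slide.
r = correlation_from_summary(cov_xy=11.12, sd_x=2.33, sd_y=6.69)
print(round(r, 3))
```

Dividing by the two SDs removes the units of X and Y, which is what keeps r between -1 and 1.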

Slide 34

Example

Correlation = .713
Sign is positive
Why?
If sign were negative
What would it mean?
Would not change the degree of relationship.
Slide 35

Factors Affecting r

Range restrictions
Looking at only a small portion of the total scatter plot (looking at a smaller portion of the scores’ variability) decreases r.
Reducing variability reduces r
Nonlinearity
The Pearson r measures the degree of linear relationship between two variables
If a strong non-linear relationship exists, r will provide a low, or at least inaccurate, measure of the true relationship.
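
The nonlinearity point can be illustrated with a made-up example (the data are an assumption chosen for clarity): Y is completely determined by X through Y = X², yet over a symmetric range of X the Pearson r is zero. The second calculation shows that r also depends on which part of the X range is examined.

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = (sum((x - mx) ** 2 for x in xs) / (n - 1)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / (n - 1)) ** 0.5
    return cov / (sx * sy)

# Perfect but non-linear relationship: Y = X squared.
x_full = [-3, -2, -1, 0, 1, 2, 3]
y_full = [x ** 2 for x in x_full]
r_full = pearson_r(x_full, y_full)   # no linear trend across the full range

# Same rule, but only the non-negative half of the X range.
x_half = [0, 1, 2, 3]
y_half = [x ** 2 for x in x_half]
r_half = pearson_r(x_half, y_half)   # looks strongly linear on this portion

print(round(r_full, 3), round(r_half, 3))
```

A scatterplot would reveal the curve immediately, which is one reason to plot the data before trusting r.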
Slide 36

Factors Affecting r

Outliers
Can make r overestimate the correlation
Can make r underestimate the correlation

Slide 37

Countries With Low Consumptions

Slide 38

Outliers

Slide 39

Testing Correlations

So you have a correlation. Now what?
In terms of magnitude, how big is big?
Small correlations in large samples are “big.”
Large correlations in small samples aren’t always “big.”
Whether a correlation is “big” depends upon the magnitude of the correlation coefficient AND the size of your sample.
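
The slides do not name a test here. One common choice, assumed in this sketch, is the t test for a Pearson correlation, t = r√(N − 2) / √(1 − r²) with N − 2 degrees of freedom. The function name and the two contrast cases are hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The example above: r = .713 from 21 countries.
t_chd = t_for_r(0.713, 21)

# Hypothetical contrasts: a small r in a large sample, a large r in a small one.
t_small_r_big_n = t_for_r(0.10, 1000)
t_big_r_small_n = t_for_r(0.50, 5)

print(round(t_chd, 2), round(t_small_r_big_n, 2), round(t_big_r_small_n, 2))
```

The statistic grows with both r and N, so the small correlation in the large sample yields a larger t than the large correlation in the small sample. Whether a given t counts as significant still has to be checked against the t distribution for the relevant degrees of freedom.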
Slide 40

Regression

Slide 41

“Regression” refers to the process of fitting a simple line to data points. Historically, linear regression was first used to explain the height of men by the height of their fathers.
Slide 42

What is regression?

How do we predict one variable from another?
How does one variable change as the other changes?
Influence
Slide 43

Linear Regression

A technique we use to predict the most likely score on one variable from those on another variable
Uses the nature of the relationship (i.e. correlation) between two variables to enhance your prediction
Slide 44

Linear Regression: Parts

Y - the variable you are predicting
i.e. dependent variable
X - the variable you are using to predict
i.e. independent variable
Ŷ - your predictions (also known as Y’)
Slide 45

Why Do We Care?

We may want to make a prediction.
More likely, we want to understand the relationship.
How fast does CHD mortality rise with a one unit increase in smoking?
Note: we speak about predicting, but often don’t actually predict.
Slide 46

An Example

Cigarettes and CHD Mortality again
Data repeated on next slide
We want to predict level of CHD mortality in a country averaging 10 cigarettes per day.
Slide 47

The Data

Based on the data we have, what would we predict the rate of CHD to be in a country that smoked 10 cigarettes on average?
First, we need to establish a prediction of CHD from smoking…
Slide 48

For a country that smokes 6 C/A/D (cigarettes per adult per day)…

We predict a CHD rate of about 14

Regression Line

Slide 49

Regression Line

Formula
Ŷ = bX + a
Ŷ = the predicted value of Y (e.g. CHD mortality)
X = the predictor variable (e.g. average cig./adult/country)
Slide 50

Regression Coefficients

“Coefficients” are a and b
b = slope
Change in predicted Y for one unit change in X
a = intercept
value of Ŷ when X = 0
Slide 51

Calculation

Slope: b = CovXY / s²X
Intercept: a = Ȳ − bX̄

Slide 52

For Our Data

CovXY = 11.12
s²X = 2.33² ≈ 5.447 (using the unrounded SD)
b = 11.12/5.447 = 2.042
a = 14.524 - 2.042*5.952 = 2.37
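
A short sketch of the slope and intercept calculation from these summary values. It assumes that 14.524 and 5.952 in the intercept line are the means of Y and X, which matches the form a = Ȳ − bX̄. The function name is hypothetical.

```python
def slope_and_intercept(cov_xy, var_x, mean_x, mean_y):
    """Least-squares coefficients from summary statistics."""
    b = cov_xy / var_x          # slope: covariance over the variance of X
    a = mean_y - b * mean_x     # intercept: the line passes through the two means
    return b, a

# Values from the slide; the two means are inferred from the intercept line.
b, a = slope_and_intercept(cov_xy=11.12, var_x=5.447, mean_x=5.952, mean_y=14.524)
print(round(b, 3), round(a, 2))
```

Because the inputs are already rounded, the last decimal of b can differ slightly from the slide's 2.042.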
Slide 53

Note:

The values we obtained are shown on the printout.
The intercept is the value in the B column labeled “constant”
The slope is the value in the B column labeled by the name of the predictor variable.
Slide 54

Making a Prediction

Second, once we know the relationship we can predict
Ŷ = bX + a = 2.042*10 + 2.37 = 22.79
We predict that 22.79 people/10,000 in a country with an average of 10 C/A/D will die of CHD
Slide 55

Accuracy of Prediction

Finnish smokers smoke 6 C/A/D
We predict: 14.619 deaths/10,000
They actually have 23 deaths/10,000
Our error (“residual”) = 23 - 14.619 = 8.38
a large error
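
The prediction and the residual can be sketched as follows. The slope is the slide value 2.042, and the intercept is recomputed from 14.524 and 5.952 (assumed to be the means of Y and X), so the third decimal can differ slightly from the slide figures. The names are hypothetical.

```python
B_SLOPE = 2.042                          # slope from the slides
A_INTERCEPT = 14.524 - B_SLOPE * 5.952   # intercept: mean of Y minus b times mean of X

def predict_chd(cigarettes_per_adult_per_day):
    """Predicted CHD mortality per 10,000 from the fitted line."""
    return B_SLOPE * cigarettes_per_adult_per_day + A_INTERCEPT

pred_10 = predict_chd(10)              # country averaging 10 C/A/D
pred_finland = predict_chd(6)          # Finland: 6 C/A/D
residual_finland = 23 - pred_finland   # observed minus predicted

print(round(pred_10, 2), round(pred_finland, 2), round(residual_finland, 2))
```

The residual is positive because Finland's observed mortality lies above the regression line.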
Slide 56

[Figure: CHD Mortality per 10,000 (Y axis) plotted against Cigarette Consumption per Adult per Day (X axis); labels: Prediction, Residual]

Slide 57

Residuals

When we predict Ŷ for a given X, we will sometimes be in error.
Y – Ŷ for any X is an error of estimate
Also known as: a residual
We want Σ(Y – Ŷ) to be as small as possible.
BUT, there are infinitely many lines that can do this.
Just draw ANY line that goes through the mean of the X and Y values: its residuals sum to 0.
Minimize Errors of Estimate… How?
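
A quick check of the claim about lines through the means, using made-up scores (an assumption for illustration). Whatever slope is chosen, the residuals of a line through (X̄, Ȳ) add up to zero.

```python
def residual_sum(xs, ys, slope):
    """Sum of Y - Y_hat for the line with the given slope through the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    intercept = mean_y - slope * mean_x   # forces the line through (mean_x, mean_y)
    return sum(y - (slope * x + intercept) for x, y in zip(xs, ys))

# Made-up illustrative scores.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Very different slopes, all passing through the means.
sums = {slope: residual_sum(x, y, slope) for slope in (-1.0, 0.0, 0.8, 3.0)}
for slope, total in sums.items():
    print(slope, round(total, 10))
```

Since every such line scores the same on Σ(Y – Ŷ), the plain sum of residuals cannot be used to pick the best line.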
Slide 58

Minimizing Residuals

Again, the problem lies with this definition of the mean:
Σ(X – X̄) = 0
So, how do we get rid of the 0’s?
Square them.
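
A sketch of why squaring helps, again with made-up scores (an assumption for illustration). Once the residuals are squared, lines through the means no longer tie, and the slope CovXY / s²X gives the smallest sum of squared residuals among the candidates tried.

```python
def sum_squared_residuals(xs, ys, slope):
    """Sum of (Y - Y_hat)^2 for the line with the given slope through the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def least_squares_slope(xs, ys):
    """Slope as covariance over the variance of X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    return cov_xy / var_x

# Made-up illustrative scores.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

best_slope = least_squares_slope(x, y)
candidates = [-1.0, 0.0, 0.5, best_slope, 1.0, 3.0]
ssr = {slope: sum_squared_residuals(x, y, slope) for slope in candidates}
for slope, total in ssr.items():
    print(round(slope, 2), round(total, 2))
```

Squaring makes every residual non-negative, so positive and negative errors can no longer cancel, and the sum separates good lines from bad ones.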