Using numerical measures to describe data. Measures of the center. Week 3 (2)

Contents

Slide 2

Using numerical measures to describe data
"Is the data in the sample centered or located around a specific value?"
This is the first question that business people, economists, corporate executives, etc. ask when presented with sample data.
Slide 3

Using numerical measures to describe data
The histogram gives an idea of whether the data is centered around a specific value.
The histogram provides a visual picture of how the data is distributed (symmetric, skewed, etc.).
Slide 4

Is the data centered around a specific value?

Slide 5

Numerical measures to describe data

COPYRIGHT © 2013 PEARSON EDUCATION, INC. PUBLISHING AS PRENTICE HALL

Describing Data Numerically
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Slide 6

Measures of the center of the data set

2.1 Measures of Central Tendency
Mean: arithmetic average of the data
Median: midpoint of ranked/ordered values in the data
Mode: most frequently observed value in the data (if one exists)

Slide 7

The mean is the most common measure of the center of a data set
For a population of N values:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

where N is the population size and x_1, ..., x_N are the population values.

Slide 8

For a sample of n values:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

where n is the sample size and x_1, ..., x_n are the observed values.

Slide 9

The mean: symmetric and unimodal distribution
When we have a symmetric distribution with one mode, the mean represents the middle value in the data set.

Slide 10

Mean (continued)
The most common measure for the center of a data set
Affected by extreme values (outliers)
[Figure: two dot plots on a 0-10 axis; the first has Mean = 3, the second has Mean = 4]

Slide 12

Skewed distribution
An outlier will distort the picture of the data.
It will inflate or deflate the mean, depending on the value of the outlier.
This creates a skewed distribution.
In this case we may want to use a different measure of the data center.

Slide 13

Median
In an ordered list of data, the median is the "middle" number (50% above, 50% below)
Not affected by outliers
[Figure: two dot plots on a 0-10 axis; both have Median = 3]
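
The contrast between the mean and the median can be checked with a short Python sketch. The two data sets are assumptions chosen to reproduce the values quoted on the slides (means of 3 and 4, a median of 3 in both cases); the original dot plots do not list their points.

```python
from statistics import mean, median

# Hypothetical data: the second set replaces the largest value with an outlier.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_before, mean_after = mean(without_outlier), mean(with_outlier)
median_before, median_after = median(without_outlier), median(with_outlier)

# The mean is pulled toward the outlier; the median stays where it was.
print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```
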

Slide 14

Finding the Median
The location of the median:
If the number of values is odd (uneven), the median is the middle number.
For this data set: -17 6 25 -5 13 9 33
Ranked data: -17 -5 6 9 13 25 33
The median is 9.

Slide 15

Finding the Median
The location of the median:
If the number of values is even, the median is the sum of the two middle numbers divided by 2 (their average).

Slide 16

Finding the median
Determine the median of the following data set:
17 5 3 11 12 8 25 3
Slide 17

Finding the median
Determine the median of the following data set:
17 5 3 11 12 8 25 3
Ranked data: 3 3 5 8 11 12 17 25
Median: (8 + 11)/2 = 19/2 = 9.5
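
The odd and even rules can be sketched as one small Python function. The name `find_median` is a hypothetical helper introduced for illustration; it is not part of the course material.

```python
def find_median(values):
    """Rank the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

# The two data sets used on the slides.
odd_example = find_median([-17, 6, 25, -5, 13, 9, 33])
even_example = find_median([17, 5, 3, 11, 12, 8, 25, 3])
print(odd_example, even_example)
```
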
Slide 18

Mode
Value that occurs most often in the data set
Not affected by outliers
Used for either numerical or categorical data
There may be no mode
There may be several modes: a data set can be unimodal, bimodal or multimodal
[Figure: two dot plots; the first, on a 0-14 axis, has Mode = 9; the second, on a 0-6 axis, has no mode]
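
A minimal Python sketch of these cases follows. The helper `find_modes` and all four data sets are assumptions made for illustration; the convention that a data set with no repeated value has no mode follows the slide.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs exactly once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

one_mode = find_modes([0, 1, 3, 5, 9, 9, 9, 12])    # unimodal
no_mode = find_modes([0, 1, 2, 3, 4, 5, 6])         # no repeated value
two_modes = find_modes([1, 1, 2, 3, 3])             # bimodal
category_mode = find_modes(["red", "blue", "red"])  # categorical data
print(one_mode, no_mode, two_modes, category_mode)
```
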

Slide 19

Measures of the center: summary data
Five houses on a hill by the beach
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000

Slide 20

Measures of the center: summary data
What is the mean house price?
What is the median house price?
What is the modal house price?

Slide 21

Measures of the center: summary
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000
Mean: $3,000,000/5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent house price = $100,000
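
The three answers can be reproduced with Python's standard `statistics` module. This is only a quick check of the arithmetic above; the variable names are illustrative.

```python
from statistics import mean, median, mode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = mean(house_prices)      # sum of prices divided by 5
median_price = median(house_prices)  # middle value of the ranked prices
modal_price = mode(house_prices)     # most frequent price

print(mean_price, median_price, modal_price)
```

The mean is twice the median here because a single $2,000,000 house dominates the sum.
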

Slide 22

Which measure of the center is "best", and when?
The mean is generally used, unless outliers exist. If there are outliers, the mean does not represent the center well.
The median is used instead when outliers exist in the data set.
Example: median home prices may be reported for a region because the median is less sensitive to outliers.

Slide 23

Shape of a Distribution
Describes how data is distributed
The presence or absence of outliers in a data set influences the shape of a distribution
Symmetric or skewed:
Left-Skewed: Mean < Median
Symmetric: Mean = Median = Mode
Right-Skewed: Median < Mean

Slide 24

Histogram of annual salaries (in $) for a sample of U.S. marketing managers:
Describe the shape of this histogram (of the distribution).
Without doing calculations, do you expect the mean salary to be higher or lower than the median salary?
Slide 25

Class exercise
Eleven economists were asked to predict the percentage growth in the Consumer Price Index over the next year.
Their forecasts were as follows:
3.6 3.1 3.9 3.7 3.5 1.0 3.7 3.4 3.0 3.7 3.4
Compute the mean, the median and the mode.
Are there any outliers in the data set that may influence the value of the mean?
If there are outliers, how do they affect the shape of the data distribution?
Slide 26

Solution to class exercise
Mean: 36/11 = 3.27, rounded to 3.3
Median: 3.5
Mode: 3.7
Outlier: 1.0
How does the outlier affect the shape of the distribution?
It decreases the mean of the data set and distorts the picture of the histogram.
The shape is skewed to the left.
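
The solution can be verified with a short Python sketch. Comparing the mean with and without the 1.0 forecast is an addition made here for illustration, to show how far a single outlier pulls the mean.

```python
from statistics import mean, median, mode

forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 1.0, 3.7, 3.4, 3.0, 3.7, 3.4]

forecast_mean = mean(forecasts)
forecast_median = median(forecasts)
forecast_mode = mode(forecasts)

# Mean of the ten forecasts that remain once the outlier is set aside.
mean_without_outlier = mean([x for x in forecasts if x != 1.0])

print(round(forecast_mean, 2), forecast_median, forecast_mode)
print(round(mean_without_outlier, 2))
# A mean below the median is consistent with a left-skewed shape.
print(forecast_mean < forecast_median)
```
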
Slide 27

Measures of variability
The three measures of the data center do not provide a complete and sufficient description of the data.
Besides knowing how the data is located around a specific value (mean, median or mode), we need information on how far the data is spread from that value, most often from the mean.
Measures of variability provide this information.

DR SUSANNE HANSEN SARAL

Slide 28

Measures of Variability
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of variation give information about the spread or variability of the data values.
[Figure: same center, different variation]

Slide 29

Quartiles

Slide 30

Quartiles
[Figure: ranked data divided by Q1, Q2 and Q3 into four segments of 25% each]

Slide 31

How to calculate quartiles manually
Find a quartile by determining the value in the appropriate position of the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median position)
Third quartile position: Q3 = 0.75(n+1)
and n is the number of observed values.
Slide 32

Quartiles
Example: Find the first and third quartile
Sample data: 14 12 16 21 11 17 22 16 18
Sample ranked data: 11 12 14 16 16 17 18 21 22
(n = 9)
Q1 position = 0.25(n+1)
1st quartile = the value located in the 0.25(n+1)th ordered position
1st quartile = the value located in the 0.25(9+1)th ordered position
1st quartile = the value located in the 2.5th position
The value in the 2nd position is 12 and the value in the 3rd position is 14. The value in the 2.5th position lies 50% of the distance between 12 and 14. The first quartile is therefore 12 + 0.5(14 - 12) = 13.

Slide 33

Quartiles
Sample ranked data: 11 12 14 16 16 17 18 21 22
Example: Find the first and third quartile
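
The manual procedure (position = fraction × (n+1), then linear interpolation between the two neighbouring ranked values) can be sketched in Python. The function name `quartile_value` is hypothetical, and the third-quartile result is simply what this sketch computes with the same rule; the worked Q3 calculation is not present in the slide text.

```python
def quartile_value(values, fraction):
    """Value at position fraction * (n + 1) in the ranked data (1-based),
    interpolating linearly when the position is not a whole number."""
    ranked = sorted(values)
    n = len(ranked)
    position = fraction * (n + 1)
    lower = int(position)        # whole part of the position
    weight = position - lower    # fractional part of the position
    lower = min(max(lower, 1), n)  # keep positions inside the data
    upper = min(lower + 1, n)
    low_value, high_value = ranked[lower - 1], ranked[upper - 1]
    return low_value + weight * (high_value - low_value)

sample = [14, 12, 16, 21, 11, 17, 22, 16, 18]
q1 = quartile_value(sample, 0.25)  # position 2.5: between 12 and 14
q3 = quartile_value(sample, 0.75)  # position 7.5: between 18 and 21
print(q1, q3)
```
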

Slide 34

Quartiles and Enron case

Slide 35

Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = X_largest - X_smallest
Example: [Figure: dot plot on a 0-14 axis] Range = 14 - 1 = 13

Slide 36

Range: Example Enron case
Range = maximum value - minimum value
Enron data range = $21.06 - (-$17.75) = $38.81
Slide 37

Disadvantages of the Range
Ignores the way in which data is distributed
[Figure: two differently distributed data sets on a 7-12 axis, each with Range = 12 - 7 = 5]



Slide 38

Disadvantages of the Range
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
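
A short Python check of the two ranges, using the data sets listed above. The helper name `data_range` is an assumption introduced for illustration.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# Eleven 1s, eight 2s, four 3s, one 4, and a final value of 5 or 120.
common_values = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = common_values + [5]
with_outlier = common_values + [120]

range_without = data_range(without_outlier)
range_with = data_range(with_outlier)
print(range_without, range_with)
```

Changing one observation out of 25 moves the range from 4 to 119, which is why the range alone says little about how most of the data is spread.
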

Slide 39

Range: shortcomings as a measure of variability
Because the range does not provide much information about the spread of the data, it is not a very good measure of variability.

2/16/2017

Slide 40

Interquartile Range
[Figure: distribution divided into four segments of 25% each]

Slide 41

Interquartile Range
The interquartile range (IQR) measures the spread of the data in the middle 50% of the data set.
Defined as the difference between the observation at the third quartile and the observation at the first quartile:
IQR = Q3 - Q1

Slide 42

Interquartile Range
Raw data: 6 8 10 12 14 9 11 7 13 11 (n = 10)
Ranked data: 6 7 8 9 10 11 11 12 13 14
1st quartile (Q1): 7.75
3rd quartile (Q3): 12.25
IQR = Q3 - Q1 = 12.25 - 7.75 = 4.5
[Figure: 25% of the data below Q1, 50% between Q1 and Q3, 25% above Q3]
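
As a cross-check, the same quartiles can be obtained from Python's standard library. This sketch assumes Python 3.8 or later and relies on the default `method="exclusive"` of `statistics.quantiles`, which places the quartiles at the 0.25(n+1), 0.50(n+1) and 0.75(n+1) positions; other software may use different quartile conventions and return slightly different values.

```python
from statistics import quantiles

raw_data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(raw_data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)
```
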