Descriptive Statistics. Elementary Statistics, Larson and Farber (Chapter 2)

Contents

Slide 2

Frequency Distributions

Minutes Spent on the Phone

102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

Make a frequency distribution table with five classes.

Key values:

Minimum value = 67
Maximum value = 125

Slide 3

Decide on the number of classes (for this problem, use 5).
Calculate the class width: (125 - 67) / 5 = 11.6; round up to 12.
Determine the class limits.
Mark a tally in the appropriate class for each data value.

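Frequency Distributions

Do all lower class limits first: start at the minimum value, 67, and add the class width, 12, to get each next lower limit. Each upper limit is one less than the next lower limit.

Lower limit   Upper limit   Frequency (f)
67            78            3
79            90            5
91            102           8
103           114           9
115           126           5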
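The construction steps above can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: it assumes "round up" means rounding 11.6 up to the next whole number with `math.ceil`, and the variable names are hypothetical.

```python
import math

# Minutes spent on the phone (the 30 values from Slide 2)
data = [102, 124, 108, 86, 103, 82, 71, 104, 112, 118, 87, 95,
        103, 116, 85, 122, 87, 100, 105, 97, 107, 67, 78, 125,
        109, 99, 105, 99, 101, 92]

num_classes = 5
# Class width: (max - min) / number of classes, rounded up to a whole number
width = math.ceil((max(data) - min(data)) / num_classes)  # 58 / 5 = 11.6 -> 12

# Lower limits first: start at the minimum and add the class width each time
lower = [min(data) + i * width for i in range(num_classes)]
# Each upper limit is one less than the next lower limit
upper = [lo + width - 1 for lo in lower]

# Tally: count the values that fall inside each class
freq = [sum(lo <= x <= hi for x in data) for lo, hi in zip(lower, upper)]

for lo, hi, f in zip(lower, upper, freq):
    print(f"{lo:>3} - {hi:<3}  {f}")
```

The counts it prints should match the frequency column of the table.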

Slide 4

Class       Frequency (f)
67 - 78     3
79 - 90     5
91 - 102    8
103 - 114   9
115 - 126   5

Midpoint: (lower limit + upper limit) / 2

Relative frequency: class frequency / total frequency

Cumulative frequency: the number of values in that class or in a lower class (the running total of the frequencies).

Other Information

Class       Midpoint   Relative frequency   Cumulative frequency
67 - 78     72.5       0.10                 3
79 - 90     84.5       0.17                 8
91 - 102    96.5       0.27                 16
103 - 114   108.5      0.30                 25
115 - 126   120.5      0.17                 30
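The three derived columns follow directly from the definitions above. A short Python sketch, assuming the class limits and frequencies are typed in from the table (the list names are illustrative):

```python
from itertools import accumulate

# Class limits and frequencies from the frequency distribution table
classes = [(67, 78), (79, 90), (91, 102), (103, 114), (115, 126)]
freq = [3, 5, 8, 9, 5]
n = sum(freq)  # total frequency

# Midpoint: (lower limit + upper limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency: class frequency / total frequency
rel_freq = [f / n for f in freq]

# Cumulative frequency: running total of the frequencies
cum_freq = list(accumulate(freq))

for (lo, hi), m, r, c in zip(classes, midpoints, rel_freq, cum_freq):
    print(f"{lo}-{hi}: midpoint {m}, relative {r:.2f}, cumulative {c}")
```

The last cumulative frequency always equals the total frequency, n, which is a quick check on the tally.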

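Slide 5

Boundaries
66.5 - 78.5
78.5 - 90.5
90.5 - 102.5
102.5 - 114.5
114.5 - 126.5

Class boundaries split the gap between adjacent classes: for whole-number data, subtract 0.5 from each lower limit and add 0.5 to each upper limit.

Frequency Histogram
[Figure: Time on Phone; horizontal axis: minutes; vertical axis: frequency, f]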
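A small Python sketch of the boundary rule, assuming whole-number data so that the gap between one class's upper limit and the next class's lower limit is exactly 1:

```python
# Class limits from the frequency distribution (whole-number data)
classes = [(67, 78), (79, 90), (91, 102), (103, 114), (115, 126)]

# With a gap of 1 between classes, each boundary sits 0.5 outside its limit,
# so adjacent classes share a boundary and the histogram bars touch.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]
print(boundaries)
```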

Slide 6

Frequency Polygon
[Figure: Time on Phone; horizontal axis: minutes; vertical axis: f]

Mark the midpoint at the top of each bar. Connect consecutive midpoints. Extend the frequency polygon to the horizontal axis at the midpoints of the empty classes just below the first class and just above the last class.

Slide 7

Relative Frequency Histogram
[Figure: Time on Phone; horizontal axis: minutes; vertical axis: relative frequency]

Relative frequency is plotted on the vertical scale.

Slide 8

Ogive

An ogive (cumulative frequency graph) reports the number of values in the data set that are less than or equal to the given value, x. Each point plots a cumulative frequency at the upper class boundary of its class.
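
Slide 9

Stem-and-Leaf Plot

The lowest value is 67 and the highest value is 125, so list stems from 6 to 12. Then place the last digit of each value (its leaf) beside its stem. For the first row of data, 102 124 108 86 103 82, the leaves are 2, 4, 8, 6, 3, and 2:

Stem | Leaf
6  |
7  |
8  | 6 2
9  |
10 | 2 8 3
11 |
12 | 4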

Slide 10

Stem-and-Leaf Plot

6  | 7
7  | 1 8
8  | 2 5 6 7 7
9  | 2 5 7 9 9
10 | 0 1 2 3 3 4 5 5 7 8 9
11 | 2 6 8
12 | 2 4 5

Key: 6 | 7 means 67
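The plot above can be generated in Python. This sketch assumes non-negative whole-number data, with the last digit as the leaf and the remaining digits as the stem; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

data = [102, 124, 108, 86, 103, 82, 71, 104, 112, 118, 87, 95,
        103, 116, 85, 122, 87, 100, 105, 97, 107, 67, 78, 125,
        109, 99, 105, 99, 101, 92]

def stem_and_leaf(values):
    """Return plot lines: stem = all digits but the last, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each leaf row ordered
        leaves[v // 10].append(v % 10)
    lines = []
    # List every stem from lowest to highest, even if it has no leaves
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = " ".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>2} | {leaf_str}".rstrip())
    return lines

plot = stem_and_leaf(data)
print("\n".join(plot))
print("Key: 6 | 7 means 67")
```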

Slide 11

Stem-and-Leaf with two lines per stem

6  | 7
7  | 1
7  | 8
8  | 2
8  | 5 6 7 7
9  | 2
9  | 5 7 9 9
10 | 0 1 2 3 3 4
10 | 5 5 7 8 9
11 | 2
11 | 6 8
12 | 2 4
12 | 5

Key: 6 | 7 means 67

1st line digits: 0 1 2 3 4
2nd line digits: 5 6 7 8 9

Slide 12

Dotplot
[Figure: dotplot of phone minutes on an axis from 66 to 126, ticks every 10 minutes]

Slide 13

Pie Chart

The 1995 NASA budget (billions of $), divided among 3 categories.

A pie chart is used to describe parts of a whole. The central angle for each segment is (category value / total) × 360°.

Construct a pie chart for the data.

Slide 14

Pie Chart

5.7 / 14.3 × 360° ≈ 143°

5.9 / 14.3 × 360° ≈ 149°
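A Python sketch of the central-angle calculation. The slide shows only two amounts and the total, 14.3; the third amount is an assumption, taken here as the remainder 14.3 - 5.7 - 5.9 = 2.7, and the category names are not given.

```python
# Budget amounts in billions of dollars. 5.7, 5.9 and the total 14.3 come from
# the slide; the third amount is ASSUMED to be the remainder of the total.
total = 14.3
amounts = [5.7, 5.9]
amounts.append(round(total - sum(amounts), 1))  # 14.3 - 11.6 = 2.7

# Central angle = (part / whole) * 360 degrees
angles = [a / total * 360 for a in amounts]
print([round(angle) for angle in angles])
```

Because the parts add up to the whole, the central angles add up to 360°, which is a useful check before drawing the chart.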

Slide 15

Measures of Central Tendency

Mean: the sum of all data values divided by the number of values.
For a population: \( \mu = \frac{\sum x}{N} \)
For a sample: \( \bar{x} = \frac{\sum x}{n} \)

Median: the middle value of the ordered data, with an equal number of values above it and below it.

Mode: the value with the highest frequency.

Slide 16

An instructor recorded the average number of absences for his students in one semester. For a random sample, the data are:

2 4 2 0 40 2 4 3 6

Calculate the mean, the median, and the mode.

n = 9

Mean: \( \bar{x} = \frac{\sum x}{n} = \frac{63}{9} = 7 \)

Median: sort the data in order.

0 2 2 2 3 4 4 6 40

The middle value is 3, so the median is 3.

Mode: the mode is 2, since it occurs most often.

Slide 17

Suppose the student with 40 absences is dropped from the course. Calculate the mean, median, and mode of the remaining values. Compare the effect of the change on each type of average.

2 4 2 0 2 4 3 6

n = 8

Mean: \( \bar{x} = \frac{\sum x}{n} = \frac{23}{8} \approx 2.9 \)

Median: sort the data in order.

0 2 2 2 3 4 4 6

The middle values are 2 and 3, so the median is (2 + 3) / 2 = 2.5.

Mode: the mode is 2, since it occurs most often.

Dropping the outlier 40 changes the mean the most (from 7 to about 2.9), changes the median only slightly (from 3 to 2.5), and leaves the mode unchanged.
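Both versions of the absences example can be checked with Python's standard `statistics` module. A minimal sketch; `center` is a hypothetical helper name.

```python
import statistics

def center(data):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

with_40 = [2, 4, 2, 0, 40, 2, 4, 3, 6]
without_40 = [2, 4, 2, 0, 2, 4, 3, 6]   # student with 40 absences dropped

for label, data in [("with 40", with_40), ("without 40", without_40)]:
    mean, median, mode = center(data)
    # statistics.median averages the two middle values when n is even
    print(f"{label}: n = {len(data)}, mean = {mean:.3f}, median = {median}, mode = {mode}")
```

Running both lists side by side makes the comparison concrete: the mean is pulled strongly by the single extreme value, while the median and mode are resistant to it.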

Slide 18

Shapes of Distributions

Uniform: mean = median
Symmetric: mean = median
Skewed right: mean > median
Skewed left: mean < median

Slide 19

Descriptive Statistics

Closing prices for two stocks were recorded on ten successive Fridays. Calculate the mean, median, and mode for each.

Stock A   Stock B
56        33
56        42
57        48
58        52
61        57
63        67
63        67
67        77
67        82
67        90

Stock A: Mean = 61.5, Median = 62, Mode = 67
Stock B: Mean = 61.5, Median = 62, Mode = 67
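The same summary can be reproduced in Python with the standard `statistics` module. The sketch also reports the range (maximum minus minimum), which shows that two data sets with identical centers can differ a great deal in spread; `describe` is a hypothetical helper name.

```python
import statistics

stock_a = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]
stock_b = [33, 42, 48, 52, 57, 67, 67, 77, 82, 90]

def describe(prices):
    """Mean, median, mode and range of a list of closing prices."""
    return {
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "mode": statistics.mode(prices),
        "range": max(prices) - min(prices),
    }

print("Stock A:", describe(stock_a))
print("Stock B:", describe(stock_b))
```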

Slide 20

Measures of Variation

Range = Maximum value - Minimum value

Range for A = 67 - 56 = $11
Range for B = 90 - 33 = $57

The range uses only 2 numbers from a data set.

The deviation for each value x is the difference between the value of x and the mean of the data set.

In a population, the deviation for each value x is: \( x - \mu \)

In a sample, the deviation for each value x is: \( x - \bar{x} \)

Slide 21

Deviations, Stock A (μ = 61.5)

x     Deviation, x - μ
56    56 - 61.5 = -5.5
56    56 - 61.5 = -5.5
57    57 - 61.5 = -4.5
58    58 - 61.5 = -3.5
61    -0.5
63    1.5
63    1.5
67    5.5
67    5.5
67    5.5

\( \sum (x - \mu) = 0 \)

The sum of the deviations is always zero.
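A short Python check of the deviation column for Stock A. In exact arithmetic the deviations always sum to zero; with floating-point numbers a tiny round-off can appear in general, but here every deviation is an exact half, so the sum comes out as exactly 0.

```python
stock_a = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]

mu = sum(stock_a) / len(stock_a)          # population mean, 61.5
deviations = [x - mu for x in stock_a]     # x - mu for each value

for x, d in zip(stock_a, deviations):
    print(f"{x}  {d:+.1f}")
print("Sum of deviations:", sum(deviations))
```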

Slide 22

Population Variance

Population variance: the sum of the squares of the deviations, divided by N.

Stock A
x     x - μ    (x - μ)²
56    -5.5     30.25
56    -5.5     30.25
57    -4.5     20.25
58    -3.5     12.25
61    -0.5     0.25
63    1.5      2.25
63    1.5      2.25
67    5.5      30.25
67    5.5      30.25
67    5.5      30.25
Sum of squares:  188.50

\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{188.50}{10} = 18.85 \)

Slide 23

Population Standard Deviation

Population standard deviation: the square root of the population variance.

\( \sigma = \sqrt{\sigma^2} = \sqrt{18.85} \approx 4.34 \)

The population standard deviation is $4.34.
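The Stock A variance and standard deviation can be computed by hand in Python and cross-checked against the standard library's population functions, `statistics.pvariance` and `statistics.pstdev`, which divide by N:

```python
import math
import statistics

stock_a = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]
N = len(stock_a)
mu = sum(stock_a) / N

# Sum of squares: add up the squared deviations (x - mu)^2
sum_sq = sum((x - mu) ** 2 for x in stock_a)

pop_variance = sum_sq / N            # sigma^2: divide by N
pop_sd = math.sqrt(pop_variance)     # sigma: square root of the variance

# Cross-check against the standard library's population versions
assert math.isclose(pop_variance, statistics.pvariance(stock_a))
assert math.isclose(pop_sd, statistics.pstdev(stock_a))

print(f"sum of squares = {sum_sq}, variance = {pop_variance:.2f}, sd = {pop_sd:.2f}")
```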

Slide 24

Sample Standard Deviation

Calculate the measures of variation for Stock B.

To calculate a sample variance, divide the sum of squares by n - 1:
\( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)

The sample standard deviation, s, is found by taking the square root of the sample variance:
\( s = \sqrt{s^2} \)
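A Python sketch of the Stock B exercise, assuming the ten prices are treated as a sample (so the divisor is n - 1). The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they serve as a cross-check.

```python
import math
import statistics

stock_b = [33, 42, 48, 52, 57, 67, 67, 77, 82, 90]
n = len(stock_b)
x_bar = sum(stock_b) / n                         # sample mean, 61.5

sum_sq = sum((x - x_bar) ** 2 for x in stock_b)  # sum of squared deviations
sample_variance = sum_sq / (n - 1)               # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)

# statistics.variance / statistics.stdev also divide by n - 1
assert math.isclose(sample_variance, statistics.variance(stock_b))
assert math.isclose(sample_sd, statistics.stdev(stock_b))

print(f"range = {max(stock_b) - min(stock_b)}, s^2 = {sample_variance:.2f}, s = {sample_sd:.2f}")
```

Comparing the result with Stock A's standard deviation shows numerically what the ranges already suggested: Stock B's prices are far more spread out around the same mean.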

Slide 25

Summary

Range = Maximum value - Minimum value

Population variance: \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \)

Population standard deviation: \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \)

Sample variance: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)

Sample standard deviation: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)

Slide 26

Empirical Rule (68-95-99.7% Rule)

Data with a symmetric, bell-shaped distribution have the following characteristics:

About 68% of the data lies within 1 standard deviation of the mean.

About 95% of the data lies within 2 standard deviations of the mean.

About 99.7% of the data lies within 3 standard deviations of the mean.

Slide 27

Using the Empirical Rule

The mean value of homes on a street is $125 thousand with a standard deviation of $5 thousand. The data set has a bell-shaped distribution. Estimate the percent of homes between $120 and $135 thousand.

$120 thousand is 1 standard deviation below the mean, and $135 thousand is 2 standard deviations above the mean.

About 68% of the homes lie within 1 standard deviation of the mean ($120 to $130 thousand). The band from 1 to 2 standard deviations above the mean ($130 to $135 thousand) holds half of 95% - 68% = 27%, that is, 13.5%.

68% + 13.5% = 81.5%

So, about 81.5% of the homes have a value between $120 and $135 thousand.
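The band-by-band reasoning above can be sketched in Python. This is a simplified illustration that assumes both bounds lie a whole number (0 to 3) of standard deviations from the mean; the function names are hypothetical.

```python
# Percent of a bell-shaped distribution lying between the mean and k standard
# deviations on ONE side (half of 68%, 95% and 99.7%).
ONE_SIDE = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def share_from_mean(k):
    """Signed percent between the mean and k sd (negative k = below the mean)."""
    return ONE_SIDE[k] if k >= 0 else -ONE_SIDE[-k]

def empirical_percent_between(low, high, mean, sd):
    """Estimate the percent of values between low and high with the empirical rule.
    Works only when both bounds are a whole number (0-3) of sd from the mean."""
    z_low, z_high = (low - mean) / sd, (high - mean) / sd
    if not (z_low.is_integer() and z_high.is_integer()):
        raise ValueError("bounds must be whole standard deviations from the mean")
    k_low, k_high = int(z_low), int(z_high)
    if max(abs(k_low), abs(k_high)) > 3:
        raise ValueError("the empirical rule only covers up to 3 sd")
    return share_from_mean(k_high) - share_from_mean(k_low)

# Homes: mean $125 thousand, sd $5 thousand; $120 is -1 sd, $135 is +2 sd
print(empirical_percent_between(120, 135, mean=125, sd=5))
```

Splitting each symmetric percentage in half is what makes asymmetric intervals, such as -1 to +2 standard deviations, easy to handle: 34% below the mean plus 47.5% above it gives the same 81.5% as the slide.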

Slide 28

Chebychev’s Theorem

For any distribution, regardless of shape, the portion of data lying within k standard deviations (k > 1) of the mean is at least \( 1 - \frac{1}{k^2} \).

For k = 2, at least 1 - 1/4 = 3/4, or 75%, of the data lies within 2 standard deviations of the mean.

For k = 3, at least 1 - 1/9 = 8/9 ≈ 88.9% of the data lies within 3 standard deviations of the mean.

[Figure: example distribution with μ = 6, σ = 3.84]

Slide 29

Chebychev’s Theorem

The mean time in a women’s 400-meter dash is 52.4 seconds, with a standard deviation of 2.2 seconds. Apply Chebychev’s theorem for k = 2.

Mark a number line in standard deviation units:

45.8 (-3 SD)   48 (-2 SD)   50.2 (-1 SD)   52.4 (mean)   54.6 (+1 SD)   56.8 (+2 SD)   59 (+3 SD)

Within 2 standard deviations: 52.4 - 2(2.2) = 48 and 52.4 + 2(2.2) = 56.8.

At least 75% of the women’s 400-meter dash times will fall between 48 and 56.8 seconds.
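A Python sketch of Chebychev's bound applied to the 400-meter dash example; `chebychev_min_fraction` is a hypothetical helper name. Values are rounded to one decimal place to hide floating-point noise in the number line.

```python
def chebychev_min_fraction(k):
    """Chebychev's theorem: at least 1 - 1/k^2 of any data set lies within
    k standard deviations of the mean (valid for k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

mean, sd, k = 52.4, 2.2, 2   # women's 400-meter dash times, in seconds

# Number line in standard deviation units, from -3 sd to +3 sd
ticks = [round(mean + j * sd, 1) for j in range(-3, 4)]

low, high = round(mean - k * sd, 1), round(mean + k * sd, 1)
print(ticks)
print(f"At least {chebychev_min_fraction(k):.0%} of times fall between {low} and {high} seconds")
```

Unlike the empirical rule, this bound needs no assumption about the shape of the distribution, which is why it only guarantees "at least" a percentage.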

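Slide 30

Grouped Data

To approximate the mean of data in a frequency distribution, treat each value as if it occurs at the midpoint of its class (x = class midpoint):
\( \bar{x} = \frac{\sum x f}{n} \)

Class       f    Midpoint (x)   x f
67 - 78     3    72.5           217.5
79 - 90     5    84.5           422.5
91 - 102    8    96.5           772.0
103 - 114   9    108.5          976.5
115 - 126   5    120.5          602.5
Total       30                  2991

\( \bar{x} = \frac{2991}{30} = 99.7 \)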
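The grouped-mean calculation in Python, assuming the frequency distribution of phone minutes is stored as (lower limit, upper limit, frequency) tuples:

```python
# Frequency distribution of phone minutes: (lower limit, upper limit, frequency)
table = [(67, 78, 3), (79, 90, 5), (91, 102, 8), (103, 114, 9), (115, 126, 5)]

n = sum(f for _, _, f in table)
# Treat every value as if it sat at its class midpoint x, then sum x * f
sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in table)
grouped_mean = sum_xf / n
print(n, sum_xf, grouped_mean)
```

The result is only an approximation: averaging the 30 raw values gives 2989 / 30 ≈ 99.63, close to the grouped estimate of 99.7.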

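Slide 31

Grouped Data

To approximate the standard deviation of data in a frequency distribution, use x = class midpoint:
\( s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \)

x       f    x - x̄    (x - x̄)²   (x - x̄)² f
72.5    3    -27.2    739.84     2219.52
84.5    5    -15.2    231.04     1155.20
96.5    8    -3.2     10.24      81.92
108.5   9    8.8      77.44      696.96
120.5   5    20.8     432.64     2163.20
Total   30                       6316.80

\( s = \sqrt{\frac{6316.8}{29}} \approx 14.76 \)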
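The same grouped standard deviation in Python. This sketch assumes the phone data are a sample, so it divides by n - 1 as in the formula above.

```python
import math

# Frequency distribution of phone minutes: (lower limit, upper limit, frequency)
table = [(67, 78, 3), (79, 90, 5), (91, 102, 8), (103, 114, 9), (115, 126, 5)]

n = sum(f for _, _, f in table)
mean = sum((lo + hi) / 2 * f for lo, hi, f in table) / n   # grouped mean

# Sum of (x - mean)^2 * f, with x = class midpoint
ss = sum(((lo + hi) / 2 - mean) ** 2 * f for lo, hi, f in table)

s = math.sqrt(ss / (n - 1))   # sample standard deviation (n - 1 divisor)
print(round(ss, 2), round(s, 2))
```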

Slide 32

Quartiles

You are managing a store. The average sale for each of 27 randomly selected days in the last year is given. Find Q1, Q2, and Q3.

28 43 48 51 43 30 55 44 48 33 45 37 37 42 27 47 42 23 46 39 20 45 38 19 17 35 45

The 3 quartiles, Q1, Q2, and Q3, divide the data into 4 equal parts.
Q2 is the same as the median.
Q1 is the median of the data below Q2.
Q3 is the median of the data above Q2.

Slide 33

Quartiles

The data in ranked order (n = 27) are:
17 19 20 23 27 28 30 33 35 37 37 38 39 42 42
43 43 44 45 45 45 46 47 48 48 51 55

Median rank = (27 + 1) / 2 = 14, so the median is Q2 = 42.

There are 13 values below the median. Q1 is the median of these 13 values, at rank 7, so Q1 = 30.
Q3 is at rank 7 counting down from the largest value, so Q3 = 45.

The interquartile range is Q3 - Q1 = 45 - 30 = 15.
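A Python sketch of the median-of-halves method used on these slides: for odd n, the median itself is left out of both halves. The helper names are hypothetical. Note that library routines such as `statistics.quantiles` interpolate with other methods and may return different quartile values for the same data.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3: Q1 and Q3 are the medians of the values below and above Q2
    (for odd n, the median itself belongs to neither half)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median(lower), median(data), median(upper)

sales = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 27,
         47, 42, 23, 46, 39, 20, 45, 38, 19, 17, 35, 45]
q1, q2, q3 = quartiles(sales)
print(q1, q2, q3, "IQR =", q3 - q1)
```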

Slide 34

Box-and-Whisker Plot

A box-and-whisker plot uses 5 key values to describe a set of data: Q1, Q2, and Q3, the minimum value, and the maximum value.

Q1 = 30
Q2 (the median) = 42
Q3 = 45
Minimum value = 17
Maximum value = 55

[Figure: box-and-whisker plot; the box spans the interquartile range from Q1 to Q3]

Slide 35

Percentiles

Percentiles divide the data into 100 parts. There are 99 percentiles: P1, P2, P3, ..., P99.

A 63rd percentile score indicates that the score is greater than or equal to 63% of the scores and less than or equal to 37% of the scores.

P50 = Q2 = the median

P25 = Q1

P75 = Q3
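The "greater than or equal to a given percent of the scores" idea can be expressed as a percentile rank. A minimal Python sketch, assuming the simple definition "percent of values at or below the score"; textbooks and software use several slightly different percentile formulas, and `percentile_rank` is a hypothetical name.

```python
def percentile_rank(values, score):
    """Percent of values that are less than or equal to score."""
    return 100 * sum(v <= score for v in values) / len(values)

# Daily average sales from the quartile example (n = 27)
sales = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 27,
         47, 42, 23, 46, 39, 20, 45, 38, 19, 17, 35, 45]

print(round(percentile_rank(sales, 42), 1))
```

Here 15 of the 27 values are at or below 42, so a sale of 42 sits at about the 56th percentile under this definition.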