Statistics. Data Description. Data Summarization. Numerical Measures of the Data

September 14, 2022

Slide 2

Chapter Three: Numerical Measures of the Data
Outline
Introduction
3-1 Measures of Central Tendency
3-2 Measures of Variation
3-3 Measures of Position
3-4 Exploratory Data Analysis

Statistics103110

Slide 3

Objectives
Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange.
Describe data using the measures of variation, such as the range, variance, and standard deviation.
Identify the position of a data value in a data set using various measures of position, such as percentiles and quartiles.
Use the techniques of exploratory data analysis, including stem and leaf plots, box plots, and five-number summaries, to discover various aspects of data.

Slide 4

3-1 Measures of Central Tendency
We will compute two means: one for a sample and one for a finite population of values.

The symbol $\bar{x}$ represents the sample mean: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the sample size.
The symbol $\mu$ represents the population mean: $\mu = \frac{\sum x}{N}$, where $N$ is the population size.

Slide 5

Example:- (Sample Mean)
The ages of a random sample of seven students at a certain school are 11, 10, 12, 13, 7, 9, 15.
Find the average (mean) age of this sample.
$\bar{x} = \frac{11 + 10 + 12 + 13 + 7 + 9 + 15}{7} = \frac{77}{7} = 11$ years
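A minimal Python sketch of the two mean formulas, applied to the ages above. The function names `sample_mean` and `population_mean` are illustrative only; both perform the same sum-divided-by-count, and only the symbol ($\bar{x}$ vs $\mu$) and the meaning of the count ($n$ vs $N$) differ.

```python
def sample_mean(values):
    """x-bar = (sum of x) / n, for a sample of n values."""
    return sum(values) / len(values)


def population_mean(values):
    """mu = (sum of x) / N, for a finite population of N values."""
    return sum(values) / len(values)


ages = [11, 10, 12, 13, 7, 9, 15]
print(sample_mean(ages))  # 11.0
```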

Slide 6

Example:- Population Mean

Slide 7

The Sample Mean for an Ungrouped Frequency Distribution
$\bar{x} = \frac{\sum f \cdot X}{n}$, where $f$ is the frequency of each value $X$ and $n = \sum f$.

Slide 8

The Sample Mean for an Ungrouped Frequency Distribution – Example
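The example table on this slide did not survive extraction. As a stand-in, the sketch below applies $\bar{x} = \sum f \cdot X / n$ to the 26 values listed in the variance example of slide 33, tallied into a frequency table. The helper name `frequency_mean` is illustrative.

```python
from collections import Counter


def frequency_mean(table):
    """Mean of an ungrouped frequency distribution: sum(f * X) / sum(f).

    `table` maps each observed value X to its frequency f.
    """
    n = sum(table.values())
    return sum(x * f for x, f in table.items()) / n


raw = [2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2, 3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2]
table = dict(Counter(raw))      # {2: 10, 3: 7, 4: 5, 5: 4}
print(frequency_mean(table))    # 81 / 26, about 3.115
```

The result equals the plain mean of the 26 raw values; the frequency table only saves work when values repeat.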

Slide 9

The Sample Mean for a Grouped Frequency Distribution
The mean for a grouped frequency distribution is given by:
$\bar{x} = \frac{\sum f \cdot X_m}{n}$
Here $X_m$ is the corresponding class midpoint, $f$ is the class frequency, and $n = \sum f$.
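A minimal sketch of the grouped formula. The class table below is hypothetical (the slide's example is not recoverable), and the midpoint is assumed to be the average of the lower and upper class limits.

```python
def grouped_mean(classes):
    """Approximate mean of grouped data: sum(f * Xm) / sum(f).

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples;
    each class is represented by its midpoint Xm = (lower + upper) / 2.
    """
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Hypothetical classes, for illustration only (not from the slides).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]
print(grouped_mean(classes))  # (3*5 + 5*15 + 2*25) / 10 = 14.0
```

Because every value in a class is replaced by its midpoint, the result only approximates the mean of the raw data.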

Slide 10

Important remark:
In some situations the mean may not be representative of the data.
As an example, the annual salaries of five vice presidents at AVX, LLC are $90,000, $92,000, $94,000, $98,000, and $350,000. The mean is:
$\frac{\$90{,}000 + \$92{,}000 + \$94{,}000 + \$98{,}000 + \$350{,}000}{5} = \frac{\$724{,}000}{5} = \$144{,}800$
Notice how the one extreme value ($350,000) pulled the mean upward. Four of the five vice presidents earned less than the mean, raising the question of whether the arithmetic mean of $144,800 is typical of the salaries of the five vice presidents.
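A quick numerical check of this remark in Python, contrasting the mean with the median (the measure introduced on the following slides, which resists extreme values).

```python
import statistics

salaries = [90_000, 92_000, 94_000, 98_000, 350_000]

mean = sum(salaries) / len(salaries)
median = statistics.median(salaries)
below_mean = sum(1 for s in salaries if s < mean)

print(mean)        # 144800.0
print(median)      # 94000
print(below_mean)  # 4
```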

Slide 11

Properties of the mean
As stated, the mean is a widely used measure of central tendency. It has several important properties.
Every set of interval-level and ratio-level data has a mean.
All the data values are included in the calculation.
A set of data has only one mean, that is, the mean is unique.
The mean is a useful measure for comparing two or more populations.
The sum of the deviations of each value from the mean is always zero, that is, $\sum (x - \bar{x}) = 0$.
The mean is strongly affected by extreme data values.
Note: Illustrating the fifth property
Consider the set of values: 3, 8, and 4. The mean is 5, and the deviations sum to zero: $(3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$.
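The fifth property can be checked on any data set. A small sketch (the helper name `deviations` is illustrative); note that when the mean is not a whole number, floating-point rounding may leave a tiny nonzero remainder instead of an exact 0.

```python
def deviations(values):
    """Deviations (x - x_bar) of each value from the mean."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]


print(deviations([3, 8, 4]))       # [-2.0, 3.0, -1.0]
print(sum(deviations([3, 8, 4])))  # 0.0

# With a non-integer mean (15.8 here) the floating-point sum is only
# approximately zero.
print(sum(deviations([16, 19, 15, 15, 14])))
```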

Slide 12

Median: The median splits the ordered data into halves.
Example:- The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the median.
Arrange the data in order and select the middle point.
Data array: 180, 186, 191, 201, 209, 219, 220.
The median is 201.
In the previous example, there was an odd number of values in the data set. In this case it is easy to select the middle number in the data array.

Slide 13

When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.
Example:-
Six customers purchased the following number of magazines: 1, 7, 3, 2, 3, 4. Find the median.
Arrange the data in order and compute the middle point.
Data array: 1, 2, 3, 3, 4, 7.
The median = (3 + 3)/2 = 3.
Example:- Find the median grade of the following sample:
62, 68, 71, 74, 77, 82, 84, 88, 90, 94
Five values (62, 68, 71, 74, 77) lie to the left of the middle and five (82, 84, 88, 90, 94) lie to the right, so the median = (77 + 82)/2 = 79.5.
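The odd and even cases above can be written as one small function. This is a sketch for illustration; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


print(median([180, 201, 220, 191, 219, 209, 186]))       # 201
print(median([1, 7, 3, 2, 3, 4]))                        # 3.0
print(median([62, 68, 71, 74, 77, 82, 84, 88, 90, 94]))  # 79.5
```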

Slide 14

Example
Find the median grade of the following sample of students' grades:
A B A D F D F A B C C C F D A F D A A B B F D A B F C
Data array (from the lowest grade to the highest):
F F F F F F D D D D D C C C C B B B B B A A A A A A A
There are 27 grades, so the median is the 14th grade in the array.
The median grade is: C
At least half of the students had a grade of C or lower.
At least half of the students had a grade of C or higher.
The median can be determined for ordinal-level data.
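For ordinal data the values can be ranked but not averaged. A minimal sketch, assuming the ranking F < D < C < B < A; when n is even and the two middle categories differ, it reports both instead of averaging them (a design choice, since letter grades have no arithmetic midpoint).

```python
GRADE_ORDER = "FDCBA"  # assumed ranking of letter grades, lowest to highest


def ordinal_median(grades, order=GRADE_ORDER):
    """Median of ordinal data: sort by rank, then take the middle category."""
    ranked = sorted(grades, key=order.index)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]
    lower, upper = ranked[n // 2 - 1], ranked[n // 2]
    # Categories cannot be averaged; report both if the two middle grades differ.
    return lower if lower == upper else (lower, upper)


grades = "A B A D F D F A B C C C F D A F D A A B B F D A B F C".split()
print(len(grades), ordinal_median(grades))  # 27 C
```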

Slide 15

Properties of the Median
The major properties of the median are:
The median is a unique value, that is, like the mean, there is only one median for a set of data.
It is not influenced by extremely large or small values and is therefore a valuable measure of central tendency when such values do occur.
It can be computed for ratio level, interval level, and ordinal-level data.
Fifty percent of the observations are greater than the median and fifty percent of the observations are less than the median.

Slide 16

Mode:- The mode is the score that occurs most frequently (denoted by M).
Example:- The following data represent the duration (in days) of U.S. space shuttle voyages for the years 1992-94. Find the mode.
Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11.
Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14. Mode = 8 days.
Example:- Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode.
Data set: 2, 3, 5, 7, 8, 10.
There is no mode, since each data value occurs equally, with a frequency of one.

Slide 17

Example:- Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode.
Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26.
There are two modes (bimodal). The values are 18 and 24.
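A sketch that covers all three mode examples (one mode, no mode, two modes). The helper `modes` is illustrative. It follows the slides' convention that a data set in which no value repeats has no mode; the standard library's `statistics.multimode` instead returns every value in that case.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so the data set has no mode
    return sorted(v for v, c in counts.items() if c == top)


shuttle = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
bacteria = [2, 3, 5, 7, 8, 10]
stopping = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]

print(modes(shuttle))   # [8]
print(modes(bacteria))  # []
print(modes(stopping))  # [18, 24]
```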

Slide 18

The Mode for a Grouped Frequency Distribution
The mode can be approximated by the midpoint of the modal class, that is, the class with the highest frequency.
Example
[Table: grouped frequency distribution with the modal class marked]

Slide 19

Properties of the Mode
The mode can be found for all levels of data (nominal, ordinal, interval, and ratio).
The mode is not affected by extremely high or low values.
A set of data can have more than one mode. If it has two modes, it is said to be bimodal.
A disadvantage is that a set of data may not have a mode because no value appears more than once.

Slide 20

The weighted mean is used when the values in a data set are not all equally represented.
The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights:
$\bar{x}_w = \frac{\sum w \cdot X}{\sum w}$

Slide 21

Example:- During a one-hour period on a hot Saturday afternoon, a boy served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.10. Compute the weighted mean of the price of the drinks:
$\bar{x}_w = \frac{5(0.50) + 15(0.75) + 15(0.90) + 15(1.10)}{5 + 15 + 15 + 15} = \frac{43.75}{50} = \$0.875$
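The drink example as a short Python sketch; the helper name `weighted_mean` is illustrative. Printing the unweighted average of the four prices alongside shows why the weights matter here.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.10]  # price per drink, in dollars
sold = [5, 15, 15, 15]             # drinks sold at each price (the weights)

print(weighted_mean(prices, sold))  # about 0.875
print(sum(prices) / len(prices))    # unweighted mean of the four prices, about 0.8125
```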

Slide 22

Best measure of central tendency

Slide 23

Relationship between the mean, median, and mode and the shape of the distribution

Symmetric (single-peaked) – the mean = the median = the mode
Skewed left – the mean will usually be smaller than the median
Skewed right – the mean will usually be larger than the median

Dr. Nadia Ouakli

Slide 24

3-2 Measures of Dispersion (Variation)
Measures of dispersion describe the spread or variability in the data.
Learning objectives
The range of a variable
The variance of a variable
The standard deviation of a variable
Use the Empirical Rule
Comparing two sets of data
The measures of central tendency (mean, median, mode) measure the differences between the “average” or “typical” values of two sets of data.
The measures of dispersion in this section measure the differences in how far “spread out” the data values are.

Slide 25

Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.
It tells how meaningful the measures of central tendency are.
It helps to identify which scores are outliers (extreme scores).
Why do we study dispersion?
A direct comparison of two sets of data based only on measures of central tendency such as the mean and the median can be misleading, since an average does not tell us anything about the spread of the data.
See Example 3-15, page 128 of your textbook.
Comparison of two outdoor paints: 6 gallons of each brand have been tested, and the data show how long (in months) each brand will last before fading.
Brand A: 10 60 50 30 40 20
Brand B: 35 45 30 35 40 25
Calculate the mean for each brand:
Brand A: $\bar{x} = \frac{210}{6} = 35$ months. Brand B: $\bar{x} = \frac{210}{6} = 35$ months.
The means are equal, yet the Brand A values are far more spread out (from 10 to 60 months) than the Brand B values (from 25 to 45 months).
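A small sketch that puts the paint comparison into numbers: equal means, very different spread. It uses Python's standard `statistics` module; `stdev` is the sample standard deviation defined later in this chapter.

```python
import statistics

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def summarize(data):
    """Mean, range, and sample standard deviation of one brand's lifetimes."""
    return {
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "s": statistics.stdev(data),
    }


summary_a = summarize(brand_a)
summary_b = summarize(brand_b)
print(summary_a)
print(summary_b)
```

Both brands average 35 months, but Brand A's range (50 months) and standard deviation (about 18.7) are much larger than Brand B's (20 months and about 7.1).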

Slide 26

Measures of dispersion are:
The range,
The interquartile range,
The variance and standard deviation,
The coefficient of variation.
The range (R) of a variable is the difference between the largest data value and the smallest data value:
R = highest value – lowest value.
Properties of the range
Only two values are used in the calculation.
It is influenced by extreme values.
It is easy to compute and understand.

Slide 27

Example
Compute the range of 6, 1, 2, 6, 11, 7, 3, 3.
The largest value is 11.
The smallest value is 1.
Subtracting the two: 11 – 1 = 10, so the range is 10.
A relative measure of the range is called the coefficient of range.
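The slide names the coefficient of range without giving its formula. The sketch below assumes the common textbook definition (H − L)/(H + L), where H is the highest and L the lowest value; treat that definition as an assumption, not as the slide's own formula.

```python
def data_range(values):
    """R = highest value - lowest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Assumed definition: (H - L) / (H + L); only meaningful for positive data."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


data = [6, 1, 2, 6, 11, 7, 3, 3]
print(data_range(data))            # 10
print(coefficient_of_range(data))  # 10 / 12, about 0.833
```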

Slide 28

The variance of a variable
The variance is based on the deviation from the mean:
$(x_i - \mu)$ for populations
$(x_i - \bar{x})$ for samples
Because positive and negative deviations cancel out (their sum is always zero), we square the deviations:
$(x_i - \mu)^2$ for populations
$(x_i - \bar{x})^2$ for samples

Slide 29

The population variance of a variable is the sum of the squared deviations of the data values from the mean divided by the number of values in the population:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
where $\mu$ is the population mean and $N$ is the population size.
The population variance is represented by $\sigma^2$.

Standard deviation: the square root of the variance,
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$,
i.e. the square root of the arithmetic mean of the squared deviations from the arithmetic mean of the given distribution.
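A sketch of the population formulas, treating the seven ages of slide 5 as if they were a whole population (an assumption made only for illustration). Python's `statistics.pvariance` uses the same divide-by-N definition and serves as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_sd(values):
    """sigma = square root of the population variance."""
    return math.sqrt(population_variance(values))


ages = [11, 10, 12, 13, 7, 9, 15]
print(population_variance(ages))   # 6.0
print(population_sd(ages))         # sqrt(6), about 2.449
print(statistics.pvariance(ages))  # same variance from the standard library
```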

Slide 30

Properties of the variance and standard deviation
The standard deviation is the typical, or approximately average, distance of the scores from the mean.
If it is small, the scores are clustered close to the mean; if it is large, they are scattered far from the mean.
It describes how variable, or spread out, the scores are.
It is strongly influenced by extreme scores.
The measurement units of the variance are the square of the original units, while the standard deviation is measured in the same units as the original data.
All values are used in the calculation.
The variance and standard deviation are always greater than or equal to zero. They equal zero only if all observations are the same.

Slide 31

The sample variance of a variable is the sum of the squared deviations of the data values from the mean divided by one less than the number of values in the sample:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
The sample variance is represented by $s^2$.
Sample standard deviation (s):
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ or, equivalently, $s = \sqrt{\frac{n \sum x_i^2 - (\sum x_i)^2}{n(n - 1)}}$

We say that this statistic has n – 1 degrees of freedom.
Example:- Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.
ΣX = 16 + 19 + 15 + 15 + 14 = 79.
ΣX² = 16² + 19² + 15² + 15² + 14² = 1263.
Using the shortcut formula (without calculating the mean):
$s^2 = \frac{5(1263) - 79^2}{5(4)} = \frac{6315 - 6241}{20} = \frac{74}{20} = 3.7$, so $s = \sqrt{3.7} \approx 1.92$.
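A sketch comparing the definitional formula with the shortcut formula on this sample, cross-checked against Python's `statistics.variance` (which divides by n − 1). The function names are illustrative. With integer data the shortcut is exact until the final division; with large floating-point values it can lose precision, because it subtracts two large, nearly equal numbers.

```python
import math
import statistics


def sample_variance_definition(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)); no mean needed."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


sample = [16, 19, 15, 15, 14]
s2 = sample_variance_shortcut(sample)
print(s2)                                  # 3.7
print(math.sqrt(s2))                       # about 1.924
print(sample_variance_definition(sample))  # about 3.7 (floating-point rounding)
print(statistics.variance(sample))         # standard-library cross-check
```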

Slide 32

Symbols for Standard Deviation
Textbook: sample s, population σ
Some graphics calculators: sample Sx, population σx
Some non-graphics calculators: sample xσn-1, population xσn
Articles in professional journals and reports often use SD for standard deviation and VAR for variance.

Slide 33

Sample Variance for Grouped and Ungrouped Data
For grouped data, use the class midpoints for the observed values in the different classes:
$s^2 = \frac{n \sum f \cdot X_m^2 - (\sum f \cdot X_m)^2}{n(n - 1)}$, where $n = \sum f$.
For ungrouped data, use the same formula with the class midpoints, $X_m$, replaced by the actual observed X values.
Example:-
Find the variance and SD for the following data set:
2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2, 3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2

Slide 34

Step one: put the data in an ungrouped frequency table.
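The slide's frequency table did not survive extraction. The sketch below rebuilds it from the 26 listed values (step one) and then applies the frequency form of the shortcut formula with the observed values X in place of class midpoints. Helper names are illustrative.

```python
import math
import statistics
from collections import Counter


def frequency_table(values):
    """Step one: tally each distinct value X with its frequency f."""
    return dict(sorted(Counter(values).items()))


def variance_from_table(table):
    """s^2 = (n * sum(f*X^2) - (sum(f*X))^2) / (n * (n - 1)), with n = sum(f)."""
    n = sum(table.values())
    sum_fx = sum(f * x for x, f in table.items())
    sum_fx2 = sum(f * x * x for x, f in table.items())
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


data = [2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2,
        3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2]
table = frequency_table(data)
print(table)              # {2: 10, 3: 7, 4: 5, 5: 4}
s2 = variance_from_table(table)
print(s2, math.sqrt(s2))  # 797/650 (about 1.226) and about 1.107
```

For grouped data, the same function would be fed a table keyed by class midpoints instead of observed values.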

Slide 35

Example:- Find the variance and SD for the frequency distribution of the data representing the number of miles that 20 runners ran during one week.

Slide 37

Interpretation and Uses of the Standard Deviation
The standard deviation is used to measure the spread of the data. A small standard deviation indicates that the data are clustered close to the mean, so the mean is representative of the data. A large standard deviation indicates that the data are spread out from the mean, so the mean is not representative of the data.

Slide 38

Coefficient of Variation:-
The relative measure of the standard deviation is the coefficient of variation, which is defined to be the standard deviation divided by the mean. The result is expressed as a percentage:
$CV = \frac{s}{\bar{x}} \times 100\%$ (sample) or $CV = \frac{\sigma}{\mu} \times 100\%$ (population)
Important note:
The coefficient of variation should only be computed for data measured on a ratio scale.
See the following example.

Slide 39

Example:
To see why the coefficient of variation should not be applied to interval-level data, compare the same set of temperatures in Celsius and Fahrenheit:
Celsius: [0, 10, 20, 30, 40]
Fahrenheit: [32, 50, 68, 86, 104]
Using the sample standard deviation, the CV of the first set is 15.81/20 = 0.79. For the second set (which contains the same temperatures) it is 28.46/68 = 0.42.
So the coefficient of variation does not have any meaning for data on an interval scale: the zero point of such a scale is arbitrary, and moving it changes the mean, and therefore the CV, without changing the data.
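A short check of the temperature example using Python's `statistics` module. The CV is left as a ratio here; multiply by 100 to express it as a percentage.

```python
import statistics


def coefficient_of_variation(values):
    """CV = s / x-bar (sample standard deviation divided by the mean)."""
    return statistics.stdev(values) / statistics.mean(values)


celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # [32.0, 50.0, 68.0, 86.0, 104.0]

print(round(coefficient_of_variation(celsius), 2))     # 0.79
print(round(coefficient_of_variation(fahrenheit), 2))  # 0.42
```

The same physical temperatures give two different CVs, which is the point of the example.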

Slide 40

Advantages
The coefficient of variation is useful because the standard deviation of data must always be understood in the context of the mean of the data. The coefficient of variation is a unitless (dimensionless) number. So when comparing data sets with different units or widely different means, one should use the coefficient of variation for comparison instead of the standard deviation.
Disadvantages
When the mean value is near zero, the coefficient of variation is sensitive to small changes in the mean, limiting its usefulness.

Slide 41

Example:- Data about the annual salary (in thousands) and age of CEOs in a number of firms have been collected. The means and standard deviations are as follows:
Which distribution has more dispersion? Is direct comparison appropriate?
Salary and age are measured in different units, and the means show that there is also a significant difference in magnitude.
Direct comparison is not appropriate.
Comparing CVs, we can now see clearly that the dispersion or variability relative to the mean is greater for CEO annual salary than for age.

Slide 42

Measures of position:
Measures of position are used to locate the relative position of a data value in the data set.
1- Standard Scores
To compare values measured in different units, obtain a z-score for each value and then compare the z-scores.
A z-score, or standard score, for each value is obtained by:
For a sample: $z = \frac{x - \bar{x}}{s}$
For a population: $z = \frac{x - \mu}{\sigma}$
The z-score represents the number of standard deviations that a data value falls above or below the mean.

Slide 43

Standard scores (or z-scores) specify the exact location of a score within a distribution relative to the mean.
The sign (+ or -) tells whether the score is above or below the mean.
The numerical value tells the distance from the mean in terms of standard deviations.
E.g., a z-score of -1.3 tells us that the raw score fell 1.3 standard deviations below the mean.
A raw score is the original, untransformed score.
To make them more meaningful, raw scores can be converted to z-scores.

Slide 44

Characteristics of Standard Scores
The shape of the distribution of standard scores is the same as the shape of the distribution of raw scores (the only thing that changes is the units on the x-axis).
The mean of a set of standard scores is 0.
The standard deviation of a set of standard scores is 1.
A standard score greater than +3 or less than -3 is an extreme score, or an outlier.

Slide 45

Example:- A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score.
z = (65 – 50)/10 = 1.5.
That is, the score of 65 is 1.5 standard deviations above the mean.
Above, since the z-score is positive.
Assume that this student scored 70 on a math exam that had a mean of 80 and a standard deviation of 5.
Compute the z-score.
z = (70 – 80)/5 = -2.
That is, the score of 70 is 2 standard deviations below the mean.
Below, since the z-score is negative.

Slide 46

Example:- A student scored 65 on a calculus test that had a mean of 50 and an SD of 10. She scored 30 on a statistics test with a mean of 25 and a variance of 25. Compare her relative positions on the two tests.
Calculus: z = (65 – 50)/10 = 1.5.
Statistics: the SD is √25 = 5, so z = (30 – 25)/5 = 1.
Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative position in the statistics class.
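A minimal z-score sketch covering the examples on slides 45 and 46, plus a check of two characteristics from slide 44 (standardized scores have mean 0 and standard deviation 1), using the slide 5 ages as sample data. The helper name `z_score` is illustrative.

```python
import math
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(65, 50, 10))             # 1.5   (slide 45, statistics exam)
print(z_score(70, 80, 5))              # -2.0  (slide 45, math exam)
print(z_score(30, 25, math.sqrt(25)))  # 1.0   (slide 46: variance 25, so SD 5)

# Standardizing a whole data set gives scores with mean 0 and SD 1.
ages = [11, 10, 12, 13, 7, 9, 15]
m, s = statistics.mean(ages), statistics.stdev(ages)
z = [z_score(x, m, s) for x in ages]
print(statistics.mean(z), statistics.stdev(z))  # approximately 0 and 1
```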

Slide 47

2. Quartiles
Quartiles divide the data set into 4 groups.
Quartiles are denoted by Q1, Q2, and Q3.
The median is the same as Q2.
Finding the Quartiles
Procedure: Let $Q_k$ be the kth quartile (k = 1, 2, 3) and n the sample size.
Step 1: Arrange the data in order.
Step 2: Compute c = ((n + 1)⋅k)/4.
Step 3: If c is not a whole number, $Q_k$ is the value halfway between the data values in the two positions on either side of c.
Step 4: If c is a whole number, $Q_k$ is the data value in position c.

Slide 48

Example:
For the following data set: 2, 3, 5, 6, 8, 10, 12
Find Q1 and Q3.
n = 7, so for Q1 we have c = ((7 + 1)⋅1)/4 = 2.
Hence Q1 is the 2nd value.
Thus Q1 for the data set is 3.
For Q3 we have c = ((7 + 1)⋅3)/4 = 6.
Hence Q3 is the 6th value.
Thus Q3 for the data set is 10.

Slide 49

Example: Find Q1 and Q3 for the following data set:
2, 3, 5, 6, 8, 10, 12, 15, 18.
Note: the data set is already ordered.
n = 9, so for Q1 we have c = ((9 + 1)⋅1)/4 = 2.5.
Hence Q1 is halfway between the 2nd value and the 3rd value: Q1 = (3 + 5)/2 = 4.
For Q3 we have c = ((9 + 1)⋅3)/4 = 7.5.
Hence Q3 is halfway between the 7th value and the 8th value: Q3 = (12 + 15)/2 = 13.5.
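A sketch of the quartile procedure from slide 47, checked against the examples on slides 48 and 49. It implements exactly the "(n + 1)k/4, halfway if fractional" rule; statistical software (for example NumPy's `percentile`, or spreadsheet functions) offers several other quartile definitions, so results elsewhere may differ slightly, especially when c ends in .25 or .75.

```python
import math


def quartile(values, k):
    """Q_k (k = 1, 2, 3) using the position c = (n + 1) * k / 4.

    If c is a whole number, Q_k is the value in position c (1-based);
    otherwise it is halfway between the values in the positions just
    below and just above c.
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    c = (n + 1) * k / 4
    if c.is_integer():
        return data[int(c) - 1]
    low = math.floor(c)                      # position just below c
    return (data[low - 1] + data[low]) / 2   # halfway to the next position


seven = [2, 3, 5, 6, 8, 10, 12]
nine = [2, 3, 5, 6, 8, 10, 12, 15, 18]
print(quartile(seven, 1), quartile(seven, 3))  # 3 10
print(quartile(nine, 1), quartile(nine, 3))    # 4.0 13.5
```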

Slide 50

Example:
For the following data set: 2, 3, 5, 6, 8, 10, 12
Find Q1 and Q3 (using the median of each half).
The median for the above data is 6.
The median of the lower group of data (the values less than the median: 2, 3, 5) is 3.
So Q1 is the 2nd value, which means that Q1 = 3.
The median of the upper group of data (the values greater than the median: 8, 10, 12) is 10.
So Q3 is the 6th value, which means that Q3 = 10.

Slide 51

Q1 can be obtained graphically using the ogive:
On the vertical (cumulative frequency) axis, locate the point n/4 (here n = 34, so 34/4 = 8.5).
Draw a horizontal line until it intersects the ogive, then draw a vertical line until it intersects the x-axis. The intersection represents the value of Q1.

Slide 52

Q3 can be obtained graphically using the ogive:
On the vertical (cumulative frequency) axis, locate the point 3n/4 (here (3 × 34)/4 = 25.5).
Draw a horizontal line until it intersects the ogive, then draw a vertical line until it intersects the x-axis. The intersection represents the value of Q3.

Slide 53

The Interquartile Range (IQR)
The interquartile range is IQR = Q3 – Q1.
The interquartile range (IQR), also called the midspread, middle fifty, or inner 50% data range, is a measure of statistical dispersion (variation), equal to the difference between the third and first quartiles.

Slide 54

Outliers
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
To determine whether a data value can be considered an outlier:
Step 1: Compute Q1 and Q3.
Step 2: Find the IQR = Q3 – Q1.
Step 3: Compute (1.5)(IQR).
Step 4: Compute Q1 – (1.5)(IQR) and Q3 + (1.5)(IQR); these are called the lower fence and the upper fence.
Step 5: Compare the data value (say X) with the lower and upper fences.
If X < lower fence or X > upper fence, then X is considered an outlier.

Slide 55

Example
Given the data set 5, 6, 12, 13, 15, 18, 22, 50, can the value of 50 be considered an outlier?
Q1 = 9, Q3 = 20, IQR = 11. Verify.
(1.5)(IQR) = (1.5)(11) = 16.5.
9 – 16.5 = – 7.5 and 20 + 16.5 = 36.5.
The value of 50 is outside the range (– 7.5 to 36.5), hence 50 is an outlier.
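The five steps of slide 54 as a self-contained sketch, reusing the chapter's (n + 1)k/4 quartile rule. Running it on this data set also carries out the "Verify" step for Q1 and Q3. Function names are illustrative.

```python
import math


def quartile(values, k):
    """Q_k via the (n + 1)k/4 position rule used in this chapter."""
    data = sorted(values)
    c = (len(data) + 1) * k / 4
    if c.is_integer():
        return data[int(c) - 1]
    low = math.floor(c)
    return (data[low - 1] + data[low]) / 2


def fences(values):
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values strictly outside the fences."""
    lower, upper = fences(values)
    return [x for x in values if x < lower or x > upper]


data = [5, 6, 12, 13, 15, 18, 22, 50]
print(quartile(data, 1), quartile(data, 3))  # 9.0 20.0
print(fences(data))                          # (-7.5, 36.5)
print(outliers(data))                        # [50]
```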

Slide 56

Measures of dispersion tell us about the variation of the data set.
Skewness tells us about the direction of variation of the data set.
Definition:
Skewness is a measure of symmetry, or more precisely, the lack of symmetry.
Coefficient of Skewness
A unitless number that measures the degree and direction of asymmetry of a distribution.
There are several ways of measuring skewness:
Pearson's coefficient of skewness
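The slide's formula for Pearson's coefficient did not survive extraction. Pearson defined more than one version (one based on the mode, one based on the median); the sketch below assumes the widely used median-based form $Sk = \frac{3(\bar{x} - \text{median})}{s}$, which may not be the exact form on the original slide.

```python
import statistics


def pearson_skewness(values):
    """Assumed form: Sk = 3 * (mean - median) / s, with s the sample SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


symmetric = [1, 2, 3, 4, 5]
salaries = [90_000, 92_000, 94_000, 98_000, 350_000]  # slide 10 data

print(pearson_skewness(symmetric))  # 0.0: the mean equals the median
print(pearson_skewness(salaries))   # positive: the data are skewed right
```

A positive value means the mean lies above the median (skewed right), matching the relationship listed on slide 23.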

Slide 57

The Empirical (Normal) Rule
For any bell-shaped distribution:
Approximately 68% of the data values will fall within one standard deviation of the mean.
Approximately 95% will fall within two standard deviations of the mean.
Approximately 99.7% will fall within three standard deviations of the mean.
μ ± 1σ ≈ 68%, μ ± 2σ ≈ 95%, μ ± 3σ ≈ 99.7%

Slide 58

The Empirical (Normal) Rule
[Figure: bell-shaped curve with the intervals μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%) marked on the horizontal axis from μ − 3σ to μ + 3σ]
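Where the percentages come from: for an exactly normal distribution, the share of values within k standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A short sketch using the standard `math.erf` function (for other bell-shaped distributions the rule is only approximate).

```python
import math


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {share_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.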

Slide 59

What Is a Box Plot?
To construct a box plot, first obtain the five-number summary:
{ Min, Q1, M, Q3, Max }

The box plot is a graphical representation of data. When the data set contains a small number of values, a box plot is used to graphically represent the data set. These plots involve five values: the minimum value (the smallest value that is not an outlier), the first quartile, the median, the third quartile, and the maximum value (the largest value that is not an outlier).

Slide 60

The box plot is useful in analyzing small data sets that do not lend themselves easily to histograms. Because of the small size of a box plot, it is easy to display and compare several box plots in a small space.
A box plot is a good alternative or complement to a histogram and is usually better for showing several simultaneous comparisons.

Slide 61

How to use it:
Collect and arrange data. Collect the data and arrange it into an ordered set from the lowest value to the highest.
Calculate the median: M = median = Q2.
Calculate the first quartile (Q1).
Calculate the third quartile (Q3).
Calculate the interquartile range (IQR). This range is the difference between the third and first quartile values (Q3 - Q1).
Obtain the maximum. This is the largest data value that is less than or equal to the third quartile plus 1.5 × IQR:
Q3 + [(Q3 - Q1) × 1.5]

Слайд 62

Obtain the minimum. This value will be the smallest data value

that is greater than or equal to the first quartile minus 1.5 X IQR.
Q1 - [(Q3 - Q1) X 1.5]
Draw and label the axes of the graph. The scale of the horizontal axis must be large enough to encompass the greatest value of the data sets.
Draw the box plots. Construct the box, insert median points, and attach maximum and minimum. Identify outliers (values outside the upper and lower fences) with asterisks.
The box plot can provide answers to the following questions:
Does the location differ between subgroups?
Does the variation differ between subgroups?
Are there any outliers?


Slide 63

Example 1:- Failure times of industrial machines (in hours):
32.56 42.02 47.26 50.25 59.03 60.17 61.56 62.16 62.84 63.29 63.52 65.52 66.54 68.71 70.60 71.27 76.33 80.37 82.87
Five-number summary (minimum, Q1, median, Q3, maximum): {32.56, 59.03, 63.29, 70.60, 82.87}
The final product is a simple box plot, in which only quartile information is displayed.
A mathematical rule (the 1.5 × IQR fences described above) designates “outliers”, which are plotted using special symbols.
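Applying the fence rule from the steps above to Example 1 is a useful check. The sketch below takes Q1 = 59.03 and Q3 = 70.60 from the five-number summary and flags the values outside the fences; the variable names are illustrative. It shows that 32.56 falls below the lower fence (about 41.675), so although it is listed as the minimum in the five-number summary, a box plot built with the 1.5 × IQR rule would mark it as an outlier and end the lower whisker at 42.02.

```python
# Failure times (hours) and quartiles copied from Example 1.
failure_times = [32.56, 42.02, 47.26, 50.25, 59.03, 60.17, 61.56, 62.16,
                 62.84, 63.29, 63.52, 65.52, 66.54, 68.71, 70.60, 71.27,
                 76.33, 80.37, 82.87]
q1, q3 = 59.03, 70.60

iqr = q3 - q1                  # about 11.57
lower_fence = q1 - 1.5 * iqr   # about 41.675
upper_fence = q3 + 1.5 * iqr   # about 87.955

outliers = [x for x in failure_times if x < lower_fence or x > upper_fence]
inside = [x for x in failure_times if lower_fence <= x <= upper_fence]

print(f"IQR = {iqr:.2f}, fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print("outliers:", outliers)                          # outliers: [32.56]
print("whiskers end at:", min(inside), max(inside))   # whiskers end at: 42.02 82.87
```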


Slide 64


Slide 65

Now find the interquartile range (IQR). The interquartile range is the difference between the upper quartile and the lower quartile; in this case, IQR = 87 - 52 = 35. The IQR is a very useful measurement because it is less influenced by extreme values: it limits the range to the middle 50% of the values.
With the interquartile range of 35, begin to draw the box plot.
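The claim that the IQR is less influenced by extreme values can be checked numerically. In this sketch (hypothetical data, not from the slides; the helper `spread` is illustrative, and quartiles use `statistics.quantiles` with its default `"exclusive"` method), replacing the largest value with an extreme one changes the range greatly but leaves the IQR unchanged, because the IQR depends only on the middle 50% of the ordered values.

```python
from statistics import quantiles


def spread(values):
    """Return (range, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # sorts a copy internally
    return max(values) - min(values), q3 - q1


# Hypothetical data, not taken from the slides.
base = [12, 15, 18, 21, 24, 27, 30, 33, 36]
with_extreme = base[:-1] + [200]  # replace the largest value with an extreme one

print(spread(base))          # (24, 15.0)
print(spread(with_extreme))  # (188, 15.0)
```

With these numbers the range grows from 24 to 188, while the IQR stays at 15.0.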


Slide 66

Example 2
Consider two datasets:
A1 = {0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09}
A2 = {-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}
Notice that both datasets are approximately balanced around zero; evidently the mean in both cases is "near" zero. However, there is substantially more variation in A2, which ranges from about -6 to about 7, whereas A1 ranges from about -2½ to 2½.
In the box plots below, notice the difference in scales: since each box plot displays the full range of variation, the y-axis range must be expanded for A2.
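The visual comparison can be backed by numbers. This sketch computes the mean, range, and IQR of A1 and A2; the helper `describe` is illustrative, and the quartiles use `statistics.quantiles` with its default `"exclusive"` method, so they may differ slightly from other textbook conventions. Both means are small compared with the spread, while A2's range and IQR are both much larger than A1's.

```python
from statistics import mean, quantiles

A1 = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72,
      0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]
A2 = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
      7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]


def describe(data):
    """Return (mean, range, IQR) for one dataset."""
    q1, _, q3 = quantiles(data, n=4)  # default "exclusive" method
    return mean(data), max(data) - min(data), q3 - q1


for name, data in (("A1", A1), ("A2", A2)):
    m, r, iqr = describe(data)
    print(f"{name}: mean = {m:.2f}, range = {r:.2f}, IQR = {iqr:.2f}")
```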


Slide 67


