
# 8 The interquartile range; the quartile deviation


STUDY MATERIAL C3: DESCRIPTIVE STATISTICS

In fact, to determine the interquartile range, we adopt the same approach as we did for the median. First of all, we assume that the wage values are evenly spread throughout their classes, and draw the ogive. The necessary cumulative frequency distribution is:

| Basic weekly wage (£) (less than) | Cumulative frequency |
|---|---|
| 200 | 16 |
| 225 | 169 |
| 250 | 270 |
| 275 | 362 |
| 300 | 430 |

It will be noted that it is unnecessary to close the final class in order to draw the ogive, and so we do not do so. The ogive is shown in Figure 4.6. Alternatively, a sensible closing value, such as £325, could be selected and an extra point, with cumulative frequency 480, added to the ogive.

[Figure 4.6 Ogive: wage distribution for Example 4.8.1. Vertical axis: cumulative frequency; horizontal axis: weekly wage (£), with Q1 and Q3 marked.]

We now note that the total frequency is 480, and so, from the constructions shown on the ogive, we have the following approximations:

Q1 (corresponding to a cumulative frequency of 1/4 of 480, or 120) = £217

Q3 (corresponding to a cumulative frequency of 3/4 of 480, or 360) = £274

and thus:

Interquartile range = £274 − £217 = £57.

Thus, the manager could use an approximate measure of the spread of wages of the middle 50 per cent of the workforce of £57.

There is a very closely related measure here, the quartile deviation, which is half the interquartile range. In the above example, the quartile deviation is £28.50. In practice, the quartile deviation is used rather more than the interquartile range. If you rearrange Q3 − Q1 as (Q3 − M) + (M − Q1) you will see that the two expressions in brackets give the distances from the quartiles to the median, and then dividing by two gives the average distance from the quartiles to the median. So we can say that approximately 50 per cent of the observations lie within ± one quartile deviation of the median.
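The graphical readings in Example 4.8.1 can be cross-checked numerically. The following is only an illustrative sketch, not part of the study material: it assumes, as the text does, that wages are evenly spread within each class, so it interpolates linearly between the tabulated points of the ogive. The function name `value_at_cumulative_frequency` is hypothetical.

```python
def value_at_cumulative_frequency(upper_bounds, cum_freqs, target):
    """Read a value off the ogive by linear interpolation between plotted points."""
    points = list(zip(upper_bounds, cum_freqs))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            # Straight-line segment of the ogive between (x0, c0) and (x1, c1)
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target cumulative frequency is outside the tabulated range")


# Cumulative 'less than' distribution from Example 4.8.1
wages = [200, 225, 250, 275, 300]
cum_freqs = [16, 169, 270, 362, 430]
total = 480

q1 = value_at_cumulative_frequency(wages, cum_freqs, total / 4)
q3 = value_at_cumulative_frequency(wages, cum_freqs, 3 * total / 4)
iqr = q3 - q1
quartile_deviation = iqr / 2

print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}")
print(f"IQR = {iqr:.1f}, quartile deviation = {quartile_deviation:.1f}")
```

The interpolated quartiles fall within about £1 of the values of £217 and £274 read from the graph; small differences of this size are to be expected when reading from an ogive by eye.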

Example 4.8.2

Using the data on the output of product Q (see Example 4.3.4), find the quartiles, the interquartile range and the quartile deviation from the ogive (Figure 4.3).

Solution

The total frequency = 22, so the cumulative frequency of Q1 is 22/4 = 5.5, and the cumulative frequency of Q3 is (3 × 22)/4 = 16.5.

From the ogive, Q1 = 362.5 kg and Q3 = 383.5 kg.

Hence, the interquartile range = 383.5 − 362.5 = 21 kg, and the quartile deviation = 21 ÷ 2 = 10.5 kg.

4.9 Deciles

Just as quartiles divide a cumulative distribution into quarters, deciles divide a cumulative distribution into tenths. Thus:

the first decile has 10 per cent of values below it and 90 per cent above it,

the second decile has 20 per cent of values below it and 80 per cent above it, and so on.

The use and evaluation of deciles can best be illustrated through an example.

Example 4.9.1

As a promotional example, a mail-order company has decided to give free gifts to its highest-spending customers. It has been suggested that the highest-spending 30 per cent get a gift, while the highest-spending 10 per cent get an additional special gift. The following distribution of a sample of spending patterns over the past year is available:

| Amount spent (£) | Number of customers spending this amount |
|---|---|
| under 50 | 37 |
| 50–under 100 | 59 |
| 100–under 150 | 42 |
| 150–under 200 | 20 |
| 200–under 300 | 13 |
| 300 and over | 9 |

Solution

We have to determine the ninth decile (90 per cent below it) and the seventh decile (70 per cent below it). These can be found in the same way as with quartiles, by reading from an ogive.

The cumulative frequency distribution (ignoring the last open-ended class) is:

| Amount spent (less than, £) | Cumulative frequency |
|---|---|
| 50 | 37 |
| 100 | 96 |
| 150 | 138 |
| 200 | 158 |
| 300 | 171 |

The ogive is shown in Figure 4.7.

[Figure 4.7 Ogive for Example 4.9.1. Vertical axis: cumulative frequency; horizontal axis: amount spent last year (£), with the 7th and 9th deciles marked.]

The ninth decile will correspond to a cumulative frequency of 162 (90 per cent of the total frequency, 180). From the ogive, this is £230.

Similarly, the seventh decile corresponds to a cumulative frequency of 70 per cent of 180, that is, 126. From the ogive, this is £135.

Hence, in order to implement the suggestion, the company should give the free gift to those customers who have spent over £135 in the past year, and the additional free gift to those who have spent over £230.

Example 4.9.2

Using the data on the output of product Q (see Example 4.3.4), find the ninth decile from the ogive (Figure 4.3).

Solution

The cumulative frequency of the ninth decile is 0.9 × 22 = 19.8. From the ogive, the ninth decile is 392.5 kg (approximately).

In your exam you cannot be asked to draw the ogive, so you just have to know how to obtain the quartiles and percentiles from it. It is possible to calculate these statistics, but this is not required in your syllabus and the formulae are not given.
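For readers who want to check a graphical reading, one common way of calculating such a statistic is linear interpolation, under the same assumption that values are evenly spread within each class. The following is an illustrative sketch only; the symbols $L$, $w$, $F$, $f$ and $c$ are introduced here and are not syllabus notation. If the target cumulative frequency $c$ falls in a class with lower boundary $L$, width $w$ and frequency $f$, and the cumulative frequency below that class is $F$, then:

$$\text{value} \approx L + \frac{c - F}{f} \times w$$

Applied to Example 4.9.1, the ninth decile ($c = 162$) lies in the class 200–under 300 and the seventh decile ($c = 126$) lies in the class 100–under 150:

$$D_9 \approx 200 + \frac{162 - 158}{13} \times 100 \approx 230.8 \qquad D_7 \approx 100 + \frac{126 - 96}{42} \times 50 \approx 135.7$$

Both are close to the values of £230 and £135 read from the ogive.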

4.10 The mean absolute deviation

If the mean is the average being used, then one very good way of measuring the amount of variability in the data is to calculate the extent to which the values differ from the mean. This is essentially the thinking behind the mean absolute deviation and the standard deviation (for which, see Section 4.11).

Example 4.10.1

Measure the spread of shop A’s weekly takings (Example 4.7.1), given the following sample over 6 weeks. The sample has an arithmetic mean of £1,050.

£1,120   £990   £1,040   £1,030   £1,105   £1,015

Solution

A simple way of seeing how far a single value is from a (hopefully) representative average figure is to determine the difference between the two. In particular, if we are dealing with the mean, $\bar{x}$, this difference is known as the deviation from the mean or, more simply, the deviation. It is clear that, for a widely spread data set, the deviations of the individual values in the set will be relatively large. Similarly, narrowly spread data sets will have relatively small deviation values. We can therefore base our measure on the values of the deviations from the mean:

$$\text{Deviation} = x - \bar{x}$$

In this case, the values of $(x - \bar{x})$ are:

£70, −£60, −£10, −£20, £55, −£35

The obvious approach might now be to take the mean of these deviations as our measure. Unfortunately, it can be shown that this always turns out to be zero, and so the mean deviation will not distinguish one distribution from another. The basic reason for this result is that the negative deviations, when summed, exactly cancel out the positive ones: we must therefore remove this cancellation effect.

One way to remove negative values is simply to ignore the signs, that is, to use the absolute values. In this case, the absolute deviations are:

$|x - \bar{x}|$: £70, £60, £10, £20, £55, £35

The two vertical lines are the mathematical symbol for absolute values and are often referred to as the ‘modulus’, or ‘mod’, of $(x - \bar{x})$ in this case. The mean of this list is now a measure of the spread in the data. It is known as the mean absolute deviation. Hence the mean absolute deviation of weekly takings for shop A is:

(70 + 60 + 10 + 20 + 55 + 35)/6 = £41.67

Thus, our first measure of the spread of shop A’s weekly takings is £41.67.
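The steps of this solution can be reproduced in a few lines of Python. This is a minimal sketch for checking the arithmetic, using only the six takings figures quoted in the example; the variable names are illustrative.

```python
from statistics import mean

# Shop A's weekly takings over six weeks (£), from Example 4.10.1
takings = [1120, 990, 1040, 1030, 1105, 1015]

x_bar = mean(takings)

# Signed deviations from the mean: these always sum to zero
deviations = [x - x_bar for x in takings]

# Ignoring the signs removes the cancellation effect
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = mean(absolute_deviations)

print("mean:", x_bar)
print("sum of deviations:", sum(deviations))
print("mean absolute deviation:", round(mean_absolute_deviation, 2))
```

The signed deviations sum to zero, which is why their mean cannot serve as a measure of spread, while the mean of the absolute deviations reproduces the £41.67 found above.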

Example 4.10.2

Find the mean absolute deviation for the following data:

2   3   5   7   8

Solution

The mean, $\bar{x}$, is (2 + 3 + 5 + 7 + 8)/5 = 5. So absolute deviations are given by subtracting 5 from each of the data and ignoring any negative signs. This gives values of:

3   2   0   2   3

The mean absolute deviation is (3 + 2 + 0 + 2 + 3)/5 = 2.

The mean deviation is not explicitly mentioned in your syllabus and is unlikely to be examined. We have included it as part of the theoretical build-up to the standard deviation.

4.11 The standard deviation

In the preceding section, we solved the problem of negative deviations cancelling out positive ones by using absolute values. There is another way of ‘removing’ negative signs, namely by squaring the figures. If we do that, then we get another, very important, measure of spread, the standard deviation.

Example 4.11.1

Evaluate the measure of the spread in shop A’s weekly takings (Example 4.7.1), using this new approach.

Solution

Recall that we have the deviations:

$x - \bar{x}$: £70, −£60, −£10, −£20, £55, −£35

so, by squaring, we get:

$(x - \bar{x})^2$: 4,900, 3,600, 100, 400, 3,025, 1,225

The mean of these squared deviations is:

13,250/6 = 2,208.3

This is a measure of spread whose units are the square of those of the original data, because we squared the deviations. We thus take the square root to get back to the original units (£). Our measure of spread is therefore:

$\sqrt{2{,}208.3}$ = £46.99

This is known as the standard deviation, denoted by ‘s’. Its square, the intermediate step before square-rooting, is called the variance, $s^2$.

The formula that has been implicitly used here is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
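As a check on Example 4.11.1, the same steps can be written out in Python. This is a sketch under the text's definition, which divides by n; Python's standard `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by n − 1 and would give a slightly larger figure.

```python
from math import sqrt
from statistics import pstdev

# Shop A's weekly takings over six weeks (£)
takings = [1120, 990, 1040, 1030, 1105, 1015]

n = len(takings)
x_bar = sum(takings) / n

# Square each deviation, average the squares (the variance), then take the root
squared_deviations = [(x - x_bar) ** 2 for x in takings]
variance = sum(squared_deviations) / n
s = sqrt(variance)

print("variance:", round(variance, 1))
print("standard deviation:", round(s, 2))
print("statistics.pstdev:", round(pstdev(takings), 2))
```

Both routes agree with the £46.99 obtained by hand.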

Applying the same series of steps to the data in a frequency distribution will give us the corresponding formula in this case:

Square the deviations: $(x - \bar{x})^2$

Find the mean of the $(x - \bar{x})^2$ values occurring with frequencies denoted by $f$:

$$\frac{\sum f (x - \bar{x})^2}{\sum f} \quad (= s^2)$$

Take the square root:

$$\sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} \quad (= s)$$

In practice, this formula can turn out to be very tedious to apply. It can be shown that the following, more easily applicable, formula is the same as the one above:

$$s = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

This formula will be given in the Business Mathematics exam, with $\bar{x}$ in place of $\sum fx / \sum f$.
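The equivalence of the two formulae follows from expanding the square and using $\bar{x} = \sum fx / \sum f$, so that $\sum fx = \bar{x} \sum f$. The derivation below is a supplementary sketch and is not required by the syllabus:

$$\sum f (x - \bar{x})^2 = \sum f x^2 - 2\bar{x} \sum f x + \bar{x}^2 \sum f = \sum f x^2 - 2\bar{x}^2 \sum f + \bar{x}^2 \sum f = \sum f x^2 - \bar{x}^2 \sum f$$

Dividing both sides by $\sum f$ gives:

$$s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = \frac{\sum f x^2}{\sum f} - \bar{x}^2 = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$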

An example will now demonstrate a systematic way of setting out the computations involved with this formula.

Example 4.11.2

An analyst is considering two categories of company, X and Y, for possible investment. One of her assistants has compiled the following information on the price-earnings ratios of the shares of companies in the two categories over the past year.

| Price-earnings ratios | Number of category X companies | Number of category Y companies |
|---|---|---|
| 4.95–under 8.95 | 3 | 4 |
| 8.95–under 12.95 | 5 | 8 |
| 12.95–under 16.95 | 7 | 8 |
| 16.95–under 20.95 | 6 | 3 |
| 20.95–under 24.95 | 3 | 3 |
| 24.95–under 28.95 | 1 | 4 |

Compute the standard deviations of these two distributions and comment. (You are given that the means of the two distributions are 15.59 and 15.62, respectively.)

Solution

Concentrating first of all on category X, we see that we face the same problem as when we calculated the mean of such a distribution, namely that we have classified data, instead of individual values of x. Adopting a similar approach as before, we take the mid-point of each class:

| x (mid-point) | x² | f | fx | fx² |
|---|---|---|---|---|
| 6.95 | 48.3025 | 3 | 20.85 | 144.9075 |
| 10.95 | 119.9025 | 5 | 54.75 | 599.5125 |
| 14.95 | 223.5025 | 7 | 104.65 | 1,564.5175 |
| 18.95 | 359.1025 | 6 | 113.70 | 2,154.6150 |
| 22.95 | 526.7025 | 3 | 68.85 | 1,580.1075 |
| 26.95 | 726.3025 | 1 | 26.95 | 726.3025 |
| Total | | 25 | 389.75 | 6,769.9625 |

Thus the standard deviation is:

$$s = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} = \sqrt{\frac{6{,}769.9625}{25} - \left(\frac{389.75}{25}\right)^2} = \sqrt{270.7985 - 243.0481} = \sqrt{27.7504} = 5.27$$

The standard deviation of the price-earnings ratios for category X is therefore 5.27. In the same way, you can verify that the standard deviation in the case of category Y is 6.29. These statistics again emphasise the wider spread in the category Y data than in the category X data. Note how a full degree of accuracy (four decimal places) is retained throughout the calculation in order to ensure an accurate final result.

The calculation for Y should be as for X above. In outline:

| x (mid-point) | x² | f | fx | fx² |
|---|---|---|---|---|
| 6.95 | 48.3025 | 4 | 27.80 | 193.210 |
| … | … | … | … | … |
| 26.95 | 726.3025 | 4 | 107.80 | 2,905.210 |
| Total | | 30 | 468.50 | 8,503.075 |

$$s = \sqrt{283.4358 - 243.8803} = 6.289$$

Example 4.11.3

Using the data from Example 4.2.3 relating to absences from work, and the mean that you have already calculated, find the standard deviation.

| No. of employees absent | No. of days (frequency) |
|---|---|
| 2 | 2 |
| 3 | 4 |
| 4 | 3 |
| 5 | 4 |
| 6 | 3 |
| 7 | 3 |
| 8 | 3 |

It is probably easiest to calculate fx² by multiplying fx by x, for example, 2 × 4, 3 × 12, etc.

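Solution

| x | f | fx | fx² |
|---|---|---|---|
| 2 | 2 | 4 | 8 |
| 3 | 4 | 12 | 36 |
| 4 | 3 | 12 | 48 |
| 5 | 4 | 20 | 100 |
| 6 | 3 | 18 | 108 |
| 7 | 3 | 21 | 147 |
| 8 | 3 | 24 | 192 |
| Total | 22 | 111 | 639 |

$$s = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{639}{22} - \left(\frac{111}{22}\right)^2} = \sqrt{29.0455 - 25.4566} = \sqrt{3.5889} = 1.89 \text{ (to two d.p.)}$$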
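The tabular layout used in Examples 4.11.2 and 4.11.3 translates directly into code. The sketch below is illustrative only: the function name `grouped_standard_deviation` is hypothetical, and for classified data it assumes, as the text does, that each class is represented by its mid-point.

```python
from math import sqrt


def grouped_standard_deviation(values, frequencies):
    """Standard deviation of a frequency distribution via the sum-of-fx-squared formula."""
    total_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))
    mean = sum_fx / total_f
    return sqrt(sum_fx2 / total_f - mean ** 2)


# Example 4.11.3: numbers of employees absent and the days on which each occurred
absences = grouped_standard_deviation([2, 3, 4, 5, 6, 7, 8], [2, 4, 3, 4, 3, 3, 3])

# Example 4.11.2: class mid-points with the category X and Y frequencies
mid_points = [6.95, 10.95, 14.95, 18.95, 22.95, 26.95]
category_x = grouped_standard_deviation(mid_points, [3, 5, 7, 6, 3, 1])
category_y = grouped_standard_deviation(mid_points, [4, 8, 8, 3, 3, 4])

print(round(absences, 2), round(category_x, 2), round(category_y, 2))
```

The three results match the hand calculations of 1.89, 5.27 and 6.29.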

Example 4.11.4

Using the data from Exercise 4.2.5 relating to output of product Q, and the mean that you have already calculated, find the standard deviation.

| Output of Q (kg) | No. of days (frequency) |
|---|---|
| 350–under 360 | 4 |
| 360–370 | 6 |
| 370–380 | 5 |
| 380–390 | 4 |
| 390–400 | 3 |

Solution

| Mid-point x | Frequency f | fx | fx² |
|---|---|---|---|
| 355 | 4 | 1,420 | 504,100 |
| 365 | 6 | 2,190 | 799,350 |
| 375 | 5 | 1,875 | 703,125 |
| 385 | 4 | 1,540 | 592,900 |
| 395 | 3 | 1,185 | 468,075 |
| Total | 22 | 8,210 | 3,067,550 |

$$s = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{3{,}067{,}550}{22} - \left(\frac{8{,}210}{22}\right)^2} = \sqrt{139{,}434.0909 - 139{,}264.6694} = \sqrt{169.4215} = 13.02 \text{ (to two d.p.)}$$

4.12 The coefficient of variation

The coefficient of variation is a statistical measure of the dispersion of data points in a data series around the mean. It is calculated as follows:

$$\text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Expected return}}$$

The coefficient of variation is the ratio of the standard deviation to the mean, and is useful when comparing the degree of variation from one data series to another, even if the means are quite different from each other.

In a financial setting, the coefficient of variation allows you to determine how much risk you are assuming in comparison to the amount of return you can expect from an investment. The lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.

Note that if the expected return in the denominator of the calculation is negative or zero, the ratio will not make sense.

In Example 4.11.2, it was relatively easy to compare the spread in two sets of data by looking at the standard deviation figures alone, because the means of the two sets were so similar. Another example will show that it is not always so straightforward.

Example 4.12.1

Government statistics on the basic weekly wages of workers in two countries show the following. (All figures converted to sterling equivalent.)

Country V: $\bar{x}$ = £120, s = £55

Country W: $\bar{x}$ = £90, s = £50

Can we conclude that country V has a wider spread of basic weekly wages?

Solution

By simply looking at the two standard deviation figures, we might be tempted to answer ‘yes’. In doing so, however, we should be ignoring the fact that the two mean values indicate that wages in country V are inherently higher, and so the deviations from the mean and thus the standard deviation will tend to be higher. To make a comparison of like with like we must use the coefficient of variation:

$$\text{Coefficient of variation} = \frac{s}{\bar{x}}$$

Thus:

Coefficient of variation of wages in country V = 55/120 = 45.8%

Coefficient of variation of wages in country W = 50/90 = 55.6%

Hence we see that, in fact, it is country W that has the higher variability in basic weekly wages.

Example 4.12.2

Calculate the coefficients of variation for the data in Examples 4.11.3 and 4.11.4.

Solution

In Example 4.11.3, $\bar{x}$ = 5.045 and s = 1.8944, so the coefficient of variation is: 100 × 1.8944/5.045 = 37.6%.

In Example 4.11.4, $\bar{x}$ = 373.18 and s = 13.0162, so the coefficient of variation is: 100 × 13.0162/373.18 = 3.5%.
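The comparison in Example 4.12.1 is simple enough to script. This is a minimal sketch with a hypothetical helper, `coefficient_of_variation`, which expresses the ratio as a percentage and rejects a zero or negative mean, since the text notes that the ratio does not make sense in that case.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean <= 0:
        raise ValueError("coefficient of variation is not meaningful for a zero or negative mean")
    return 100 * std_dev / mean


# Example 4.12.1: basic weekly wages (£)
cv_v = coefficient_of_variation(55, 120)  # country V
cv_w = coefficient_of_variation(50, 90)   # country W

print(f"Country V: {cv_v:.1f}%  Country W: {cv_w:.1f}%")
```

Country W has the smaller standard deviation but the larger coefficient of variation, which is the point of the example.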

4.13 A comparison of the measures of spread

Like the mode, the range is little used except as a very quick initial view of the overall spread of the data. The problem is that it is totally dependent on the most extreme values in the distribution, which are the ones that are particularly liable to reflect errors or one-off situations. Furthermore, the range tells us nothing at all about how the data is distributed between the extremes.

The standard deviation is undoubtedly the most important measure of spread. It has a formula that lends itself to algebraic manipulation, unlike the quartile deviation, and so, along with the mean, it is the basis of almost all advanced statistical theory. This is a pity because it does have some quite serious disadvantages. If data is skewed, the standard deviation will exaggerate the degree of spread because of the large squared deviations associated with extreme values. Similarly, if a distribution has open intervals at the ends, the choice of limits and hence of mid-points will have a marked effect on the standard deviation.

The quartile deviation, and to a lesser extent the interquartile range, is the best measure of spread to use if the data is skewed or has open intervals. In general, these measures would not be preferred to the standard deviation because they ignore much of the data and are little known.

Finally, it is often the case that data is intended to be compared with other data, perhaps nationwide figures or previous year’s figures, etc. In such circumstances, unless you have access to all the raw data, you are obliged to compare like with like, regardless perhaps of …

4.14 Descriptive statistics using Excel

Many of the techniques discussed in this chapter can be facilitated through the use of Excel. This section discusses a number of these, including the mean, the mode, the median, the standard deviation, the variance and the range.

Figure 4.8 shows 100 observations that represent sample production weights, in grams, of a product such as cereals. This data is the sample data from which the descriptive statistics are measured. The term sample is important as it implies that the data does not represent the full population, and this affects some of the spreadsheet functions used. A population is the complete data set from which a conclusion is to be made.

[Figure 4.8 100 observations of production weights in grams]

The data has been entered into a spreadsheet and the range a3 through e22 has been named data. Any rectangular range of cells in Excel can be given a name, which can be easier to reference than depending on cell references. To name the range, first select the area to be named (a3:e22 in this case), double-click on the name box at the top of the screen and type in the required name (data in this case).

The mean, median and mode are described as measures of central tendency and offer different ways of presenting a typical or representative value of a group of values. The range, the standard deviation and the variance are measures of dispersion and refer to the degree to which the observations in a given data set are spread about the arithmetic mean. The mean is the most frequently used measure of central tendency, and statisticians frequently use the mean together with the standard deviation to describe a data set. Figure 4.9 shows the results of the descriptive statistic functions. Each statistic is explained in detail below.

[Figure 4.9 Results of descriptive statistics]

Mean

To calculate the sample arithmetic mean of the production weights, the AVERAGE function is used as follows in cell b4:

=AVERAGE(DATA)

It is important to note that the AVERAGE function totals the cells containing values and divides by the number of cells that contain values. In certain situations this may not produce the required results, and it might be necessary to ensure that zero has been entered in order that the function sees the cell as containing a value.

Sample median

The sample median is defined as the middle value when the data values are ranked in increasing, or decreasing, order of magnitude. The following formula in cell b5 uses the MEDIAN function to calculate the median value for the production weights:

=MEDIAN(DATA)

Sample mode

The sample mode is defined as the value which occurs most frequently. The following formula is required in cell b6 to calculate the mode of the production weights:

=MODE(DATA)
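For comparison outside a spreadsheet, Python's standard `statistics` module offers counterparts to these three functions. The sketch below is illustrative only: the ten weights are hypothetical stand-ins, because the 100 observations of Figure 4.8 are not reproduced in this text. It also shows why the sample versus population distinction matters for the standard deviation.

```python
import statistics

# Hypothetical production weights in grams (NOT the data of Figure 4.8)
weights = [498, 502, 500, 501, 500, 499, 503, 500, 497, 500]

sample_mean = statistics.mean(weights)      # counterpart of =AVERAGE(DATA)
sample_median = statistics.median(weights)  # counterpart of =MEDIAN(DATA)
sample_mode = statistics.mode(weights)      # counterpart of =MODE(DATA)

# A sample standard deviation divides by n - 1, a population one by n
sample_sd = statistics.stdev(weights)
population_sd = statistics.pstdev(weights)

print(sample_mean, sample_median, sample_mode)
print(round(sample_sd, 3), round(population_sd, 3))
```

With these made-up figures the mean, median and mode all equal 500 g, and the sample standard deviation is slightly larger than the population one because of the smaller divisor.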
