3: Interpreting and Communicating the Results of Statistical Analyses




Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests



It is also a good idea to include a table of observed and expected counts in addition to reporting the computed value of the test statistic and the P-value. And finally,

make sure to give a conclusion in context, and make sure that the conclusion is

worded appropriately for the type of test conducted. For example, don’t use terms

such as independence and association to describe the conclusion if the test performed

was a test for homogeneity.



Interpreting the Results of Statistical Analyses

As with the other hypothesis tests considered, it is common to find the result of a

chi-square test summarized by giving the value of the chi-square test statistic and an

associated P-value. Because categorical data can be summarized compactly in frequency tables, the data often are given in the article (unlike data for numerical variables, which are rarely given).



What to Look For in Published Data

Here are some questions to consider when you are reading an article that contains the

results of a chi-square test:

• Are the variables of interest categorical rather than numerical?

• Are the data given in the article in the form of a frequency table?

• If a two-way frequency table is involved, is the question of interest one of homogeneity or one of independence?

• What null hypothesis is being tested? Are the results of the analysis reported in the correct context (homogeneity, etc.)?

• Is the sample size large enough to make use of a chi-square test reasonable? (Are all expected counts at least 5?)

• What is the value of the test statistic? Is the associated P-value given? Should the null hypothesis be rejected?

• Are the conclusions drawn by the authors consistent with the results of the test?

• How different are the observed and expected counts? Does the result have practical significance as well as statistical significance?

The authors of the article “Predicting Professional Sports Game Outcomes from

Intermediate Game Scores” (Chance [1992]: 18–22) used a chi-square test to determine whether there was any merit to the idea that basketball games are not settled

until the last quarter, whereas baseball games are over by the seventh inning. They

also considered football and hockey. Data were collected for 189 basketball games,

92 baseball games, 80 hockey games, and 93 football games. The analyzed games were

sampled randomly from all games played during the 1990 season for baseball and football and for the 1990–1991 season for basketball and hockey. For each game, the late-game leader was determined, and then it was noted whether the late-game leader actually

ended up winning the game. The resulting data are summarized in the following table:



Sport          Late-Game Leader Wins    Late-Game Leader Loses
Basketball             150                       39
Baseball                86                        6
Hockey                  65                       15
Football                72                       21



Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.






The authors stated that the “late-game leader is defined as the team that is ahead

after three quarters in basketball and football, two periods in hockey, and seven

innings in baseball. The chi-square value (with three degrees of freedom) is 10.52

(P < .015).” They also concluded that “the sports of basketball, hockey, and football

have remarkably similar percentages of late-game reversals, ranging from 18.8% to

22.6%. The sport that is an anomaly is baseball. Only 6.5% of baseball games resulted in late reversals. . . . [The chi-square test] is statistically significant due almost

entirely to baseball.”
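The reversal percentages quoted by the authors can be checked directly against the table of observed counts. The following is a minimal Python sketch of that arithmetic; the names `observed` and `reversal_percentage` are illustrative and do not come from the article.

```python
# Observed counts from the table of game outcomes:
# (late-game leader wins, late-game leader loses)
observed = {
    "Basketball": (150, 39),
    "Baseball": (86, 6),
    "Hockey": (65, 15),
    "Football": (72, 21),
}


def reversal_percentage(wins, losses):
    """Percentage of sampled games in which the late-game leader lost."""
    return 100 * losses / (wins + losses)


reversal = {
    sport: reversal_percentage(wins, losses)
    for sport, (wins, losses) in observed.items()
}

for sport, pct in reversal.items():
    print(f"{sport:<10} {pct:5.2f}%")
```

Rounded to one decimal place, these values agree with the range of 18.8% to 22.6% reported for basketball, hockey, and football and with the 6.5% reported for baseball.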

In this particular analysis, the authors are comparing four populations (games from

each of the four sports) on the basis of a categorical variable with two categories (late-game leader wins and late-game leader loses). The appropriate null hypothesis is then

H0: The population proportion in each category (leader wins, leader loses) is the

same for all four sports.

Based on the reported value of the chi-square statistic and the associated P-value, this

null hypothesis is rejected, leading to the conclusion that the category proportions are

not the same for all four sports.

The validity of the chi-square test requires that the sample sizes be large enough so

that no expected counts are less than 5. Is this reasonable here? The following Minitab

output shows the expected cell counts and the computation of the X² statistic:

Chi-Square Test

Expected counts are printed below observed counts

          Leader W    Leader L    Total
    1        150          39        189
           155.28       33.72
    2         86           6         92
            75.59       16.41
    3         65          15         80
            65.73       14.27
    4         72          21         93
            76.41       16.59
Total        373          81        454

Chi-Sq = 0.180 + 0.827 +
         1.435 + 6.607 +
         0.008 + 0.037 +
         0.254 + 1.171 = 10.518
DF = 3, P-Value = 0.015



The smallest expected count is 14.27, so the sample sizes are large enough to justify the use of the X² test. Note also that the two cells in the table that correspond to baseball contribute a total of 1.435 + 6.607 = 8.042 to the value of the X² statistic of 10.518. This is due to the large discrepancies between the observed and expected counts for these two cells. There is reasonable agreement between the observed and the expected counts in the other cells. This is probably the basis for the authors’ conclusion that baseball is the anomaly and that the other sports were similar.
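The quantities in the Minitab output can be reproduced with a few lines of Python. The sketch below is not the authors' or Minitab's procedure; it simply applies the usual expected-count rule (row total × column total / grand total) to the observed table. The function names are illustrative, and the P-value helper relies on a closed form that holds only for a chi-square distribution with exactly 3 degrees of freedom.

```python
import math

# Observed counts: [late-game leader wins, late-game leader loses]
observed = [
    [150, 39],  # 1: basketball
    [86, 6],    # 2: baseball
    [65, 15],   # 3: hockey
    [72, 21],   # 4: football
]


def chi_square_two_way(table):
    """Return expected counts, cell contributions, X^2, and df for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Each cell contributes (observed - expected)^2 / expected
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, statistic, df


def chi_square_upper_tail_df3(x):
    """Upper-tail area for a chi-square distribution with 3 df only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


expected, contributions, statistic, df = chi_square_two_way(observed)
smallest_expected = min(min(row) for row in expected)
baseball_share = sum(contributions[1])  # both baseball cells
p_value = chi_square_upper_tail_df3(statistic)

print(f"X^2 = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}")
print(f"smallest expected count = {smallest_expected:.2f}")
print(f"baseball contribution = {baseball_share:.3f}")
```

Keeping the per-cell contributions, rather than only the total, makes it easy to see which cells drive a significant result, which is the point made about baseball above.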



A Word to the Wise: Cautions and Limitations

Be sure to keep the following in mind when analyzing categorical data using one of

the chi-square tests presented in this chapter:

1. Don’t confuse tests for homogeneity with tests for independence. The hypotheses

and conclusions are different for the two types of test. Tests for homogeneity


are used when the individuals in each of two or more independent samples are

classified according to a single categorical variable. Tests for independence are

used when individuals in a single sample are classified according to two categorical variables.

2. As was the case for the hypothesis tests of earlier chapters, remember that we can

never say we have strong support for the null hypothesis. For example, if we do

not reject the null hypothesis in a chi-square test for independence, we cannot

conclude that there is convincing evidence that the variables are independent. We

can only say that we were not convinced that there is an association between the

variables.

3. Be sure that the assumptions for the chi-square test are reasonable. P-values

based on the chi-square distribution are only approximate, and if the large sample

conditions are not met, the true P-value may be quite different from the approximate one based on the chi-square distribution. This can sometimes lead to erroneous conclusions. Also, for the chi-square test of homogeneity, the assumption of independent samples is particularly important.

4. Don’t jump to conclusions about causation. Just as a strong correlation between

two numerical variables does not mean that there is a cause-and-effect relationship between them, an association between two categorical variables does not

imply a causal relationship.



EXERCISES 12.32 - 12.34

12.32 The following passage is from the paper “Gender Differences in Food Selections of Students at a Historically Black College and University” (College Student Journal [2009]: 800–806):

    Also significant was the proportion of males and their water consumption (8 oz. servings) compared to females (X² = 8.166, P = .086). Males came closest to meeting recommended daily water intake (64 oz. or more) than females (29.8% vs. 20.9%).

This statement was based on carrying out a X² test of independence using data in a two-way table where rows corresponded to gender (male, female) and columns corresponded to number of servings of water consumed per day, with categories none, one, two to three, four to five, and six or more.

a. What hypotheses did the researchers test? What is the number of degrees of freedom associated with the reported value of the X² statistic?

b. The researchers based their statement that the proportions falling in the water consumption categories were not all the same for males and females on a test with a significance level of .10. Would they have reached the same conclusion if a significance level of .05 had been used? Explain.

c. The paper also included the accompanying data on how often students said they had consumed fried potatoes (fries or potato chips) in the past week.

          Number of times consumed fried potatoes in the past week
Gender      0    1 to 3    4 to 6    7 to 13    14 to 20    21 or more
Male        2      10        15        12           6            3
Female     15      15        10        20          19           12

Use the accompanying Minitab output to carry out a X² test of independence. Do you agree with the authors’ conclusion that there was a significant association between gender and consumption of fried potatoes?






Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            0      1-3      4-6     7-13    14-20   21 or more   Total
M           2       10       15       12        6        3          48
         5.87     8.63     8.63    11.05     8.63     5.18
        2.552    0.216    4.696    0.082    0.803    0.917
F          15       15       10       20       19       12          91
        11.13    16.37    16.37    20.95    16.37     9.82
        1.346    0.114    2.477    0.043    0.424    0.484
Total      17       25       25       32       25       15         139

Chi-Sq = 14.153, DF = 5, P-Value = 0.015



12.33 The press release titled “Nap Time” (pewresearch.org, July 2009) described results from a nationally representative survey of 1488 adult Americans. The survey asked several demographic questions (such as gender, age, and income) and also included a question asking respondents if they had taken a nap in the past 24 hours. The press release stated that 38% of the men surveyed and 31% of the women surveyed reported that they had napped in the past 24 hours. For purposes of this exercise, suppose that men and women were equally represented in the sample.

a. Use the given information to fill in observed cell counts for the following table:

          Napped    Did Not Nap    Row Total
Men                                   744
Women                                 744

b. Use the data in the table from Part (a) to carry out a hypothesis test to determine if there is an association between gender and napping.

c. The press release states that more men than women nap. Although this is true for the people in the sample, based on the result of your test in Part (b), is it reasonable to conclude that this holds for adult Americans in general? Explain.

12.34 Using data from a national survey, the authors of the paper “What Do Happy People Do?” (Social Indicators Research [2008]: 565–571) concluded that there was convincing evidence of an association between amount of time spent watching television and whether or not a person reported that they were happy. They observed that unhappy people tended to watch more television. The authors write:

    This could lead us to two possible interpretations:

    1. Television viewing is a pleasurable enough activity with no lasting benefit, and it pushes aside time spent in other activities, ones that might be less immediately pleasurable, but that would provide long-term benefits in one’s condition. In other words, television does cause people to be less happy.

    2. Television is a refuge for people who are already unhappy. TV is not judgmental nor difficult, so people with few social skills or resources for other activities can engage in it. Furthermore, chronic unhappiness can be socially and personally debilitating and can interfere with work and most social and personal activities, but even the unhappiest people can click a remote and be passively entertained by a TV. In other words, the causal order is reversed for people who watch television; unhappiness leads to television viewing.

Using only data from this study, do you think it is possible to determine which of these two conclusions is correct? If so, which conclusion do you think is correct and why? If not, explain why it is not possible to decide which conclusion is correct based on the study data.

ACTIVITY 12.1

Pick a Number, Any Number ...



Background: There is evidence to suggest that human

beings are not very good random number generators. In

this activity, you will investigate this phenomenon by

collecting and analyzing a set of human-generated “random” digits.

For this activity, work in a group with four or five

other students.



1. Each member of the group should complete this step

individually. Ask 25 different people to pick a digit

from 0 to 9 at random. Record the responses.

2. Combine the responses you collected with those of

the other members of your group to form a single

sample. Summarize the resulting data in a one-way

frequency table.



3. If people are adept at picking digits at random, what would you expect for the proportion of the responses in the sample that were 0? that were 1?

4. State a null hypothesis and an alternative hypothesis that could be tested to determine whether there is evidence that the 10 digits from 0 to 9 are not selected an equal proportion of the time when people are asked to pick a digit at random.

5. Carry out the appropriate hypothesis test, and write a few sentences indicating whether or not the data support the theory that people are not good random number generators.

ACTIVITY 12.2

Color and Perceived Taste



Background: Does the color of a food or beverage affect

the way people perceive its taste? In this activity you will

conduct an experiment to investigate this question and

analyze the resulting data using a chi-square test.

You will need to recruit at least 30 subjects for this

experiment, so it is advisable to work in a large group

(perhaps even the entire class) to complete this activity.

Subjects for the experiment will be assigned at random to one of two groups. Each subject will be asked to

taste a sample of gelatin (for example, Jell-O) and rate

the taste as not very good, acceptable, or very good. Subjects assigned to the first group will be asked to taste and

rate a cube of lemon-flavored gelatin. Subjects in the

second group will be asked to taste and rate a cube of

lemon-flavored gelatin that has been colored an unappealing color by adding food coloring to the gelatin mix

before the gelatin sets.

Note: You may choose to use something other than

gelatin, such as lemonade. Any food or beverage whose

color can be altered using food coloring can be used. You

can experiment with the food colors to obtain a color

that you think is particularly unappealing!

1. As a class, develop a plan for collecting the data.

How will subjects be recruited? How will they be



assigned to one of the two treatment groups (unaltered color, altered color)? What extraneous variables will be directly controlled, and how will you

control them?

2. After the class is satisfied with the data collection

plan, assign members of the class to prepare the gelatin to be used in the experiment.

3. Carry out the experiment, and summarize the resulting data in a two-way table like the one shown:

                                  Taste Rating
Treatment          Not Very Good    Acceptable    Very Good
Unaltered Color
Altered Color

4. The two-way table summarizes data from two independent samples (as long as subjects were assigned at

random to the two treatments, the samples are independent). Carry out an appropriate test to determine

whether the proportion for each of the three taste

rating categories is the same when the color is altered

as for when the color is not altered.
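Once the two-way table is filled in, the test in Step 4 might be carried out along the following lines. This is only a sketch: the counts are hypothetical (40 imaginary subjects, 20 per treatment) and are not data from any real experiment, and the variable names are illustrative. Because a table with 2 rows and 3 columns has 2 degrees of freedom, the sketch uses the fact that the upper-tail area of a chi-square distribution with exactly 2 df is exp(-x/2).

```python
import math

# HYPOTHETICAL counts, for illustration only:
# columns are Not Very Good, Acceptable, Very Good
ratings = {
    "Unaltered Color": [3, 8, 9],
    "Altered Color": [9, 7, 4],
}

table = list(ratings.values())
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count = (row total)(column total) / (grand total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Large-sample condition: every expected count should be at least 5
condition_met = all(e >= 5 for row in expected for e in row)

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(col_totals) - 1)

# Valid only because df == 2 for a 2 x 3 table
p_value = math.exp(-statistic / 2)

print(f"condition met: {condition_met}")
print(f"X^2 = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```

With only 30 subjects split evenly, some expected counts can easily fall below 5, so it is worth checking `condition_met` before interpreting the P-value.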



Summary of Key Concepts and Formulas

TERM OR FORMULA
    COMMENT

One-way frequency table
    A compact way of summarizing data on a categorical variable; it gives the number of times each of the possible categories in the data set occurs (the frequencies).

$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
    A statistic used to provide a comparison between observed counts and those expected when a given hypothesis is true. When none of the expected counts are too small, X² has approximately a chi-square distribution.

X² goodness-of-fit test
    A hypothesis test performed to determine whether the population category proportions are different from those specified by a given null hypothesis.
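As an illustration of the goodness-of-fit statistic, the following Python sketch applies it to the digit-picking setting of Activity 12.1, where the null hypothesis specifies a proportion of 1/10 for each digit. The counts are hypothetical (125 imaginary responses) and are not real data; the critical value 16.92 is the standard chi-square table value for an upper-tail area of .05 with 9 degrees of freedom.

```python
# HYPOTHETICAL counts for digits 0 through 9, for illustration only
digit_counts = [5, 8, 10, 18, 12, 13, 11, 25, 14, 9]

n = sum(digit_counts)   # total number of responses
k = len(digit_counts)   # number of categories

# Under H0 every digit has proportion 1/10, so each expected count is n/k
expected_count = n / k

# X^2 = sum over all cells of (observed - expected)^2 / expected
statistic = sum((o - expected_count) ** 2 / expected_count for o in digit_counts)
df = k - 1

# Chi-square table value: upper-tail area .05 with 9 df
CRITICAL_VALUE_05_DF9 = 16.92
reject_h0 = statistic > CRITICAL_VALUE_05_DF9

print(f"expected count per digit = {expected_count}")
print(f"X^2 = {statistic:.2f}, df = {df}, reject H0 at .05: {reject_h0}")
```

Comparing the statistic with a table value keeps the sketch free of external libraries; a statistics package would report the P-value directly.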


