12.3 Interpreting and Communicating the Results of Statistical Analyses
Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests
It is also a good idea to include a table of observed and expected counts in addition to reporting the computed value of the test statistic and the P-value. And finally,
make sure to give a conclusion in context, and make sure that the conclusion is
worded appropriately for the type of test conducted. For example, don’t use terms
such as independence and association to describe the conclusion if the test performed
was a test for homogeneity.
Interpreting the Results of Statistical Analyses
As with the other hypothesis tests considered, it is common to find the result of a
chi-square test summarized by giving the value of the chi-square test statistic and an
associated P-value. Because categorical data can be summarized compactly in frequency tables, the data often are given in the article (unlike data for numerical variables, which are rarely given).
What to Look For in Published Data
Here are some questions to consider when you are reading an article that contains the
results of a chi-square test:
• Are the variables of interest categorical rather than numerical?
• Are the data given in the article in the form of a frequency table?
• If a two-way frequency table is involved, is the question of interest one of
homogeneity or one of independence?
• What null hypothesis is being tested? Are the results of the analysis reported in
the correct context (homogeneity, etc.)?
• Is the sample size large enough to make use of a chi-square test reasonable? (Are
all expected counts at least 5?)
• What is the value of the test statistic? Is the associated P-value given? Should the
null hypothesis be rejected?
• Are the conclusions drawn by the authors consistent with the results of the test?
• How different are the observed and expected counts? Does the result have
practical significance as well as statistical significance?
The authors of the article “Predicting Professional Sports Game Outcomes from
Intermediate Game Scores” (Chance [1992]: 18–22) used a chi-square test to determine whether there was any merit to the idea that basketball games are not settled
until the last quarter, whereas baseball games are over by the seventh inning. They
also considered football and hockey. Data were collected for 189 basketball games,
92 baseball games, 80 hockey games, and 93 football games. The analyzed games were
sampled randomly from all games played during the 1990 season for baseball and football and for the 1990–1991 season for basketball and hockey. For each game, the late-game leader was determined, and then it was noted whether the late-game leader actually
ended up winning the game. The resulting data are summarized in the following table:
Sport         Late-Game Leader Wins    Late-Game Leader Loses
Basketball             150                       39
Baseball                86                        6
Hockey                  65                       15
Football                72                       21
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
The authors stated that the “late-game leader is defined as the team that is ahead
after three quarters in basketball and football, two periods in hockey, and seven
innings in baseball. The chi-square value (with three degrees of freedom) is 10.52
(P < .015).” They also concluded that “the sports of basketball, hockey, and football
have remarkably similar percentages of late-game reversals, ranging from 18.8% to
22.6%. The sport that is an anomaly is baseball. Only 6.5% of baseball games resulted in late reversals. . . . [The chi-square test] is statistically significant due almost
entirely to baseball.”
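The reversal percentages quoted by the authors can be checked directly from the table of counts. The following is a minimal sketch in Python; the names `games` and `reversal_percentages` are chosen here for illustration and do not come from the article.

```python
# Counts from the table above: (late-game leader wins, late-game leader loses)
games = {
    "Basketball": (150, 39),
    "Baseball": (86, 6),
    "Hockey": (65, 15),
    "Football": (72, 21),
}


def reversal_percentages(table):
    """Return, for each sport, the percentage of games the late-game leader lost."""
    return {
        sport: 100 * losses / (wins + losses)
        for sport, (wins, losses) in table.items()
    }


percentages = reversal_percentages(games)
for sport, pct in percentages.items():
    print(f"{sport:<10} {pct:5.1f}%")
```

Baseball stands apart from the other three sports in this calculation, which is consistent with the authors' description of it as the anomaly.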
In this particular analysis, the authors are comparing four populations (games from
each of the four sports) on the basis of a categorical variable with two categories (late-game leader wins and late-game leader loses). The appropriate null hypothesis is then
H0: The population proportion in each category (leader wins, leader loses) is the
same for all four sports.
Based on the reported value of the chi-square statistic and the associated P-value, this
null hypothesis is rejected, leading to the conclusion that the category proportions are
not the same for all four sports.
The validity of the chi-square test requires that the sample sizes be large enough so
that no expected counts are less than 5. Is this reasonable here? The following Minitab
output shows the expected cell counts and the computation of the X² statistic:

Chi-Square Test
Expected counts are printed below observed counts

         Leader W    Leader L    Total
    1       150          39       189
         155.28       33.72
    2        86           6        92
          75.59       16.41
    3        65          15        80
          65.73       14.27
    4        72          21        93
          76.41       16.59
Total       373          81       454

Chi-Sq = 0.180 + 0.827 +
         1.435 + 6.607 +
         0.008 + 0.037 +
         0.254 + 1.171 = 10.518
DF = 3, P-Value = 0.015

The smallest expected count is 14.27, so the sample sizes are large enough to
justify the use of the X² test. Note also that the two cells in the table that correspond
to baseball contribute a total of 1.435 + 6.607 = 8.042 to the value of the X² statistic
of 10.518. This is due to the large discrepancies between the observed and expected
counts for these two cells. There is reasonable agreement between the observed and
the expected counts in the other cells. This is probably the basis for the authors’ conclusion that baseball is the anomaly and that the other sports were similar.
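The expected counts, cell contributions, and P-value in the Minitab output can be reproduced with a short calculation. The sketch below is one possible way to do this in plain Python and is not the software used by the authors; the function names `chi2_sf` and `chi_square_two_way` are assumptions made for this illustration. The P-value helper relies on the closed-form upper-tail expressions for the chi-square distribution with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) times a finite sum of z**i / i!
        return math.exp(-z) * sum(z**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail and add one term per extra 2 df
    tail = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def chi_square_two_way(observed):
    """Return expected counts, cell contributions, X^2, df, and P-value."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = (row total)(column total) / (grand total)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, statistic, df, chi2_sf(statistic, df)


# Rows: basketball, baseball, hockey, football; columns: leader wins, leader loses
observed = [[150, 39], [86, 6], [65, 15], [72, 21]]
expected, contributions, statistic, df, p_value = chi_square_two_way(observed)

smallest_expected = min(min(row) for row in expected)
baseball_share = sum(contributions[1])
print(f"X^2 = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}")
print(f"smallest expected count = {smallest_expected:.2f}")
print(f"baseball contribution = {baseball_share:.3f}")
```

Looking at the per-cell contributions, rather than only the total, is what makes it possible to see which rows drive a significant result.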
A Word to the Wise: Cautions and Limitations
Be sure to keep the following in mind when analyzing categorical data using one of
the chi-square tests presented in this chapter:
1. Don’t confuse tests for homogeneity with tests for independence. The hypotheses
and conclusions are different for the two types of test. Tests for homogeneity
are used when the individuals in each of two or more independent samples are
classified according to a single categorical variable. Tests for independence are
used when individuals in a single sample are classified according to two categorical variables.
2. As was the case for the hypothesis tests of earlier chapters, remember that we can
never say we have strong support for the null hypothesis. For example, if we do
not reject the null hypothesis in a chi-square test for independence, we cannot
conclude that there is convincing evidence that the variables are independent. We
can only say that we were not convinced that there is an association between the
variables.
3. Be sure that the assumptions for the chi-square test are reasonable. P-values
based on the chi-square distribution are only approximate, and if the large sample
conditions are not met, the true P-value may be quite different from the approximate one based on the chi-square distribution. This can sometimes lead to erroneous conclusions. Also, for the chi-square test of homogeneity, the assumption of independent samples is particularly important.
4. Don’t jump to conclusions about causation. Just as a strong correlation between
two numerical variables does not mean that there is a cause-and-effect relationship between them, an association between two categorical variables does not
imply a causal relationship.
EXERCISES 12.32 - 12.34
12.32 The following passage is from the paper “Gender
Differences in Food Selections of Students at a Historically Black College and University” (College Student Journal [2009]: 800–806):
Also significant was the proportion of males and
their water consumption (8 oz. servings) compared
to females (X² = 8.166, P = .086). Males came
closest to meeting recommended daily water intake
(64 oz. or more) than females (29.8% vs. 20.9%).
This statement was based on carrying out a
X² test of independence using data in a two-way
table where rows corresponded to gender (male,
female) and columns corresponded to number of
servings of water consumed per day, with categories
none, one, two to three, four to five, and six or
more.
a. What hypotheses did the researchers test? What is
the number of degrees of freedom associated with
the reported value of the X² statistic?
b. The researchers based their statement that the proportions falling in the water consumption categories
were not all the same for males and females on a test
with a significance level of .10. Would they have
reached the same conclusion if a significance level of
.05 had been used? Explain.
c. The paper also included the accompanying data on
how often students said they had consumed fried
potatoes (fries or potato chips) in the past week.
              Number of times consumed fried potatoes in the past week
Gender      0    1 to 3    4 to 6    7 to 13    14 to 20    21 or more
Male        2      10        15        12           6            3
Female     15      15        10        20          19           12
Use the Minitab output that follows to carry
out a X² test of independence. Do you agree with
the authors’ conclusion that there was a significant
association between gender and consumption of
fried potatoes?
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            0      1-3      4-6     7-13    14-20    21 or more    Total
M           2       10       15       12        6         3          48
         5.87     8.63     8.63    11.05     8.63      5.18
        2.552    0.216    4.696    0.082    0.803     0.917
F          15       15       10       20       19        12          91
        11.13    16.37    16.37    20.95    16.37      9.82
        1.346    0.114    2.477    0.043    0.424     0.484
Total      17       25       25       32       25        15         139

Chi-Sq = 14.153, DF = 5, P-Value = 0.015

12.33 The press release titled “Nap Time”
(pewresearch.org, July 2009) described results from a
nationally representative survey of 1488 adult Americans. The survey asked several demographic questions
(such as gender, age, and income) and also included a
question asking respondents if they had taken a nap in
the past 24 hours. The press release stated that 38% of
the men surveyed and 31% of the women surveyed reported that they had napped in the past 24 hours. For
purposes of this exercise, suppose that men and women
were equally represented in the sample.
a. Use the given information to fill in observed cell
counts for the following table:

          Napped    Did Not Nap    Row Total
Men                                   744
Women                                 744

b. Use the data in the table from Part (a) to carry out a
hypothesis test to determine if there is an association
between gender and napping.
c. The press release states that more men than women
nap. Although this is true for the people in the
sample, based on the result of your test in Part (b),
is it reasonable to conclude that this holds for adult
Americans in general? Explain.

12.34 Using data from a national survey, the authors of
the paper “What Do Happy People Do?” (Social Indicators Research [2008]: 565–571) concluded that there
was convincing evidence of an association between
amount of time spent watching television and whether
or not a person reported that they were happy. They
observed that unhappy people tended to watch more
television. The authors write:
This could lead us to two possible interpretations:
1. Television viewing is a pleasurable enough activity with no lasting benefit, and it pushes aside
time spent in other activities—ones that might
be less immediately pleasurable, but that would
provide long-term benefits in one’s condition.
In other words, television does cause people to
be less happy.
2. Television is a refuge for people who are already
unhappy. TV is not judgmental nor difficult, so
people with few social skills or resources for other
activities can engage in it. Furthermore, chronic
unhappiness can be socially and personally debilitating and can interfere with work and most social and personal activities, but even the unhappiest people can click a remote and be passively
entertained by a TV. In other words, the causal
order is reversed for people who watch television;
unhappiness leads to television viewing.
Using only data from this study, do you think it is
possible to determine which of these two conclusions is correct? If so, which conclusion do you
think is correct and why? If not, explain why it is
not possible to decide which conclusion is correct
based on the study data.

ACTIVITY 12.1 Pick a Number, Any Number ...
Background: There is evidence to suggest that human
beings are not very good random number generators. In
this activity, you will investigate this phenomenon by
collecting and analyzing a set of human-generated “random” digits.
For this activity, work in a group with four or five
other students.
1. Each member of the group should complete this step
individually. Ask 25 different people to pick a digit
from 0 to 9 at random. Record the responses.
2. Combine the responses you collected with those of
the other members of your group to form a single
sample. Summarize the resulting data in a one-way
frequency table.
3. If people are adept at picking digits at random, what
would you expect for the proportion of the responses
in the sample that were 0? that were 1?
4. State a null hypothesis and an alternative hypothesis
that could be tested to determine whether there is
evidence that the 10 digits from 0 to 9 are not selected an equal proportion of the time when people
are asked to pick a digit at random.
5. Carry out the appropriate hypothesis test, and write
a few sentences indicating whether or not the data
support the theory that people are not good random
number generators.

ACTIVITY 12.2 Color and Perceived Taste
Background: Does the color of a food or beverage affect
the way people perceive its taste? In this activity you will
conduct an experiment to investigate this question and
analyze the resulting data using a chi-square test.
You will need to recruit at least 30 subjects for this
experiment, so it is advisable to work in a large group
(perhaps even the entire class) to complete this activity.
Subjects for the experiment will be assigned at random to one of two groups. Each subject will be asked to
taste a sample of gelatin (for example, Jell-O) and rate
the taste as not very good, acceptable, or very good. Subjects assigned to the first group will be asked to taste and
rate a cube of lemon-flavored gelatin. Subjects in the
second group will be asked to taste and rate a cube of
lemon-flavored gelatin that has been colored an unappealing color by adding food coloring to the gelatin mix
before the gelatin sets.
Note: You may choose to use something other than
gelatin, such as lemonade. Any food or beverage whose
color can be altered using food coloring can be used. You
can experiment with the food colors to obtain a color
that you think is particularly unappealing!
1. As a class, develop a plan for collecting the data.
How will subjects be recruited? How will they be
assigned to one of the two treatment groups (unaltered color, altered color)? What extraneous variables will be directly controlled, and how will you
control them?
2. After the class is satisfied with the data collection
plan, assign members of the class to prepare the gelatin to be used in the experiment.
3. Carry out the experiment, and summarize the resulting data in a two-way table like the one shown:
                                  Taste Rating
Treatment          Not Very Good    Acceptable    Very Good
Unaltered Color
Altered Color
4. The two-way table summarizes data from two independent samples (as long as subjects were assigned at
random to the two treatments, the samples are independent). Carry out an appropriate test to determine
whether the proportion for each of the three taste
rating categories is the same when the color is altered
as for when the color is not altered.
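Step 1 asks how subjects will be assigned to the two treatment groups. One possible approach is to shuffle the list of subjects and split it in half, sketched below in Python. The subject labels, the seed value, and the function name `assign_treatments` are hypothetical and serve only as an illustration; a class may reasonably choose a different method, such as drawing slips of paper.

```python
import random


def assign_treatments(subjects, seed=None):
    """Randomly split subjects into two treatment groups of nearly equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "Unaltered Color": shuffled[:half],
        "Altered Color": shuffled[half:],
    }


# Hypothetical labels for 30 recruited subjects
subject_ids = [f"S{i:02d}" for i in range(1, 31)]
groups = assign_treatments(subject_ids, seed=12)
for treatment, members in groups.items():
    print(treatment, len(members), members)
```

Because every ordering of the shuffled list is equally likely, each subject has the same chance of landing in either group, which is what justifies treating the two groups as independent samples in Step 4.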
Summary of Key Concepts and Formulas
TERM OR FORMULA
    COMMENT

One-way frequency table
    A compact way of summarizing data on a categorical variable; it gives the number of times each of the possible categories in the data set occurs (the frequencies).

X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}
    A statistic used to provide a comparison between observed
    counts and those expected when a given hypothesis is true.
    When none of the expected counts are too small, X² has
    approximately a chi-square distribution.

X² goodness-of-fit test
    A hypothesis test performed to determine whether the
    population category proportions are different from those
    specified by a given null hypothesis.
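To make the X² formula and the goodness-of-fit test concrete, here is a minimal Python sketch that applies them to a one-way frequency table of digit choices of the kind collected in Activity 12.1. The counts are invented placeholders and are not real data; the function names `chi2_sf` and `goodness_of_fit` are also assumptions made for this illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) times a finite sum of z**i / i!
        return math.exp(-z) * sum(z**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail and add one term per extra 2 df
    tail = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def goodness_of_fit(observed, hypothesized_proportions):
    """Return expected counts, X^2, df, and P-value for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_proportions]
    # X^2 = sum over all cells of (observed - expected)^2 / expected
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, statistic, df, chi2_sf(statistic, df)


# HYPOTHETICAL counts for the digits 0 through 9 (125 responses, made up)
digit_counts = [7, 9, 11, 16, 12, 13, 10, 24, 14, 9]
equal_proportions = [0.1] * 10  # null hypothesis: each digit has proportion 0.1

expected, statistic, df, p_value = goodness_of_fit(digit_counts, equal_proportions)
print(f"all expected counts at least 5: {min(expected) >= 5}")
print(f"X^2 = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}")
```

The same function works for any hypothesized proportions that sum to 1, provided the expected counts are large enough for the chi-square approximation to be reasonable.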