ACTIVITY 11.2: Thinking About Data Collection



Chapter 11 Comparing Two Populations or Treatments

3. Which of the two proposed designs would you recommend, and why?

4. If assigned to do so by your instructor, carry out one of your experiments and analyze the resulting data. Write a brief report that describes the experimental design, includes both graphical and numerical summaries of the resulting data, and communicates the conclusions that follow from your data analysis.

ACTIVITY 11.3: A Meaningful Paragraph

Write a meaningful paragraph that includes the following six terms: paired samples, significantly different, P-value, sample, population, alternative hypothesis.

A “meaningful paragraph” is a coherent piece of writing in an appropriate context that uses all of the listed words. The paragraph should show that you understand the meaning of the terms and their relationship to one another. A sequence of sentences that just define the terms is not a meaningful paragraph. When choosing a context, think carefully about the terms you need to use. Choosing a good context will make writing a meaningful paragraph easier.

Summary of Key Concepts and Formulas

TERM OR FORMULA: COMMENT

Independent samples: Two samples where the individuals or objects in the first sample are selected independently from those in the second sample.

Paired samples: Two samples for which each observation in one sample is paired in a meaningful way with a particular observation in a second sample.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The test statistic for testing $H_0\colon \mu_1 - \mu_2 = \text{hypothesized value}$ when the samples are independently selected and the sample sizes are large or it is reasonable to assume that both population distributions are normal.

$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value})\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$$

A formula for constructing a confidence interval for $\mu_1 - \mu_2$ when the samples are independently selected and the sample sizes are large or it is reasonable to assume that the population distributions are normal.

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \qquad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$

The formula for determining df for the two-sample t test and confidence interval.

$\bar{x}_d$: The sample mean difference.

$s_d$: The standard deviation of the sample differences.

$\mu_d$: The mean value for the population of differences.

$\sigma_d$: The standard deviation for the population of differences.

$$t = \frac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}}$$

The paired t test statistic for testing $H_0\colon \mu_d = \text{hypothesized value}$.

$$\bar{x}_d \pm (t \text{ critical value})\frac{s_d}{\sqrt{n}}$$

The paired t confidence interval formula.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

$$\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

$\hat{p}_c$ is the statistic for estimating the common population proportion when $p_1 = p_2$.

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$$

The test statistic for testing $H_0\colon p_1 - p_2 = 0$ when the samples are independently selected and both sample sizes are large.

$$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

A formula for constructing a confidence interval for $p_1 - p_2$ when both sample sizes are large.
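The formulas in this summary translate directly into code. Below is a minimal Python sketch using only the standard library; the function names (`welch_t`, `paired_t`, `two_prop_z`, `two_prop_ci`) are illustrative and do not come from the text. The t functions return only the statistic and df, leaving the critical value or P-value to a t table or statistical software, as in the chapter.

```python
import math
from statistics import NormalDist


def welch_t(x1, s1, n1, x2, s2, n2, hypothesized=0.0):
    """Two-sample t statistic and df for independently selected samples."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # V1 and V2 from the summary
    t = ((x1 - x2) - hypothesized) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def paired_t(diffs, hypothesized=0.0):
    """Paired t statistic (and df) computed from the sample differences."""
    n = len(diffs)
    xd = sum(diffs) / n
    sd = math.sqrt(sum((d - xd) ** 2 for d in diffs) / (n - 1))
    return (xd - hypothesized) / (sd / math.sqrt(n)), n - 1


def two_prop_z(x1, n1, x2, n2):
    """Large-sample z statistic for H0: p1 - p2 = 0, using the combined estimate p_c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                    # same as (n1*p1 + n2*p2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)


def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - margin, p1 - p2 + margin
```

For example, `welch_t(5, 1, 4, 3, 1, 4)` gives t ≈ 2.83 with df = 6: with equal sample sizes and equal standard deviations, the df formula reduces to $n_1 + n_2 - 2$.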

Chapter Review Exercises 11.61 - 11.82

11.61 Do faculty and students have similar perceptions of what types of behavior are inappropriate in the classroom? This question was examined by the author of the article “Faculty and Student Perceptions of Classroom Etiquette” (Journal of College Student Development [1998]: 515–516). Each individual in a random sample of 173 students in general education classes at a large public university was asked to judge various behaviors on a scale from 1 (totally inappropriate) to 5 (totally appropriate). Individuals in a random sample of 98 faculty members also rated the same behaviors. The mean ratings for three of the behaviors studied are shown here (the means are consistent with data provided by the author of the article). The sample standard deviations were not given, but for purposes of this exercise, assume that they are all equal to 1.0.

Student Behavior | Student Mean Rating | Faculty Mean Rating
Wearing hats in the classroom | 2.80 | 3.63
Addressing instructor by first name | 2.90 | 2.11
Talking on a cell phone | 1.11 | 1.10

a. Is there sufficient evidence to conclude that the mean “appropriateness” score assigned to wearing a hat in class differs for students and faculty?

b. Is there sufficient evidence to conclude that the mean “appropriateness” score assigned to addressing an instructor by his or her first name is higher for students than for faculty?

c. Is there sufficient evidence to conclude that the mean “appropriateness” score assigned to talking on a cell phone differs for students and faculty? Does the result of your test imply that students and faculty consider it acceptable to talk on a cell phone during class?
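A minimal sketch of the test statistics for Exercise 11.61, taking every sample standard deviation to be 1.0 as the exercise instructs. Because all three comparisons share the same sample sizes and standard deviations, they share one standard error and one df value; the dictionary layout is only an illustrative choice.

```python
import math

n_students, n_faculty = 173, 98
s = 1.0  # assumed common sample standard deviation, per the exercise

# (student mean, faculty mean) for each behavior in the table
ratings = {
    "hats": (2.80, 3.63),
    "first name": (2.90, 2.11),
    "cell phone": (1.11, 1.10),
}

v1, v2 = s ** 2 / n_students, s ** 2 / n_faculty
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n_students - 1) + v2 ** 2 / (n_faculty - 1))

# t = (student mean - faculty mean) / se, with hypothesized difference 0
t_stats = {name: (stu - fac) / se for name, (stu, fac) in ratings.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, df = {df:.1f}")
```

Parts (a) and (c) call for two-sided tests and Part (b) for an upper-tailed test of student minus faculty, so each t value is compared with the matching critical value for df of roughly 200.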

11.62 Are girls less inclined to enroll in science courses than boys? One study (“Intentions of Young Students to Enroll in Science Courses in the Future: An Examination of Gender Differences,” Science Education [1999]: 55–76) asked randomly selected fourth-, fifth-, and sixth-graders how many science courses they intend to take. The following data were obtained:

Group | n | Mean | Standard Deviation
Males | 203 | 3.42 | 1.49
Females | 224 | 2.42 | 1.35

Calculate a 99% confidence interval for the difference between males and females in mean number of science courses planned. Interpret your interval. Based on your interval, how would you answer the question posed at the beginning of the exercise?
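A sketch of the 99% interval for Exercise 11.62. The Python standard library has no t distribution, so this sketch approximates t probabilities by integrating the t density numerically with Simpson's rule and finds the critical value by bisection. That numerical approach is an assumption of the sketch, not the book's method; a t table or statistical software at the computed df should give essentially the same interval.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on the density from 0 to |x| (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence, df):
    """t* with central area `confidence`, found by bisection on t_cdf."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exercise 11.62: males (group 1) minus females (group 2)
n1, x1, s1 = 203, 3.42, 1.49
n2, x2, s2 = 224, 2.42, 1.35
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
margin = t_critical(0.99, df) * math.sqrt(v1 + v2)
ci = (x1 - x2 - margin, x1 - x2 + margin)
print(f"df = {df:.1f}, 99% CI for mu1 - mu2: ({ci[0]:.3f}, {ci[1]:.3f})")
```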

11.63 A deficiency of the trace element selenium in the diet can negatively impact growth, immunity, muscle and neuromuscular function, and fertility. The introduction of selenium supplements to dairy cows is justified when pastures have low selenium levels. Authors of the paper “Effects of Short-Term Supplementation with Selenised Yeast on Milk Production and Composition of Lactating Cows” (Australian Journal of Dairy Technology [2004]: 199–203) supplied the following data on milk selenium concentration (mg/L) for a sample of cows given a selenium supplement (the treatment group) and a control sample given no supplement, both initially and after a 9-day period.

Treatment (Initial) | Control (Initial) | Treatment (After 9 Days) | Control (After 9 Days)
11.4 | 9.1 | 138.3 | 9.3
9.6 | 8.7 | 104.0 | 8.8
10.1 | 9.7 | 96.4 | 8.8
8.5 | 10.8 | 89.0 | 10.1
10.3 | 10.9 | 88.0 | 9.6
10.6 | 10.6 | 103.8 | 8.6
11.8 | 10.1 | 147.3 | 10.4
9.8 | 12.3 | 97.1 | 12.4
10.9 | 8.8 | 172.6 | 9.3
10.3 | 10.4 | 146.3 | 9.5
10.2 | 10.9 | 99.0 | 8.4
11.4 | 10.4 | 122.3 | 8.7
9.2 | 11.6 | 103.0 | 12.5
10.6 | 10.9 | 117.8 | 9.1
10.8 | | 121.5 |
8.2 | | 93.0 |

a. Use the given data for the treatment group to determine if there is sufficient evidence to conclude that the mean selenium concentration is greater after 9 days of the selenium supplement.

b. Are the data for the cows in the control group (no selenium supplement) consistent with the hypothesis of no significant change in mean selenium concentration over the 9-day period?

c. Would you use the paired t test to determine if there was a significant difference in the initial mean selenium concentration for the control group and the treatment group? Explain why or why not.
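A sketch of the paired calculations for Parts (a) and (b) of Exercise 11.63. It assumes each row of the table is one cow, so that a cow's initial and 9-day measurements can be paired, and takes differences as after minus initial. The `paired_t` helper is hypothetical, written only for this illustration.

```python
import math

# Milk selenium (mg/L) from the table, one entry per cow
treat_initial = [11.4, 9.6, 10.1, 8.5, 10.3, 10.6, 11.8, 9.8,
                 10.9, 10.3, 10.2, 11.4, 9.2, 10.6, 10.8, 8.2]
treat_after = [138.3, 104.0, 96.4, 89.0, 88.0, 103.8, 147.3, 97.1,
               172.6, 146.3, 99.0, 122.3, 103.0, 117.8, 121.5, 93.0]
control_initial = [9.1, 8.7, 9.7, 10.8, 10.9, 10.6, 10.1, 12.3,
                   8.8, 10.4, 10.9, 10.4, 11.6, 10.9]
control_after = [9.3, 8.8, 8.8, 10.1, 9.6, 8.6, 10.4, 12.4,
                 9.3, 9.5, 8.4, 8.7, 12.5, 9.1]


def paired_t(before, after):
    """Return (mean d, s_d, t, df) for H0: mu_d = 0, with d = after - before."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d, sd, mean_d / (sd / math.sqrt(n)), n - 1


treat = paired_t(treat_initial, treat_after)
control = paired_t(control_initial, control_after)
print("treatment: mean d = %.2f, s_d = %.2f, t = %.2f, df = %d" % treat)
print("control:   mean d = %.2f, s_d = %.2f, t = %.2f, df = %d" % control)
```

For Part (c), keep in mind that the treatment and control groups consist of different cows, so their initial measurements cannot be paired cow by cow.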

11.64 The Oregon Department of Health web site provides information on the cost-to-charge ratio (the percentage of billed charges that are actual costs to the hospital). The cost-to-charge ratios for both inpatient and outpatient care in 2002 for a sample of six hospitals in Oregon follow.

Hospital | 2002 Inpatient Ratio | 2002 Outpatient Ratio
1 | 68 | 54
2 | 100 | 75
3 | 71 | 53
4 | 74 | 56
5 | 100 | 74
6 | 83 | 71

Is there evidence that the mean cost-to-charge ratio for Oregon hospitals is lower for outpatient care than for inpatient care? Use a significance level of .05.
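A minimal sketch of the paired t statistic for Exercise 11.64. Each hospital supplies both ratios, so the data are paired; differences are taken as inpatient minus outpatient, which makes the alternative of interest $\mu_d > 0$. The variable names are illustrative.

```python
import math

inpatient = [68, 100, 71, 74, 100, 83]
outpatient = [54, 75, 53, 56, 74, 71]

# d = inpatient - outpatient; Ha: mu_d > 0 (outpatient ratio lower)
d = [i - o for i, o in zip(inpatient, outpatient)]
n = len(d)
mean_d = sum(d) / n
s_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
t = mean_d / (s_d / math.sqrt(n))
print(f"d = {d}, mean = {mean_d:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```

With df = 5, the upper-tail critical value for α = .05 is 2.015, so the computed statistic can be compared with it directly.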

11.65 The article “A ‘White’ Name Found to Help in Job Search” (Associated Press, January 15, 2003) described an experiment to investigate if it helps to have a “white-sounding” first name when looking for a job. Researchers sent 5000 resumes in response to ads that appeared in the Boston Globe and Chicago Tribune. The resumes were identical except that 2500 of them had “white-sounding” first names, such as Brett and Emily, whereas the other 2500 had “black-sounding” names such as Tamika and Rasheed. Resumes of the first type elicited 250 responses and resumes of the second type only 167 responses. Do these data support the theory that the proportion receiving responses is higher for those resumes with “white-sounding” first names?
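A sketch of the large-sample z test for Exercise 11.65, using the combined estimate $\hat{p}_c$ from the chapter summary. Group 1 is taken to be the resumes with “white-sounding” names, so the alternative is $p_1 - p_2 > 0$ and the P-value is an upper-tail area from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

x1, n1 = 250, 2500  # "white-sounding" names: responses, resumes sent
x2, n2 = 167, 2500  # "black-sounding" names: responses, resumes sent
p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)  # combined estimate under H0: p1 = p2
z = (p1 - p2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z)  # upper-tailed: Ha p1 - p2 > 0
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, z = {z:.2f}, P-value = {p_value:.6f}")
```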

11.66 In a study of a proposed approach for diabetes prevention, 339 people under the age of 20 who were thought to be at high risk of developing type I diabetes were assigned at random to two groups. One group received twice-daily injections of a low dose of insulin. The other group (the control) did not receive any insulin, but was closely monitored. Summary data (from the article “Diabetes Theory Fails Test,” USA Today, June 25, 2001) follow.

Group | n | Number Developing Diabetes
Insulin | 169 | 25
Control | 170 | 24

a. Use the given data to construct a 90% confidence interval for the difference in the proportion that develop diabetes for the control group and the insulin group.

b. Give an interpretation of the confidence interval and the associated confidence level.

c. Based on your interval from Part (a), write a few sentences commenting on the effectiveness of the proposed prevention treatment.
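A sketch of the 90% interval in Part (a) of Exercise 11.66, ordering the difference as control minus insulin to match the wording of the exercise. The z critical value comes from `statistics.NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

x_control, n_control = 24, 170
x_insulin, n_insulin = 25, 169
p1, p2 = x_control / n_control, x_insulin / n_insulin
se = math.sqrt(p1 * (1 - p1) / n_control + p2 * (1 - p2) / n_insulin)
z_crit = NormalDist().inv_cdf(0.95)  # 90% confidence leaves .05 in each tail
lower, upper = p1 - p2 - z_crit * se, p1 - p2 + z_crit * se
print(f"90% CI for p_control - p_insulin: ({lower:.4f}, {upper:.4f})")
```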

11.67 When a surgeon repairs injuries, sutures (stitched knots) are used to hold together and stabilize the injured area. If these knots elongate and loosen through use, the injury may not heal properly because the tissues would not be optimally positioned. Researchers at the University of California, San Francisco, tied a series of different types of knots with two types of suture material, Maxon and Ticron. Suppose that 112 tissue specimens were available and that for each specimen the type of knot and suture material were randomly assigned. The investigators tested the knots to see how much the loops elongated; the elongations (in mm) were measured and the resulting data are summarized here. For purposes of this exercise, assume it is reasonable to regard the elongation distributions as approximately normal.

Maxon
Type of Knot | n | x̄ | sd
Square (control) | 10 | 10.0 | .1
Duncan Loop | 15 | 11.0 | .3
Overhand | 15 | 11.0 | .9
Roeder | 10 | 13.5 | .1
Snyder | 10 | 13.5 | 2.0

Ticron
Type of Knot | n | x̄ | sd
Square (control) | 10 | 2.5 | .06
Duncan Loop | 11 | 10.9 | .40
Overhand | 11 | 8.1 | 1.00
Roeder | 10 | 5.0 | .04
Snyder | 10 | 8.1 | .06

a. Is there a significant difference in mean elongation between the square knot and the Duncan loop for Maxon sutures?

b. Is there a significant difference in mean elongation between the square knot and the Duncan loop for Ticron sutures?

c. For the Duncan loop knot, is there a significant difference in mean elongation between the Maxon and Ticron sutures?
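A sketch of the three two-sample t statistics for Exercise 11.67, reading Part (a) as the Maxon comparison and Part (b) as the Ticron comparison, in the order of the tables. Only the rows needed are entered, and the `welch` helper name is hypothetical.

```python
import math

# (n, mean, sd) taken from the Maxon and Ticron tables
maxon = {"square": (10, 10.0, 0.1), "duncan": (15, 11.0, 0.3)}
ticron = {"square": (10, 2.5, 0.06), "duncan": (11, 10.9, 0.40)}


def welch(a, b):
    """Two-sample t statistic and df for sample a minus sample b."""
    (n1, x1, s1), (n2, x2, s2) = a, b
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_a, df_a = welch(maxon["square"], maxon["duncan"])    # part (a)
t_b, df_b = welch(ticron["square"], ticron["duncan"])  # part (b)
t_c, df_c = welch(maxon["duncan"], ticron["duncan"])   # part (c)
for label, t, df in [("a", t_a, df_a), ("b", t_b, df_b), ("c", t_c, df_c)]:
    print(f"({label}) t = {t:.2f}, df = {df:.1f}")
```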

11.68 The article “Trial Lawyers and Testosterone: Blue-Collar Talent in a White-Collar World” (Journal of Applied Social Psychology [1998]: 84–94) compared trial lawyers and nontrial lawyers on the basis of mean testosterone level. Random samples of 35 male trial lawyers, 31 male nontrial lawyers, 13 female trial lawyers, and 18 female nontrial lawyers were selected for study. The article includes the following statement: “Trial lawyers had higher testosterone levels than did nontrial lawyers. This was true for men, t(64) = 3.75, p < .001, and for women, t(29) = 2.26, p < .05.”

a. Based on the information given, is the mean testosterone level for male trial lawyers significantly higher than for male nontrial lawyers?

b. Based on the information given, is the mean testosterone level for female trial lawyers significantly higher than for female nontrial lawyers?

c. Do you have enough information to carry out a test to determine whether there is a significant difference in the mean testosterone levels of male and female trial lawyers? If so, carry out such a test. If not, what additional information would you need to be able to conduct the test?

11.69 In a study of memory recall, eight students from a large psychology class were selected at random and given 10 minutes to memorize a list of 20 nonsense words. Each was asked to list as many of the words as he or she could remember both 1 hour and 24 hours later. The data are as shown in the accompanying table. Is there evidence to suggest that the mean number of words recalled after 1 hour exceeds the mean recall after 24 hours by more than 3? Use a level .01 test.

Subject | 1 hour later | 24 hours later
1 | 14 | 10
2 | 12 | 4
3 | 18 | 14
4 | 7 | 6
5 | 11 | 9
6 | 9 | 6
7 | 16 | 12
8 | 15 | 12
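A minimal sketch of the paired t statistic for Exercise 11.69. Differences are 1-hour recall minus 24-hour recall, and because the claim is that the mean difference exceeds 3, the hypothesized value is 3 rather than 0.

```python
import math

one_hour = [14, 12, 18, 7, 11, 9, 16, 15]
day_later = [10, 4, 14, 6, 9, 6, 12, 12]

d = [a - b for a, b in zip(one_hour, day_later)]  # d = 1 hour - 24 hours
n = len(d)
mean_d = sum(d) / n
s_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
t = (mean_d - 3) / (s_d / math.sqrt(n))  # H0: mu_d = 3 versus Ha: mu_d > 3
print(f"mean d = {mean_d}, s_d = {s_d:.3f}, t = {t:.3f}, df = {n - 1}")
```

With df = 7, the upper-tail critical value for a level .01 test is 2.998.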

11.70 As part of a study to determine the effects of allowing the use of credit cards for alcohol purchases in Canada (see “Changes in Alcohol Consumption Patterns Following the Introduction of Credit Cards in Ontario Liquor Stores,” Journal of Studies on Alcohol [1999]: 378–382), randomly selected individuals were given a questionnaire asking them (among other things) how many drinks they had consumed during the previous week. A year later (after liquor stores started accepting credit cards for purchases), these same individuals were again asked how many drinks they had consumed in the previous week. The data shown are consistent with summary statistics presented in the article.

Group | n | 1994 Mean | 1995 Mean | x̄d | sd
Credit-Card Shoppers | 96 | 6.72 | 6.34 | .38 | 5.52
Non-Credit-Card Shoppers | 850 | 4.09 | 3.97 | .12 | 4.58

a. The standard deviations of the differences were quite large. Explain how this could be the case.

b. Calculate a 95% confidence interval for the mean difference in drink consumption for credit-card shoppers between 1994 and 1995. Is there evidence that the mean number of drinks decreased?

c. Test the hypothesis that there was no change in the mean number of drinks between 1994 and 1995 for the non-credit-card shoppers. Be sure to calculate and interpret the P-value for this test.
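A sketch of Parts (b) and (c) of Exercise 11.70, working from the summary values (mean difference and standard deviation of the differences) and taking differences as 1994 minus 1995. The standard library has no t distribution, so this sketch approximates t probabilities by integrating the t density numerically (Simpson's rule) and finds critical values by bisection; that numerical approach is an assumption of the sketch, and a t table or statistical software should agree closely.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on the density from 0 to |x| (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence, df):
    """t* with central area `confidence`, found by bisection on t_cdf."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Part (b): credit-card shoppers, d = 1994 drinks - 1995 drinks
n_b, xd_b, sd_b = 96, 0.38, 5.52
se_b = sd_b / math.sqrt(n_b)
t_star = t_critical(0.95, n_b - 1)
ci = (xd_b - t_star * se_b, xd_b + t_star * se_b)

# Part (c): non-credit-card shoppers, H0: mu_d = 0 versus Ha: mu_d != 0
n_c, xd_c, sd_c = 850, 0.12, 4.58
t_c = xd_c / (sd_c / math.sqrt(n_c))
p_value = 2 * (1 - t_cdf(abs(t_c), n_c - 1))

print(f"(b) 95% CI for mu_d: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"(c) t = {t_c:.3f}, df = {n_c - 1}, two-sided P-value = {p_value:.3f}")
```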

11.71 Several methods of estimating the number of seeds in soil samples have been developed by ecologists. An article in the Journal of Ecology (“A Comparison of Methods for Estimating Seed Numbers in the Soil” [1990]: 1079–1093) considered three such methods. The accompanying data give number of seeds detected by the direct method and by the stratified method for 27 soil specimens.

Specimen | Direct | Stratified | Specimen | Direct | Stratified
1 | 24 | 8 | 2 | 32 | 36
3 | 0 | 8 | 4 | 60 | 56
5 | 20 | 52 | 6 | 64 | 64
7 | 40 | 28 | 8 | 8 | 8
9 | 12 | 8 | 10 | 92 | 100
11 | 4 | 0 | 12 | 68 | 56
13 | 76 | 68 | 14 | 24 | 52
15 | 32 | 28 | 16 | 0 | 0
17 | 36 | 36 | 18 | 16 | 12
19 | 92 | 92 | 20 | 4 | 12
21 | 40 | 48 | 22 | 24 | 24
23 | 0 | 0 | 24 | 8 | 12
25 | 12 | 40 | 26 | 16 | 12
27 | 40 | 76 | | |

Do the data provide sufficient evidence to conclude that the mean number of seeds detected differs for the two methods? Test the relevant hypotheses using α = .05.

11.72 Are college students who take a freshman orientation course more or less likely to stay in college than those who do not take such a course? The article “A Longitudinal Study of the Retention and Academic Performance of Participants in Freshmen Orientation Courses” (Journal of College Student Development [1994]: 444–449) reported that 50 of 94 randomly selected students who did not participate in an orientation course returned for a second year. Of 94 randomly selected students who did take the orientation course, 56 returned for a second year. Construct a 95% confidence interval for p1 − p2, the difference in the proportion returning for students who do not take an orientation course and those who do. Give an interpretation of this interval.

11.73 The article “Truth and DARE: Tracking Drug Education to Graduation” (Social Problems [1994]: 448–456) compared the drug use of 288 randomly selected high school seniors exposed to a drug education program (DARE) and 335 randomly selected high school seniors who were not exposed to such a program. Data for marijuana use are given in the accompanying table. Is there evidence that the proportion using marijuana is lower for students exposed to the DARE program? Use α = .05.

Group | n | Number Who Use Marijuana
Exposed to DARE | 288 | 141
Not Exposed to DARE | 335 | 181
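A combined sketch for Exercises 11.71 through 11.73. In 11.71 both counts come from the same soil specimen, so the differences (direct minus stratified) are analyzed with a paired t statistic; 11.72 and 11.73 use the large-sample proportion formulas from the chapter summary. Group labels follow each exercise's wording, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# 11.71: seeds detected, specimens 1 through 27 in order
direct = [24, 32, 0, 60, 20, 64, 40, 8, 12, 92, 4, 68, 76, 24,
          32, 0, 36, 16, 92, 4, 40, 24, 0, 8, 12, 16, 40]
stratified = [8, 36, 8, 56, 52, 64, 28, 8, 8, 100, 0, 56, 68, 52,
              28, 0, 36, 12, 92, 12, 48, 24, 0, 12, 40, 12, 76]
d = [a - b for a, b in zip(direct, stratified)]
n = len(d)
mean_d = sum(d) / n
s_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
t_71 = mean_d / (s_d / math.sqrt(n))  # two-sided test, df = n - 1 = 26

# 11.72: 95% CI for p1 - p2 (1 = no orientation course, 2 = orientation course)
p1, p2 = 50 / 94, 56 / 94
se = math.sqrt(p1 * (1 - p1) / 94 + p2 * (1 - p2) / 94)
z95 = NormalDist().inv_cdf(0.975)
ci_72 = (p1 - p2 - z95 * se, p1 - p2 + z95 * se)

# 11.73: H0: p1 - p2 = 0 versus Ha: p1 - p2 < 0 (1 = exposed to DARE)
x1, n1, x2, n2 = 141, 288, 181, 335
pc = (x1 + x2) / (n1 + n2)
z_73 = (x1 / n1 - x2 / n2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
p_73 = NormalDist().cdf(z_73)  # lower-tailed P-value

print(f"11.71: mean d = {mean_d:.3f}, t = {t_71:.3f}, df = {n - 1}")
print(f"11.72: 95% CI = ({ci_72[0]:.3f}, {ci_72[1]:.3f})")
print(f"11.73: z = {z_73:.3f}, P-value = {p_73:.3f}")
```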

11.74 The article “Softball Sliding Injuries” (American Journal of Diseases of Children [1988]: 715–716) provided a comparison of breakaway bases (designed to reduce injuries) and stationary bases. Consider the accompanying data (which agree with summary values given in the paper).

Base Type | Number of Games Played | Number of Games Where a Player Suffered a Sliding Injury
Stationary Bases | 1250 | 90
Breakaway Bases | 1250 | 20

Is the proportion of games with a player suffering a sliding injury significantly lower for games using breakaway bases? Answer by performing a level .01 test. What did you have to assume in order for your conclusion to be valid? Do you think it is likely that this assumption was satisfied in this study?
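A sketch of the level .01 test in Exercise 11.74. It treats the two sets of games as independent random samples (the assumption the exercise asks you to examine) and labels stationary bases as group 1, so the alternative is $p_1 - p_2 > 0$.

```python
import math
from statistics import NormalDist

x1, n1 = 90, 1250  # stationary bases: games with an injury, games played
x2, n2 = 20, 1250  # breakaway bases: games with an injury, games played
pc = (x1 + x2) / (n1 + n2)  # combined estimate under H0: p1 = p2
z = (x1 / n1 - x2 / n2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z)  # Ha: p_stationary - p_breakaway > 0
print(f"z = {z:.2f}, P-value = {p_value:.2e}")
```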


11.75 The positive effect of water fluoridation on dental health is well documented. One study that validates this is described in the article “Impact of Water Fluoridation on Children’s Dental Health: A Controlled Study of Two Pennsylvania Communities” (American Statistical Association Proceedings of the Social Statistics Section [1981]: 262–265). Two communities were compared. One had adopted fluoridation in 1966, whereas the other had no such program. Of 143 randomly selected children from the town without fluoridated water, 106 had decayed teeth, and 67 of 119 randomly selected children from the town with fluoridated water had decayed teeth. Let p1 denote the proportion of all children in the community with fluoridated water who have decayed teeth, and let p2 denote the analogous proportion for children in the community with unfluoridated water. Estimate p1 − p2 using a 90% confidence interval. Does the interval contain 0? Interpret the interval.
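A sketch of the 90% interval for $p_1 - p_2$ in Exercise 11.75, with $p_1$ referring to the fluoridated community as the exercise defines it.

```python
import math
from statistics import NormalDist

p1 = 67 / 119   # fluoridated community: proportion with decayed teeth
p2 = 106 / 143  # unfluoridated community: proportion with decayed teeth
se = math.sqrt(p1 * (1 - p1) / 119 + p2 * (1 - p2) / 143)
z_crit = NormalDist().inv_cdf(0.95)  # 90% confidence
ci = (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)
print(f"90% CI for p1 - p2: ({ci[0]:.3f}, {ci[1]:.3f})")
```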

11.76 Wayne Gretzky was one of ice hockey’s most prolific scorers when he played for the Edmonton Oilers. During his last season with the Oilers, Gretzky played in 41 games and missed 17 games due to injury. The article “The Great Gretzky” (Chance [1991]: 16–21) looked at the number of goals scored by the Oilers in games with and without Gretzky, as shown in the accompanying table. If we view the 41 games with Gretzky as a random sample of all Oiler games in which Gretzky played and the 17 games without Gretzky as a random sample of all Oiler games in which Gretzky did not play, is there evidence that the mean number of goals scored by the Oilers is higher for games in which Gretzky played? Use α = .01.

Group | n | Sample Mean | Sample sd
Games with Gretzky | 41 | 4.73 | 1.29
Games without Gretzky | 17 | 3.88 | 1.18

11.77 Here’s one to sink your teeth into: The authors of the article “Analysis of Food Crushing Sounds During Mastication: Total Sound Level Studies” (Journal of Texture Studies [1990]: 165–178) studied the nature of sounds generated during eating. Peak loudness (in decibels at 20 cm away) was measured for both open-mouth and closed-mouth chewing of potato chips and of tortilla chips. Forty subjects participated, with ten assigned at random to each combination of conditions (such as closed-mouth potato chip, and so on). We are not making this up! Summary values taken from plots given in the article appear in the accompanying table. For purposes of this exercise, suppose that it is reasonable to regard the peak loudness distributions as approximately normal.

Chip and Chewing Method | n | x̄ | s
Potato chip, open mouth | 10 | 63 | 13
Potato chip, closed mouth | 10 | 54 | 16
Tortilla chip, open mouth | 10 | 60 | 15
Tortilla chip, closed mouth | 10 | 53 | 16

a. Construct a 95% confidence interval for the difference in mean peak loudness between open-mouth and closed-mouth chewing of potato chips. Interpret the resulting interval.

b. For closed-mouth chewing (the recommended method!), is there sufficient evidence to indicate that there is a difference between potato chips and tortilla chips with respect to mean peak loudness? Test the relevant hypotheses using α = .01.

c. The means and standard deviations given here were actually for stale chips. When ten measurements of peak loudness were recorded for closed-mouth chewing of fresh tortilla chips, the resulting mean and standard deviation were 56 and 14, respectively. Is there sufficient evidence to conclude that fresh tortilla chips are louder than stale chips? Use α = .05.

11.78 Are very young infants more likely to imitate actions that are modeled by a person or simulated by an object? This question was the basis of a research study summarized in the article “The Role of Person and Object in Eliciting Early Imitation” (Journal of Experimental Child Psychology [1991]: 423–433). One action examined was mouth opening. This action was modeled repeatedly by either a person or a doll, and the number of times that the infant imitated the behavior was recorded. Twenty-seven infants participated, with 12 exposed to a human model and 15 exposed to the doll. Summary values are given in the accompanying table. Is there sufficient evidence to conclude that the mean number of imitations is higher for infants who watch a human model than for infants who watch a doll? Test the relevant hypotheses using a .01 significance level.

Model | x̄ | s
Person Model | 5.14 | 1.60
Doll Model | 3.46 | 1.30
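A combined sketch of the two-sample t calculations for Exercises 11.76 through 11.78, using the summary values in the tables. In each test, group 1 is the group the alternative hypothesis claims is larger; for the interval in 11.77(a), the difference is open mouth minus closed mouth. The t critical value for that interval is approximated numerically (Simpson's rule plus bisection), an assumption of this sketch in place of a t table. All helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on the density from 0 to |x| (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence, df):
    """t* with central area `confidence`, found by bisection on t_cdf."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic, df, and standard error for group 1 minus group 2."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (x1 - x2) / se, df, se


# 11.76: games with Gretzky (1) vs without (2); Ha: mu1 - mu2 > 0
t_76, df_76, _ = welch(4.73, 1.29, 41, 3.88, 1.18, 17)

# 11.77(a): potato chips, open mouth (1) minus closed mouth (2), 95% CI
_, df_a, se_a = welch(63, 13, 10, 54, 16, 10)
margin = t_critical(0.95, df_a) * se_a
ci_77a = (63 - 54 - margin, 63 - 54 + margin)

# 11.77(b): closed mouth, potato (1) vs tortilla (2); two-sided
t_77b, df_77b, _ = welch(54, 16, 10, 53, 16, 10)

# 11.77(c): closed mouth, fresh tortilla (1) vs stale tortilla (2); Ha: mu1 - mu2 > 0
t_77c, df_77c, _ = welch(56, 14, 10, 53, 16, 10)

# 11.78: person model (1) vs doll model (2); Ha: mu1 - mu2 > 0
t_78, df_78, _ = welch(5.14, 1.60, 12, 3.46, 1.30, 15)

print(f"11.76: t = {t_76:.2f}, df = {df_76:.1f}")
print(f"11.77(a): 95% CI = ({ci_77a[0]:.2f}, {ci_77a[1]:.2f})")
print(f"11.77(b): t = {t_77b:.2f}; 11.77(c): t = {t_77c:.2f}")
print(f"11.78: t = {t_78:.2f}, df = {df_78:.1f}")
```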

11.79 Dentists make many people nervous (even more so than statisticians!). To see whether such nervousness elevates blood pressure, the blood pressure and pulse rates of 60 subjects were measured in a dental setting and in a medical setting (“The Effect of the Dental Setting on Blood Pressure Measurement,” American Journal of Public Health [1983]: 1210–1214). For each subject, the difference (dental-setting blood pressure minus medical-setting blood pressure) was calculated. The analogous differences were also calculated for pulse rates. Summary data follow.

Measurement | Mean Difference | Standard Deviation of Differences
Systolic Blood Pressure | 4.47 | 8.77
Pulse (beats/min) | −1.33 | 8.84

a. Do the data strongly suggest that true mean blood pressure is higher in a dental setting than in a medical setting? Use a level .01 test.

b. Is there sufficient evidence to indicate that true mean pulse rate in a dental setting differs from the true mean pulse rate in a medical setting? Use a significance level of .05.
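A minimal sketch of the two paired t statistics in Exercise 11.79, computed from the summary values with n = 60 (so df = 59). Differences are dental minus medical, as defined in the exercise; the resulting statistics are then compared with t critical values for df = 59.

```python
import math

n = 60
# Part (a): systolic blood pressure, Ha: mu_d > 0
t_bp = 4.47 / (8.77 / math.sqrt(n))
# Part (b): pulse rate, Ha: mu_d != 0
t_pulse = -1.33 / (8.84 / math.sqrt(n))
print(f"blood pressure: t = {t_bp:.2f}; pulse: t = {t_pulse:.2f}; df = {n - 1}")
```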

11.80 Key terms in survey questions too often are not well understood, and such ambiguity can affect responses. As an example, the article “How Unclear Terms Affect Survey Data” (Public Opinion Quarterly [1992]: 218–231) described a survey in which each individual in a sample was asked, “Do you exercise or play sports regularly?” But what constitutes exercise? The following revised question was then asked of each individual in the same sample: “Do you do any sports or hobbies involving physical activities, or any exercise, including walking, on a regular basis?” The resulting data are shown in the accompanying table.

Question | Yes | No
Initial Question | 48 | 52
Revised Question | 60 | 40

Is there any difference between the true proportions of yes responses to these questions? Can a procedure from this chapter be used to answer the question posed? If yes, use it; if not, explain why not.

11.81 An electronic implant that stimulates the auditory nerve has been used to restore partial hearing to a number of deaf people. In a study of implant acceptability (Los Angeles Times, January 29, 1985), 250 adults born deaf and 250 adults who went deaf after learning to speak were followed for a period of time after receiving an implant. Of those deaf from birth, 75 had removed the implant, whereas only 25 of those who went deaf after learning to speak had done so. Does this suggest that the true proportion who remove the implants differs for those who were born deaf and those who went deaf after learning to speak? Test the relevant hypotheses using a .01 significance level.

11.82 Samples of both surface soil and subsoil were taken from eight randomly selected agricultural locations in a particular county. The soil samples were analyzed to determine both surface pH and subsoil pH, with the results shown in the accompanying table.

Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Surface pH | 6.55 | 5.98 | 5.59 | 6.17 | 5.92 | 6.18 | 6.43 | 5.68
Subsoil pH | 6.78 | 6.14 | 5.80 | 5.91 | 6.10 | 6.01 | 6.18 | 5.88

a. Compute a 90% confidence interval for the mean difference between surface and subsoil pH for agricultural land in this county.

b. What assumptions are necessary for the interval in Part (a) to be valid?
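A sketch for Exercises 11.81 and 11.82. The 11.81 part is a two-sided large-sample z test using the combined estimate $\hat{p}_c$; the 11.82 part computes a paired interval from surface minus subsoil differences, using the tabled t critical value 1.895 for 90% confidence with df = 7. No code is given for 11.80, since both of its questions were asked of the same individuals.

```python
import math
from statistics import NormalDist

# 11.81: born deaf (1) vs went deaf after learning to speak (2); two-sided
x1, x2, n = 75, 25, 250
pc = (x1 + x2) / (2 * n)  # combined estimate under H0: p1 = p2
z = (x1 / n - x2 / n) / math.sqrt(pc * (1 - pc) * (2 / n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 11.82: d = surface pH - subsoil pH at each location
surface = [6.55, 5.98, 5.59, 6.17, 5.92, 6.18, 6.43, 5.68]
subsoil = [6.78, 6.14, 5.80, 5.91, 6.10, 6.01, 6.18, 5.88]
d = [a - b for a, b in zip(surface, subsoil)]
k = len(d)
mean_d = sum(d) / k
s_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (k - 1))
t_star = 1.895  # t critical value for 90% confidence, df = 7 (from a t table)
margin = t_star * s_d / math.sqrt(k)
ci = (mean_d - margin, mean_d + margin)

print(f"11.81: z = {z:.2f}, P-value = {p_value:.2e}")
print(f"11.82: mean d = {mean_d:.4f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```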


CHAPTER 12

The Analysis of Categorical Data and Goodness-of-Fit Tests

It is often the case that information is collected on categorical variables, such as political affiliation, gender, or college major. As with numerical data, categorical data sets can be univariate (consisting of observations on a single categorical variable), bivariate (observations on two categorical variables), or even multivariate. In this chapter, we will first consider inferential methods for analyzing univariate categorical data sets and then turn to techniques appropriate for use with bivariate categorical data.


Make the most of your study time by accessing everything you need to succeed online with CourseMate.

Visit http://www.cengagebrain.com where you will find:

• An interactive eBook, which allows you to take notes, highlight, bookmark, search the text, and use in-context glossary definitions
• Step-by-step instructions for Minitab, Excel, TI-83/84, SPSS, and JMP
• Video solutions to selected exercises
• Data sets available for selected examples and exercises
• Online quizzes
• Flashcards
• Videos


12.1 Chi-Square Tests for Univariate Data

Univariate categorical data sets arise in a variety of settings. If each student in a sample of 100 is classified according to whether he or she is enrolled full-time or part-time, data on a categorical variable with two categories result. Each airline passenger in a sample of 50 might be classified into one of three categories based on type of ticket: coach, business class, or first class. Each registered voter in a sample of 100 selected from those registered in a particular city might be asked which of the five city council members he or she favors for mayor. This would yield observations on a categorical variable with five categories.

Univariate categorical data are most conveniently summarized in a one-way frequency table. For example, the article “Fees Keeping American Taxpayers From Using Credit Cards to Make Tax Payments” (IPSOS Insight, March 24, 2006) surveyed American taxpayers regarding their intent to pay taxes with a credit card. Suppose that 100 randomly selected taxpayers participated in such a survey, with possible responses being definitely will use a credit card to pay taxes next year, probably will use a credit card, probably won’t use a credit card, and definitely won’t use a credit card. The first few observations might be

Probably will
Definitely will not
Probably will not
Probably will not
Definitely will
Definitely will not

Counting the number of observations of each type might then result in the following one-way table:

Outcome      Definitely Will   Probably Will   Probably Will Not   Definitely Will Not
Frequency    14                12              24                  50

For a categorical variable with k possible values (k different levels or categories), sample data are summarized in a one-way frequency table consisting of k cells, which may be displayed either horizontally or vertically.
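The tallying step can be sketched in Python. This is a minimal illustration, not part of the original text: the helper name `one_way_table` is hypothetical, and the six responses are the first few observations listed above rather than the full sample of 100. Passing the category list explicitly keeps all k cells in the table, even a category that happens to receive no observations.

```python
from collections import Counter

# The k = 4 response categories from the tax survey, in table order.
CATEGORIES = [
    "Definitely will",
    "Probably will",
    "Probably will not",
    "Definitely will not",
]


def one_way_table(observations, categories):
    """Return a dict mapping each category to its observed count.

    Every category appears as a cell, even if no observation falls in it,
    so the table always has exactly k cells.
    """
    counts = Counter(observations)
    unknown = set(counts) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return {c: counts.get(c, 0) for c in categories}


# The first six observations listed in the text.
first_few = [
    "Probably will",
    "Definitely will not",
    "Probably will not",
    "Probably will not",
    "Definitely will",
    "Definitely will not",
]

table = one_way_table(first_few, CATEGORIES)
for category, count in table.items():
    print(f"{category:<20} {count}")
```

Raising an error on an unrecognized label, rather than silently adding a new cell, is a design choice: it keeps the number of cells fixed at k, which the hypothesis test depends on.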

In this section, we consider testing hypotheses about the proportion of the population that falls into each of the possible categories. For example, the manager of a tax preparation company might be interested in determining whether the four possible responses to the tax credit card question occur equally often. If this is indeed the case, the long-run proportion of responses falling into each of the four categories is 1/4, or .25. The test procedure to be presented shortly would allow the manager to decide whether the hypothesis that all four category proportions are equal to .25 is plausible.

Notation

$k$ = number of categories of a categorical variable
$p_1$ = true proportion for Category 1
$p_2$ = true proportion for Category 2
$\vdots$
$p_k$ = true proportion for Category $k$

(Note: $p_1 + p_2 + \cdots + p_k = 1$)


The hypotheses to be tested have the form

$H_0$: $p_1$ = hypothesized proportion for Category 1
      $p_2$ = hypothesized proportion for Category 2
      $\vdots$
      $p_k$ = hypothesized proportion for Category $k$

$H_a$: $H_0$ is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.
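As a sketch of what a well-formed null hypothesis of this type must satisfy, the hypothetical helper below (not from the text) checks that there is one hypothesized proportion per category, that each lies between 0 and 1, and that together they sum to 1, mirroring the note $p_1 + p_2 + \cdots + p_k = 1$. A small tolerance is used because decimal proportions are not always exact in floating point.

```python
import math


def check_null_proportions(hypothesized, k):
    """Raise ValueError unless `hypothesized` is a valid H0 for k categories.

    A valid null hypothesis assigns one proportion to each of the k
    categories, every proportion lies in [0, 1], and the proportions
    sum to 1 (within a small floating-point tolerance).
    """
    if len(hypothesized) != k:
        raise ValueError(f"need {k} proportions, got {len(hypothesized)}")
    if any(p < 0 or p > 1 for p in hypothesized):
        raise ValueError("each proportion must lie in [0, 1]")
    total = sum(hypothesized)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"proportions sum to {total}, not 1")


# The equal-proportions hypothesis for the four tax-survey responses passes.
check_null_proportions([0.25, 0.25, 0.25, 0.25], k=4)
```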

For the example involving responses to the tax survey, let

$p_1$ = the proportion of all taxpayers who will definitely pay by credit card
$p_2$ = the proportion of all taxpayers who will probably pay by credit card
$p_3$ = the proportion of all taxpayers who will probably not pay by credit card

and

$p_4$ = the proportion of all taxpayers who will definitely not pay by credit card

The null hypothesis of interest is then

$H_0$: $p_1 = .25$, $p_2 = .25$, $p_3 = .25$, $p_4 = .25$
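Before any formal test, the sample proportions from the hypothetical sample of 100 taxpayers in the one-way table can be set beside the hypothesized value .25. The short Python sketch below does this; the variable names are illustrative only. A large gap (here .50 for "Definitely will not") hints that $H_0$ may not hold, but judging whether such gaps exceed what chance variation alone would produce is exactly what the test procedure is for.

```python
# Observed frequencies from the one-way table (n = 100 taxpayers).
observed = {
    "Definitely will": 14,
    "Probably will": 12,
    "Probably will not": 24,
    "Definitely will not": 50,
}
HYPOTHESIZED = 0.25  # H0 value shared by all four categories

n = sum(observed.values())
# Sample proportion for each category: count / n.
sample_props = {c: count / n for c, count in observed.items()}

for category, p_hat in sample_props.items():
    diff = p_hat - HYPOTHESIZED
    print(f"{category:<20} sample {p_hat:.2f}  H0 {HYPOTHESIZED:.2f}  diff {diff:+.2f}")
```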

A null hypothesis of the type just described can be tested by first selecting a random sample of size n and then classifying each sample response into one of the k possible categories. To decide whether the sample data are compatible with the null hypothesis, we compare the observed cell counts (frequencies) to the cell counts that would have been expected when the null hypothesis is true. The expected cell counts are

Expected cell count for Category 1 = $np_1$
Expected cell count for Category 2 = $np_2$

and so on. The expected cell counts when $H_0$ is true result from substituting the corresponding hypothesized proportion for each $p_i$.
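Computing the expected counts is a one-line calculation. The minimal Python sketch below (the function name `expected_counts` is hypothetical) applies it to the tax survey, where n = 100 and every hypothesized proportion is .25, so each expected count is 100(.25) = 25, and prints each observed count next to its expected count.

```python
def expected_counts(n, hypothesized):
    """Return the expected cell counts n * p_i under H0, in category order."""
    return [n * p for p in hypothesized]


# Tax survey: observed frequencies in table order and the H0 proportions.
observed = [14, 12, 24, 50]
h0 = [0.25, 0.25, 0.25, 0.25]
n = sum(observed)  # 100

expected = expected_counts(n, h0)  # [25.0, 25.0, 25.0, 25.0]
for o, e in zip(observed, expected):
    print(f"observed {o:>3}   expected {e:5.1f}")
```

Because the hypothesized proportions sum to 1, the expected counts always sum to n, just as the observed counts do.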


EXAMPLE 12.1 Births and the Lunar Cycle

A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon. The paper “The Effect of the Lunar Cycle on Frequency of Births and Birth Complications” (American Journal of Obstetrics and Gynecology [2005]: 1462–1464) classified births according to the lunar cycle. Data for a sample of randomly selected births occurring during 24 lunar

Lunar Phase        Number of Days   Number of Births
New moon                 24               7,680
Waxing crescent         152              48,442
First quarter            24               7,579
Waxing gibbous          149              47,814
Full moon                24               7,711
Waning gibbous          150              47,595
Last quarter             24               7,733
Waning crescent         152              48,230

Data set available online
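This excerpt stops before stating the hypotheses for this example. One natural way to set them up, assumed here rather than taken from the text, is that if the lunar phase has no effect, each phase's share of births should equal its share of the 699 days covered by the table. The Python sketch below computes these assumed hypothesized proportions and the resulting expected counts; the list and variable names are illustrative.

```python
# (lunar phase, number of days, number of births), copied from the table.
lunar = [
    ("New moon", 24, 7680),
    ("Waxing crescent", 152, 48442),
    ("First quarter", 24, 7579),
    ("Waxing gibbous", 149, 47814),
    ("Full moon", 24, 7711),
    ("Waning gibbous", 150, 47595),
    ("Last quarter", 24, 7733),
    ("Waning crescent", 152, 48230),
]

total_days = sum(days for _, days, _ in lunar)  # 699 days in all
n = sum(births for _, _, births in lunar)       # 222,784 births in all

# Assumed H0: each phase's proportion of births equals its proportion of days.
h0 = [days / total_days for _, days, _ in lunar]
expected = [n * p for p in h0]

for (phase, _, births), p0, e in zip(lunar, h0, expected):
    print(f"{phase:<16} p0 = {p0:.4f}  observed {births:>6,}  expected {e:>9,.1f}")
```

Weighting by number of days matters here: the crescent and gibbous phases span far more days than the four principal phases, so equal proportions of .125 would not be a sensible "no lunar effect" hypothesis.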
