6 Skillbuilder Applet: The Confidence Level in Action


Index

Codes of ethics, S5-2, S5-3, S5-17

Coefficient(s)

correlation. See Correlation

of determination, multiple, S3-8

regression, S3-3

estimating, using sample data, S3-5

testing null and alternative hypotheses for, S3-9–10

Coherent probabilities, 236

Coincidences, 264 – 67

Column percentages, 194, 195, 196, 197,

636, 639

Complement, 239, 243, 252

rule to find probability of, 243– 44,

250, S1-8

Complementary events, 239, 240, 243

Completely randomized design, 131,

132

Complexity of survey questions, 97

Computer, reading results for regression

on, 170 –71

Computer-assisted information collection, S5-10 –11

Computer-assisted self-interviews,

98–99

Computer-assisted telephone interviewing system (CATI), 89

Concepts in survey questions, ill-defined, 101

Conclusions, making stronger or weaker

than justified, S5-19

Conditional percentages, 194, 196

Conditional probability, 241– 42, 243,

246

defined, 242

determining, 247– 49, 251, 252

p-value, 501, 502

tree diagram and, 256 –58

type 1 or type 2 errors, 507– 8

Confidence interval(s), 76 –79, 206 –7,

332 –33, 401–93, S5-16

in action, example of, 480 – 82

approximate 95%, 460 – 61, 472

defined, 460

for difference in population means,

472

for large samples, 460 – 61

for proportion p, 420 –21

case studies, 429–31, 478–79

curiosity and, 401–2

decisions and, 428–29

defined, 332, 401, 403, 405, 443

for difference between two population means, 466 –77

equal variance assumption of,

473–76

general format for, 467

interpreting, 470 –72

pooled, 473– 475, 477

pooled standard error for, 473–76

pooled vs. unpooled, 475–76



unpooled, 466 –73, 477

valid situations for, 468–70

for difference between two population proportions, 423–28

conditions for, 424 –28

decision-making and, 428–29

formula, 424

notation for, 424

estimation situations, examples of

different, 444 – 45

examples of, 401–2

general format for determining, 444,

452

general formula for, 408

hypothesis testing versus, choosing,

577–78

independent samples, 446 – 47

information learned from, 577–78

interpreting, 78–79

multiplier in. See Multiplier in

confidence interval

for one population mean (t-interval),

452 – 66

calculating, 452 –53, 455

conditions required for using,

453–56

interpreting, 457–58

t* multiplier, determining, 450–52, 453, 460

understanding, 459

valid situations for, 454

for one proportion, 409–23

approximate 95%, 420 –21

common settings, 411

computing, 412 –14

conditions required for using,

414 –16

conservative 95%, 421

developing 95%, 417–18

formula for, 409, 414

intuitive explanation of 95%

confidence, 418

width of, 409–10

z* multiplier, determining, 412–14

for other levels of confidence, 418–19

for population mean E(Y), in regression, 617, 618, S3-13–14

for population means, in one-way

analysis of variance, 684

for population mean of paired differences (paired t-interval), 461– 66

for population slope, in regression,

611–12

prediction interval compared to, 614

procedures, summary of, 483

for regression coefficient, S3-10

significance testing and, 574 –76

one-sided tests, 575–76

tests with two-sided alternatives,

574 –75

standard errors, 447–50






summary of formulas for, 431–32

as supporting analysis for chi-square

test, 645

understanding any, 478

Confidence level, 405, 406 –7, 418–19

in action, example of, 480 – 82

defined, 405, 406, 443

family, 677

interpreting, 406 –7

as interval estimate, 405

margin of error associated with,

for 95% confidence level, 418,

419–23

multiplier and, 408, 412 –14

Confidentiality

in medical research, S5-6

of survey, 98–99

Confounding variable(s), 5, 120 –23,

706 –7

case-control studies and reduction of

potential, 135

causation and, 133–37

defined, 120

interpreting observed associations

and, 177

lurking variables, 120 –22, 123

relationships between categorical

variables and, 204 – 6

retrospective studies and, 140

ruling out potential, 708

Simpson’s Paradox and, 205– 6

Confusion of the inverse, 261– 63

Conservative margin of error, 76, 420,

421–23

Conservative 95% confidence interval

for proportion p, 421

Constant variance in regression, 603

Consumer Reports survey, 91

Contingency table, 194, 635. See also

Two-way (contingency) table

assessing statistical significance of,

206 –16, 654

chi-square test for, 208–16, 635– 46

relationship between categorical variables displayed in, 194 –97

Continuity correction, 314

Continuous random variables, 281, 282,

300 –311

defined, 280, 300

normal random variable, 302 –11

cumulative probability for, 305–9

defined, 302

independent, linear combinations

of, 316 –18

percentile ranking, finding, 310 –11

probability relationships for, 306 –7

standard, 304

standardized score (z-scores),

54 –55, 303–9

symmetry property of, 307




probability density function for,

300 –302, 346

sign test and, S2-3

uniform random variable, 301

Continuous variable, 16

Contributions to chi-square, 645

Control groups, 127–28

Control of societal risks, using statistics

for, 709, 713–15

Convenience sample, 93–94, 138

Correlation, 165– 80

causation vs., 176 –77

defined, 151, 165

direction of relationship and, 165–71,

178–79

examples of, 166 – 69

formula for, 166

interpreting correlation coefficient,

166 –70

misleading, reasons for, 171–76

curvilinear data, 174 –75

extrapolation, 171–72

inappropriately combined groups,

173–74

outliers, 172 –73, 178, 179

multiple squared, S3-8

negative, 166, 167, 178

perfect, 166

positive, 166, 167, 178

squared (r²), 169–70, 171

strength of relationship and, 165–71,

178–79

testing hypotheses about correlation

coefficient, 612

of zero, 166

Correlation coefficient. See Correlation

Critical value. See also Rejection region

defined, 559

for chi-square tests, 643– 45

for F-tests, 675–76

for t-tests, 559– 60, 564, 568, 569–70

Cumulative distribution function (cdf) of discrete random variables, 286

Cumulative probability(ies), 286,

304 –14

for any normal random variable,

305– 6

for binomial random variable, approximating, 312 –14

percentile rank and, 310 –11

for z-scores, finding, 304 –5

Curvilinear data, impact on correlation or regression results,

174 –75

Curvilinear patterns in scatterplots,

154 –55

Curvilinear relationships, 154, 600



D

Data, 2, 13–15. See also Paired data

(paired samples); Sampling

assurance of data quality, S5-9–13

experimenter effects and personal

bias and, S5-12 –13

U.S. federal statistical agencies

guidelines, S5-9–12

categorical, estimating probabilities

from observed, 234 –35

defined, 3

five-number summary of, 2 –3, 25, 26,

43– 46

quantitative, visual displays exploring

features of, 24 –36

types of, 14 –15

population data, 14 –15

raw data, 13–15, 16, 36

sample data, 14 –15

using, for inference, 705– 8

Dataset, 14

bimodal, 33, 34

skewed, 32, 33, 40, 41

unimodal, 33

Data-snooping, S5-15

Dawes, Robyn, 262

Deception in research, S5-3– 4. See also

Ethics

Decision-making

appropriate statistical analyses,

S5-14 –16

confidence intervals to guide, 428–29

personal, 709–12

policy decisions, statistical studies to

guide, 713–15

Degrees of freedom (df), 212, 370–71, 372, 450

for chi-square distribution, 641, 642

for chi-square statistic, 641, 643

denominator, 674, 675, 676, S3-11

for F-distribution, 674 –75

for goodness-of-fit test, 653

for multiple regression, S3-11

numerator, 674, 675, 676, S3-11

for one-sample t-statistic, 555, 556

for one-way ANOVA, 674 –75, 685

rejection region approach for t-tests

and, 559, 560, 564

for t-distribution, 450, 451, 453

for two-sample t-statistic, 566 – 67,

569

for two-way ANOVA, S4-12

Welch’s approximation, 468, 567

Deliberate bias, 95–96

Denominator degrees of freedom, 674,

675, 676, S3-11

Density function, probability, 300 –302,

346

Dependent events, 240 – 41, 243

general multiplication rule for probability of, 245, 246, 251



Dependent samples, 363

Dependent variable, 119, 152, 600, S3-2.

See also Response variable(s)

Descriptive statistics, 15, 71

Design, research study

observational study, 133–36

randomized experiment, 124 –33

terminology and examples, 131–33

Desire to please, survey respondents

and, 96 –97

Determination, multiple coefficient of,

S3-8

Deterministic relationship, 151, 160

Deviation(s) from regression line

in multiple linear regression model,

S3-3

in population, 603–5

in sample, 602

Diaconis, Persi, 264

Difference between two population proportions, 340

chi-square test vs. z-test for, 647– 49

confidence interval for, 423–28

hypothesis testing, 340, 527–33

Difference in two population means,

342 – 43

confidence intervals for, 466 –77

hypothesis testing, 342 – 43, 552,

565–74

Difference of random variables, 315

independent normal random variables, 316 –18

Direction of relationship, correlation

and, 165–71, 178–79

Discrete random variables, 280, 282,

283–93, S1-2 –15

binomial random variables, 281,

294 –300, 349

binomial experiments and, 294 –95,

339, 348– 49

cumulative probabilities of, approximating, 312 –14

expected value (mean) of, 298–99,

S1-6

independent, combining, 318–20

Poisson approximation for, S1-11

probabilities for, 295–98

standard deviation of, 298–99

variance of, 299

cumulative distribution function (cdf) of, 286

defined, 280, S1-2

expected value for, 288–91

hypergeometric, 295, S1-2 –7

hypotheses for, S2-7

multinomial distribution, S1-11–13

Poisson, S1-7–11

probabilities for complex events

based on, 287– 88

probability distribution of, 283– 88,

S1-3




probability notation for, 283

sign test and, S2-3

standard deviation of, 291–92

variance of, 291–92

Disjoint events, 240, 243

Distribution

binomial, 296, S1-12

normal approximation to, 311–14

chi-square, 641

F, 670, 674 –75, S3-11

frequency, 20

hypergeometric, 650, S1-2 –7

multinomial, S1-11–13

normal, 49, 50, 55, 302

rejection region on, 524

sampling distribution as approximately, 347, 355, 364, 366 – 67

standard, 304 – 6, 370, 371

z-scores to solve probability problem about, 305–9

of a quantitative variable, 26

Poisson, S1-7–11

probability. See Probability

distribution

relative frequency, 20

sampling. See Sampling distribution

t (Student’s t), 370 –73, 450 –52

Dotplot, 2, 25, 29

creating, 31, 33

interpreting, 29–30

strengths and weaknesses, 35–36

Double-blind experiment, 128, 129,

S5-12

Double dummy, 128–29

Dunn, E.V., 425



E

e, value of, S1-8

Ecological validity, 139– 40, S5-19

Effect modifier, 147

Effect size, 580 – 85

comparing, across studies, 583– 84

estimating, for one and two samples,

581

interpreting, 582 – 83

power and, 584 – 85

t-statistic and, 581– 82

Emotions, measuring, 101

Empirical Rule, 52 –57, 279, 303, 448

in action, example of, 56 –57

applied to z-scores, 54 –55

defined, 55

95% confidence interval and, 417–18

range and standard deviation and, 54

Equal variance assumption, 473–76, 567

Error(s), 504 –10

least squares line and, 163

margin of. See Margin of error

mean square (MSE), 681, S4-12

outliers caused by, 47, 48

prediction, 162 – 63



SSE (sum of squared error), 163, 164,

170, 606, 608, 609, 680 – 82, S3-5,

S3-8, S4-11–12

standard. See Standard error(s)

types 1 and 2 in hypothesis testing,

506 –10, 564n, 720

consequences of, in personal

decision-making, 710

and their probabilities, 506 –10

trade-off between, 510

Error term (ε), S4-4

Estimated y, 160

Estimate (sample estimate, sample

statistic, point estimate), 332,

333, 404, 405, 408, 444. See also

Statistic

Estimation. See also Confidence

interval(s)

language and notation of, 403– 4

margin of error in, 75–79, 405

for 95% confidence, 418, 419–23

Estimation situations, examples of different, 336 –37, 444 – 45

Ethical Guidelines for Statistical Practice

(American Statistical Association), S5-2, S5-17

Ethical Principles of Psychologists and

Code of Conduct (APA)

on animal research, S5-9

on deception in research, S5-3– 4

on informed consent, S5-4

Ethics, S5-2 –22

in animal research, S5-8–9

appropriate statistical analyses,

S5-14 –16

assurance of data quality, S5-9–13

experimenter effects and personal

bias and, S5-12 –13

U.S. federal statistical agencies

guidelines, S5-9–12

case study, S5-19–20

codes of, S5-2, S5-3, S5-17

fair reporting of results, S5-16 –20

multiple hypothesis tests and

selective reporting, S5-17–18

sample size and statistical significance, S5-17

stronger or weaker conclusions

than justified, S5-19

observational study vs. randomized

experiment, 118

treatment of human participants,

S5-2 – 8, S5-12

example of unethical experiment,

S5-3– 4

informed consent and, S5-4 – 8

Event(s)

complementary, 239, 243

defined, 238, 243

dependent, 240 – 41, 243, 245, 246,

251






independent, 240 – 41, 243, 246 – 47,

249, 251, 252

mutually exclusive, 240, 243, 244 – 45,

249, 250, 252

simple, 238, 238–39, 242

Exact p-values in tests for population

proportions, 521–23

Excel commands

for binomial probabilities, 297

for hypergeometric probabilities,

S1-5

for minimum value, 46

normal distribution function, 55

for percentiles, 46

for Poisson probabilities, S1-9

for p-value, 212, 517, 518, 558

of chi-square test, 642, 655

for F-test, 675

for paired t-test, 563

for quartiles, 46

for regression equation, 162

for standard deviation and variance,

52

for t* multiplier, 452

for t-value corresponding to specified

area, 373

Expected counts, 208–9

for goodness-of-fit test, 653

for two-way tables, 638– 40, 646

interpreting and calculating,

639– 40

Expected value, 289

of binomial random variables,

298–99, S1-6

calculating, 289–91

of discrete random variables, 288–91

binomial random variables,

298–99, S1-6

notation and formula for, 289

for population, 292 –93

Experiment(s), 118, 124. See also Ethics

binomial, 294 –95, 339, 348– 49,

521–23

designing, 124 –33

difficulties and disasters in, 136 – 41

double-blind, 128, 129, S5-12

multinomial, S1-12

randomized. See Randomized

experiment(s)

Experimental unit, 119

number in blocks, 133

Experimenter effects, 139, S5-12 –13

Explanatory variable(s), 18, 119, 152

in case-control studies, 134, 135, 136

for categorical variables, 21

change in, caused by response variable, 177

in contingency table, 194, 195, 196

proportion of variation explained by (R²), 169, 607–9, S3-8

in regression models, 600




roles in research studies, 119, 120,

121, 122, 123

in two-way ANOVA of, 689–90, 692

Exponential function, S1-8

Extension of results inappropriately to

population, 137–38

Extrapolation, 171–72



F

Factors in two-way ANOVA, S4-5

Factor A, S4-5, S4-7, S4-8, S4-10

population mean for, level i, S4-5

Factor B, S4-5, S4-7, S4-8, S4-10

population mean for, level j, S4-5

Fair reporting of research results,

S5-16 –20

False negative, 505, 506 –10. See also

type 2 error

False positive, 505, 506 –10. See also

type 1 error

Family confidence level, 677

Family of random variables, 280, 281

Family type 1 error rate, 677

F-distribution, 670, 674 –75

family of, 674 –75

in multiple regression, S3-11

Federal Register Notices, S5-9

Federal Statistical Organizations’ Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Disseminated

Information, S5-10

Financial pressures on research results,

S5-19

Fisher’s Exact Test, 650 –52

Fisher’s multiple comparison procedure, 677–78

Fit, 615

residuals versus fits, S3-15–16

Five-number summary, 2 –3, 25, 26,

42 – 43

outlier detection and, 43– 46

Frequency, 20

relative, 20, 30, 259– 61

Frequency distribution, 20

F-test for comparing means, 669–79,

685

assumptions and necessary conditions for, 672 –74

defined, 670

family of F-distributions, 674 –75

F-statistic, 670, 672, 675–76, 679, 680,

681, S3-10, S4-13

multiple comparisons, 676 –79

notation for summary statistics, 672

p-value, 670, 675–76

steps in, 685

two-way ANOVA and, 692 –93,

S4-12 –13



examples and interpretation of

results, S4-14 –19

Full model for two-way ANOVA, S4-7–9

defined, S4-7

for each observation, S4-7– 8

Fundamental Rule for Using Data for

Inference, 72, 89, 125, 137, 404,

706

confidence interval for population

proportion and, 414 –15



G

Gallup, George, 95

Gallup Organization, 719

Galton, Francis, 165

Gambler’s fallacy, 267

Gehring, William, 267

Generalizability, 139– 40

General Social Survey (GSS), 91–92

General two-way ANOVA model, S4-6 –7

Goodness-of-fit test, chi-square, 652 –56

Groups

inappropriately combining, 173–74

mean square for (MS Groups), 680,

682



H

Haphazard (convenience) sample,

93–94, 138

Hawthorne effect, 139

Histogram, 29

applicability of Empirical Rule and

shape of, 56, 57

creating, 30 –31, 33

interpreting, 29–30

normal curve superimposed on, 50

of residuals, 621

of sample means, 345– 46

strengths and weaknesses, 35, 36

Homogeneity, chi-square test for, 638

Human participants, ethical treatment

of, S5-2 – 8, S5-12

example of unethical experiment,

S5-3– 4

informed consent and, S5-4 – 8

Hypergeometric distribution, 650,

S1-2 –7

Hypergeometric probability distribution, S1-3–5

Hypergeometric random variable, 295,

S1-2 –7

defined, S1-2

mean, variance, and standard deviation for, S1-6 –7

probability distribution function for,

S1-3–5

survey and poll results as, S1-3

Hypothesis. See Alternative hypothesis;

Null hypothesis

Hypothesis testing, 206, 333, 495–597.

See also Significance testing



confidence interval versus, choosing,

577–78

about correlation coefficient, 612

about difference in two population

means, 342 – 43, 552, 565–74

about difference in two population

proportions, 340, 527–33

p-value for, 529, 531–32

errors in, types of, 504 –10, 564n, 710,

720

formulating hypothesis statements,

496 –99

information learned from, 577–78

logic of, 495, 499–500

multiple, 586

nonparametric tests. See Nonparametric tests of hypotheses

about one categorical variable,

652 –56

chi-square goodness-of-fit test,

652 –56

about one population mean, 551,

553– 61

paired data, 341– 42, 552, 561– 64

steps in, 554 –59

one-sided hypothesis test, 498, 511,

575–76

about one population proportion,

511–27

exact p-values, 521–23

power and sample size for, 524 –26

p-value for z-test, computing,

516 –21

rejection region approach, 523–24

z-statistic, details for calculating,

515–16

z-test for, 511–14

about population mean of paired differences, 341– 42, 552, 561– 64

for population slope, 609–10

power of hypothesis test, 509–10,

524 –26, 584 – 85

probability question at center of,

499–500

reaching conclusion in, 501– 4

p-value, computing, 502

rejecting null hypothesis, 503

two possible conclusions, 503– 4

statistical significance, 333, 533–36

real importance vs., 6, 536

sample size and, 533–35, 613

steps in, 496, 511, 553

summary of procedures for, 586 – 87

two-sided hypothesis test, 498, 511,

520 –21, 526, 574 –75

in two-way ANOVA, S4-9–20

balanced versus unbalanced design, S4-9–10, S4-11

degrees of freedom, S4-12

estimating main effect and interaction effects, S4-11




examples and interpretation of

results, S4-14 –19

F-tests, 692 –93, S4-12 –13

hypotheses, S4-10

nonsignificant results, cautions

about, S4-19–20

sum of squares, S4-11–12

Hypothesis-testing paradox, 583– 84

Hypothetical hundred thousand,

255–56



I

Independence, chi-square test for, 638

Independence of statistical agencies

from politics, S5-11–12

Independent events, 240 – 41, 243, 249

multiplication rule for probability of,

246 – 47, 251

probability of series of, 252

Independent random variables, linear

combination of, 315–20

independent binomial random variables, 318–20

independent normal random variables, 316 –18

variance and standard deviation of,

316

Independent samples, 342, 354, 362,

365, 446

confidence intervals

for difference between two means,

466 –77

for difference between two proportions, 423–28

defined, 342

hypothesis tests

difference between two means,

342 – 43, 552, 565–74

difference between two proportions, 340, 527–33

methods of generating, 366, 446 – 47

standard error of difference between

two sample means, 449–50

for two-sample t-test, 566

Independent variable, 119. See also Explanatory variable(s)

Inference(s)

choosing appropriate inference procedure, 576 – 80

appropriate parameter, 578–79

confidence interval versus hypothesis testing, 577–78

examples of, 579– 80

about multiple regression models,

S3-9–13

checking conditions for, S3-14 –15

about simple regression. See

Regression

about slope of linear regression,

609–13



statistical, 229, 332 –33, 401. See also

Confidence interval(s); Hypothesis testing

defined, 332, 333

preparing for, 368–73

principal techniques in, 332 –33,

401

sampling distribution and, 347

using data for, 705– 8

cautions concerning, 551

Fundamental Rule for, 72, 89, 125,

137, 404, 414 –15, 706

Inferential statistics, 71–72, 206 –7

Influential observations, 172

Informed consent in medical research,

S5-5– 8

checklist, S5-7– 8

Injury, research-related, S5-6

Institutional Review Board (IRB), S5-5

Interacting variables, 138–39

Interaction effect in two-way ANOVA,

689–90, 692, S4-3, S4-7, S4-8

interpreting, S4-9

testing for, S4-9–20

Interaction plot, 690, 691, 693, S4-8–9,

S4-15, S4-17, S4-19–20

Intercept

in computer results for regression,

171

of least squares line, formula for, 164

of regression line

for population, 603

for sample, 601

of straight line, 159, 160, 161, 601

Interquartile range (IQR), 26, 41– 42, 43

Interval estimate, 405, 408. See also

Confidence interval(s)

Intervals

confidence. See Confidence

interval(s)

for histograms, 30

notation for probability in, 300 –302

prediction, 613–16, 617, 618, S3-13–14

for stem-and-leaf plot, 31

Interviewers, sample survey vs. census,

74

Interviewing system, computer-assisted

telephone (CATI), 89

Inverse, confusion of the, 261– 63

IRB (Institutional Review Board), S5-5



J

Joint probability distribution function, S1-12



K

Kruskal–Wallis Test for comparing medians, 687–88, S2-16–19

null and alternative hypotheses for, S2-16

p-value for, S2-17–18

test statistic for, S2-17, S2-18



L

Large effect, 582

Law of Large Numbers, 373–74

Law of small numbers, 267

Least squares, 163

Least-squares criterion, 601, S3-5

Least squares line, 163–64, 601

curvilinear data and, 175

defined, 601

example of, 180

inappropriately combined groups and, 173

outliers and, 172

sum of squared errors and, 163, 164

Level of significance (α level), 501, 503

probability of type 1 error and, 508, 510

probability of type 2 error and, 509, 510

p-values close to .05 and, 651–52

Linear combination of random variables, 315–20

independent random variables, 315–20

independent binomial random variable, 318–20

independent normal random variables, 316–18

variance and standard deviation of, 316

mean of, 315

Linear multiple regression model. See Multiple regression

Linear relationship(s), 153, 599, 600. See also Regression

correlation describing, 165–71

described with regression line, 157–65

on scatterplot, 153–54

Literary Digest poll of 1936, 94–95

Location of data (dataset), 26, 36

describing, 37–39

picturing, with boxplot, 33–35

Log transformation on the y’s, 623

Lower quartile, 2, 3, 42

Lurking variable, 120–22, 123



M


McCabe, George, 265

Magnitude of statistically significant

effect, 536

Mail surveys, 91, 92

Main effect in two-way analysis of variance, 690, 692, S4-3

for Factors A and B, S4-7, S4-8

interpreting, S4-9

testing for, S4-9–20






Mann–Whitney test (Mann–Whitney–

Wilcoxon test), S2-7–12

Margin of error, 75–79, 405, 419–23, 720

conservative, 76, 79, 420, 421–23

defined, 4, 76, 420

desired, sample size for, 79

for 95% confidence, 418, 419–23

general format for, 420

for proportion, 418, 419–23

sample size and desired, 79

Matched-pair design, 130, 131, 132

case-control study compared to, 135

Matched pairs, data collected as, 363

Mean(s), 26

of binomial random variable, 298–99,

S1-6

confidence intervals for, 443–93

estimation situations, 444 – 45

general format for, 408, 444

for one population mean, 452 – 61

for population mean of paired differences, 461– 66

confidence intervals for difference

between two, 466 –77

equal variance assumption of,

473–76

general format for, 467

interpreting, 470 –72

pooled standard error for, 473–76

valid situations for, 468–70

defined for sample, 37

determining for sample, 37–39

effect-size measure for comparing

two, 581

effect-size measure used for single,

580 – 81

estimating mean y at specified x in

regression, 617–19

F-test for comparing, 669–79

for hypergeometric random variable,

S1-6 –7

hypothesis testing about, 551–97

difference between two population

means, 342 – 43, 552, 565–74

one population mean, 551, 553– 61

population mean of paired differences, 341– 42, 552, 561– 64

influence of outliers on, 40

influence of shape on, 40 – 41

of linear combination of random variables, 315

in multiple linear regression model,

S3-3

Normal Curve Approximation Rule

for Sample Means, 358– 62, 375,

449

notation for, 336

for Poisson random variable, S1-9

population. See Population mean(s)

of rank-sum statistic, S2-10



sample. See Sample mean(s)

of sampling distribution, 347, 348

of difference in two sample means,

367– 68

of difference in two sample proportions, 355–57

of sample mean of paired differences, 364 – 65

sampling distribution of the, 359

in simple regression model for population, 604

standard error of the, 361

sample mean, 364 – 65, 448–50, 467

standardized statistic for, 370

of standard normal random variable,

304

of Wilcoxon signed-rank test, S2-14

Mean square error (MSE) in ANOVA, 681

two-way ANOVA, S4-12

Mean square for groups (MS Groups),

680, 682

Mean value. See Expected value

Measurement error, outliers caused by,

47, 48

Measurement variable, 16

Median(s), 2, 26, 42 – 43

defined, 3, 37

determining, 37–39

hypotheses about, 686

testing, 555

influence of outliers on, 40

influence of shape on, 40 – 41

population, comparing, 686 – 89

Kruskal–Wallis Test for, 687– 88,

S2-16 –19

Mood’s Median Test, 688– 89

for representing “central” value, 376

Medical research, S5-4

informed consent in, S5-5– 8

Medium effect, 582

Memory, relying on, 140

Meta-analysis, 583

Midrank, S2-9

Milgram, Stanley, S5-3

Minitab, commands in

“Assume Equal Variances” option,

474, 477

bar graph of categorical variable, 24

binomial probabilities, 297

boxplot, 35

chi-square statistic for goodness-of-fit, 656

chi-square test for two-way table,

212, 215, 646

confidence intervals

for difference between two independent means, 426

for E(Y ) in regression, 619

for mean, 460

for mean of paired differences, 466

for proportion, 416



correlation, finding, 170

determining percent falling into categories of categorical variable, 21

dotplot, 31, 33

Fisher’s Exact Test (version 14 and

higher), 562

histogram, 33

hypergeometric probabilities, computing, S1-5

interaction plot, drawing, 693

for Kruskal–Wallis Test, S2-19

nonparametric procedures for comparing several medians, 689

normal curve probabilities and percentiles, 311

normal curve superimposed onto

histogram, 50

one-sample t-test, 564

one-way analysis of variance, 679

paired t-test, 564

pie chart of categorical variable, 24

plotting and storing residuals, 623

Poisson probabilities, finding, S1-9

prediction intervals and confidence

intervals, creating, S3-14

prediction intervals for y, 619

quartile estimates, 42

random sample, picking, 82, 83

regression equation, finding, 171

regression line, finding, 162

residuals versus fits, graphing, S3-16

sample regression equation, estimating, S3-7

sample size for specified power or

power for specified sample size,

585

scatterplot, 156

for sign test, S2-6

simulation to estimate probabilities,

260 – 61

stemplot, 33

summary statistics for quantitative

variable, 47

testing hypotheses about proportion,

526

testing significance of relationship

between categorical variables,

211, 216, 646

for two-sample rank-sum test, S2-12

two-sample t procedure, pooled version, 474 –75

two-sample t-test to compare means,

573

two-sample z-test for difference in

proportions, 532

for two-way ANOVA, S4-13

two-way table for two categorical

variables, 21

for Wilcoxon signed-rank test, S2-16




Mode, 33

Mood’s Median Test, 688– 89

Mosteller, Fred, 264

MSE (mean square error), 681, S4-12

MS Groups (mean square for groups),

680, 682

Multinomial distribution, S1-11–13

Multinomial experiment, S1-12

Multinomial probability distribution,

S1-12

Multiple coefficient of determination,

S3-8

Multiple comparisons, 676 –79

Multiple regression, 603, S3-2 –19

checking conditions for inference,

S3-14 –15

defined, S3-2

inference procedures in, S3-9–14

multiple linear regression model,

S3-3–9

equation, S3-3– 4

estimating standard deviation,

S3-7

general form of, S3-2

population version, S3-9

proportion of variation explained by explanatory variables (R²), S3-8

sample regression equation,

S3-5–7

sample version, S3-9

Multiple squared correlation, S3-8

Multiple-testing phenomenon, 586

selective reporting and, S5-17–18

Multiplication rule for probability,

245– 47, 249, 250 –51

Multiplier in confidence interval,

412 –14, 460 – 61, 472

confidence level and, 408, 412 –14

95%, 412

margin of error in sample proportion

and, 420

t*, 450–52, 453, 460, 472, 611, 616, 684, S3-10, S3-13

width of confidence interval and, 410

z*, 412–14, 424, 425, 450, 460–61, 472

Multistage sampling plan, 89

Mutually exclusive events, 240, 243

addition rule for probability of,

244 – 45, 249, 250

finding probability of none of collection of, 252

finding probability of one of collection of, 252



N

National Opinion Research Center

(NORC), 91–92

National Research Council, S5-11

Committee on National Statistics,

S5-11



Negative association, 152, 153

Negative correlation, 166, 167, 178

No association, 152

Nonlinear (curvilinear) relationship on

scatterplot, 154 –55

Nonparametric tests of hypotheses, 551,

S2-2 –23

Kruskal–Wallis Test, 687– 88, S2-16 –19

Mood’s Median Test, 688– 89

resistance to outliers, S2-2

robustness of, S2-2

sign test, 555, S2-2, S2-3–7

two-sample rank-sum test, S2-7–12

Wilcoxon signed-rank test, S2-12 –16

Nonresponse bias, 4, 74, 91, 92

Nonsignificant results, interpreting, 215,

S4-19–20

Normal approximation to binomial distribution, 311–14

Normal Curve Approximation Rule for

Sample Means, 358– 62, 375, 449

conditions for, 358–59

defined, 359

examples of scenarios, 359– 60

Normal Curve Approximation Rule for

Sample Proportions, 349–53,

375, 382, 414, 417, 515, 516

Central Limit Theorem and, 375

conditions for, 350

defined, 350

examples of scenarios, 350 –51

Normal curve (normal distribution), 49,

50, 55, 302

rejection region on, 524

sampling distribution as approximately, 347

conditions for, 355, 364, 366 – 67

standard, 304 – 6, 370, 371

z-scores to solve probability problem

about, 305–9

Normal population, standard, 368

Normal probability plot, 621

Normal random variable, 302 –11

cumulative probability for any, 305– 6

defined, 302

independent, linear combinations of,

316 –18

percentile ranking, finding, 310 –11

probability relationships for, 306 –7

standard, 304

standardized scores (z-scores), 54 –55,

303–9

symmetry property of, 307

Norton, P.G., 425

Null hypothesis, 208, 497

assumption as possible truth, 499

defined, 497

examples of, 497

inability to reject, 503– 4

inappropriateness of accepting,

503– 4, 534



for Kruskal–Wallis Test, S2-16

for one-sample t-test, 553, 554

for paired t-test, 562

for regression coefficient, S3-9–10

rejection of, 503, 720

based on p-value, 553

rejection region leading to, 523–24

for sign test, S2-3

with discrete random variable, S2-7

for two-sample rank-sum test, S2-8

for two-sample t-test, 565

for two-way table, 208, 636 –38

as statement of homogeneity, 638

as statement of independence, 638

for Wilcoxon signed-rank test, S2-12

for z-test for one proportion, 513

for z-test of differences in two proportions, 528, 530

Null standard error, 501, 515, 528

for difference in two proportions, 529

for one proportion, 515

sample size and, 535

Null value, 333, 498–99, 501, 511, 512,

515, 552

Numerator degrees of freedom, 674,

675, 676, S3-11

Numerical summaries

of categorical variables, 19–20

of quantitative variables, 36 – 47

Numerical variable, 16



O

“Obedience and individual responsibility” experiment, Milgram’s, S5-3

Observation(s)

defined, 14

influential, 172

rank of, S2-9

Observational studies, 118, 123

case-control studies, 134 –35, 136

causal conclusions from, 5, 118,

136 –37, 706 –7, 708

confounding variable, effect of, 204 – 6

defined, 5

designing, 133–36

difficulties and disasters in, 136 – 41

retrospective or prospective studies,

134, 136, 140 – 41

Observational unit, 14

Observed association, interpretations

of, 176 –77

Observed counts, 208

for chi-square test, 638, 640, 647

Odds, 199, 201

Odds ratio, 199–200, 201

Office for Protection from Research

Risks, S5-5

One-sample t-test, 553– 61

conditions for, 554 –55

paired t-test, 561– 64

p-value in, 555–56



steps in, 554 –59

t-statistic for, 555, S2-3

One-sided (one-tailed) hypothesis test,

498, 511

confidence intervals and, 575–76

One-way analysis of variance, 669– 85

analysis of variance table, 679– 80,

682

comparing means with F-test, 669–79

assumptions and necessary conditions for, 672 –74

defined, 670

family of F-distributions, 674 –75

multiple comparisons, 676 –79

notation for summary statistics,

672

p-value, 670, 675–76

steps in, 685

comparing two-way ANOVA and,

S4-3– 4

measuring total variation, 681– 82

measuring variation between groups,

680

measuring variation within groups,

680 – 81

model for, S4-4

95% confidence intervals for population means in, 684

Open questions, 101, 102 –3

Ordering of survey questions, 97–98

Ordinal variable, 16, 193

Outcome variable, 21, 119. See also

Response variable(s)

Outliers, 26, 27–28, 29–30, 36, 43,

S5-15

boxplot to identify, 33, 34, 43– 46

defined, 27

handling, 47– 49

impact on correlation and regression

results, 172 –73, 178, 179

influence on mean and median, 40

pictures of, 32, 34

possible reasons for, 47– 48

on residual plot, 621–22

in regression, 156

resistance of nonparametric tests to,

S2-2

on scatterplot, 156, 621–22

sign test used in case of, 555

valid confidence interval estimate of

population mean and, 454



P

Paired data (paired samples), 445– 47,

561

matched-pair design, 130, 131, 132,

135

one-sample t-test for, 561– 64



Paired differences

confidence interval for population

mean of, 461– 66

calculating, 464 – 65

conditions required for using,

462 – 64

interpreting, 464

defined, 341

notation for, 462

population mean for, 341– 42

sample mean of

sampling distribution for, 362 – 65,

383

standard error of, 364 – 65, 448– 49

Paired t-interval, 461– 66

Paired t-test, 561– 64

Parameter(s), 15, 332, 333, 404, 444

Big Five, 335–36

computing confidence intervals

for, 407–10

examples of, 336 – 43

hypothesis tests for, 496 –504

determining appropriate, 578–79

as long-run probability, 415–16

notation for, 336

null value for, 333

sampling distribution of statistic

estimating. See Sampling

distribution

statistic vs., 335, 336

translating curiosity to questions

about, 334 – 43

Parametric test, S2-3

Participants (subjects), 119

ethical treatment of, S5-2 –9, S5-12

animal, S5-8–9

human, S5-2 – 8

in randomized experiments, 125

Past as source of data, using, 140 – 41

Pearson product moment correlation.

See Correlation

Percentage, probabilities and, 258

Percentile, 46, 310

Percentile ranking, 310 –11

Percent increase/decrease in risk, 199

Perfect correlation, 166

Personal bias, S5-12 –13

Personal decision-making, 709–12

Personal probability, 235–36, 237

Pie charts, 22, 24

Pilot survey, 102

Placebo, 5, 6, 128

Placebo effect, 128

Plots of the residuals, 619–23

Point estimate, 404, 405. See also

Statistic

Poisson approximation for binomial

random variables, S1-11

Poisson distribution, S1-7–11

Poisson processes, S1-10



Poisson random variable X, S1-7–11

examples of, S1-7

mean, variance, and standard deviation for, S1-9

possible values for, S1-8

probabilities for, finding, S1-7–9

Policy decisions, using statistical studies

in, 713–15

Political affiliation, influence on survey

responses of, 103

Political pressures on research results,

S5-19

Politics, independence of statistical

agencies from, S5-11–12

Polls. See Sample survey

Pooled confidence interval for the difference between two population

means, 474, 477

Pooled sample variance (s_p^2), 571

Pooled standard deviation (s_p), 473, 571

in ANOVA, 681

Pooled standard error, 571, 572

for difference between two means,

473–76

Pooled two-sample t-test, 567, 571–74

Pooled variance, 473

Population

data, 14 –15

defined, 4, 72, 73, 403– 4

extending results inappropriately to,

137–38

mean for, 52, 341– 43. See also Expected value

regression line for, 601, 602 –5

standard deviation for, 52, 292 –93

Population average, stratified sampling

and accuracy of, 85

Population data, 14 –15

Population intercept (β0), 603

Population mean(s), 52, 292 –93, 336,

341– 43. See also Expected value

confidence interval for

difference between two population means, 466 –77

in multiple regression model,

S3-13–14

one population mean (t-interval),

452 – 61

paired differences (paired

t-interval), 461– 66

in simple regression, 617, 618

estimating, from sample mean, 341

for Factor A, level i, S4-5

for Factor B, level j, S4-5

overall, in two-way ANOVA, S4-5

for paired differences, 341– 42

testing hypotheses about, 551–97

difference between two population

means, 342 – 43, 552, 565–74

one population mean, 551, 553– 61



population mean of paired differences, 341– 42, 552, 561– 64

Population median(s)

comparing, 687– 89

Kruskal–Wallis Test for, 687– 88,

S2-16 –19

Mood’s Median Test for, 688– 89

hypotheses about, 686

sign test to test hypotheses about

value of, S2-3–7

Population parameter. See Parameter(s)

Population proportion, 338–39, 404

confidence interval for, 77, 409–23

approximate 95%, 420 –21

common settings, 411

computing, 412 –14

conditions required for using formula, 414 –16

conservative 95%, 421

constructing 95%, 417–18

difference between two population

proportions, 423–28

formula for, 409, 414

width of, 409–10

z* multiplier, determining, 412–14

estimating, from single sample proportion, 339, 353–54

hypothesis testing of, 511–27

difference between two population

proportions, 340, 527–33

exact p-values, 521–23

power and sample size for, 524 –26

p-value for z-test, computing,

516 –21, 529, 531–32

rejection region approach, 523–24

z-statistic, details for calculating,

515–16

z-test for, 511–14

notation for, 336

true, power and, 510, 525

Population size, accuracy of survey

and, 79

Population slope (β1), 603

confidence interval for, 611–12

hypothesis test for, 609–10

Population standard deviation (σ), 52,

292 –93, 336

Positive association, 152 –53

Positive correlation, 166, 167, 178

Power of test, 509–10

effect size and, 584 – 85

sample size and, 509–10, 524 –26, 584,

585

Practical significance, 8, 214 –15, 536,

586

Predicted value ŷ, 160, 162–63, S3-5,

S3-9, S3-15

Prediction error, 162 – 63

Prediction interval, 613–16, 617, 618,

S3-13–14



Predictions, regression equation to

make, 157

Predictor variable, 600, S3-2

Principles and Practices for a Federal

Statistical Agency (NRC Committee on National Statistics), S5-11

Probability(ies), 228–77

assigning, to simple events, 238–39,

242

basic rules for finding, 243–51

addition rule for “either/or”

(Rule 2), 244 – 45, 250

complement rule for “not the

event” (Rule 1), 243– 44, 250,

S1-8

conditional probability (Rule 4),

247– 49, 251, 252

multiplication rule for “and”

(Rule 3), 245– 47, 249, 250 –51

sampling without replacement,

249–50, 295

sampling with replacement,

249–50

for binomial random variables,

295–98

coherent, 236

complementary events, 239, 240, 243

for complex events based on discrete

random variables, 287– 88

conditional, 241– 42, 243, 246,

247– 49, 251, 252, 256 –58, 501,

502, 507– 8

consequences of two types of errors

and, 507– 8

for continuous random variables, 282

cumulative, 286, 304 – 6, 310 –11,

312 –14

defined, 231, 232

definitions and relationships, 238– 43

dependent events, 240 – 41, 243, 245,

246, 251

for discrete random variables, 282

distribution, 283– 86. See also Probability distribution

flawed intuitive judgments about,

261– 68

coincidences, 264 – 67

confusion of the inverse, 261– 63

gambler’s fallacy, 267

specific people vs. random individuals, 263

independent events, 240 – 41, 243,

246 – 47, 249, 251, 252

interpretations of, 231–37

personal probability, 235–36, 237

relative frequency, 232 –35, 237

mutually exclusive events, 240, 243,

244 – 45, 249, 250, 252

for normal random variables, 306 –7

percentages and, 258



philosophical issue about, 236 –37

Poisson process and, S1-10

for Poisson random variables, finding, S1-7–9

proportions and, 258

random circumstances and, 229–31

simulation to estimate, 259– 61

standard normal, 304 – 6

strategies for finding complicated,

251–58

hints and advice, 252

hypothetical hundred thousand,

255–56

steps for, 253–55

tree diagrams, 256 –59

for Student’s t-distribution, 372 –73

subjective, 236

of type 1 error (α), 507, 508, 510

of type 2 error (β), 509, 510

Probability density function, 300 –302,

346

Probability distribution, 346. See

also Distribution; Sampling

distribution

chi-square distribution, 641

of discrete random variables, 283– 86,

287– 88

expected value of variable and, 289

F-distribution, 670, 674 –75, S3-11

hypergeometric, S1-3–5

multinomial, S1-12

Student’s t-distribution, 370 –72

Probability distribution function (pdf ),

283

of binomial random variable, 296

of discrete random variable, 283– 88,

S1-3

graphing, 285– 86

for hypergeometric random variable,

S1-3–5

joint, S1-12

for Poisson random variable, S1-8

Probability sampling plans, 80

Professionalism, S5-17

Proportion(s). See also Population

proportion; Sample proportion(s) (p̂)

confidence intervals for. See under

Confidence interval(s)

probabilities and, 258

as relative frequency probabilities,

234

sample surveys to estimate, 75–76

Proportion of variation explained by

x (R²), 169, 607–9, S3-8

Prospective studies, 134, 136, 140

Psychology experiments, S5-3– 4

Public Health Service Policy on Humane

Care and Use of Laboratory Animals, S5-9



p-value, 210, 215, 521, S2-2

close to 0.05, 651–52

computing, 502, 514, 553

defined, 501, 502, 508

determining

for chi-square test for two-way

tables, 210 –11, 213, 215, 216,

641– 43

exact p-values in tests for population proportions, 521–23

for F-test, 670, 675–76

for one sample t-test for a mean,

555–56

for paired t-test, 563– 64

for two-sample t-test, 566 – 67, 569

for z-test for a proportion, 516 –21

for z-test of difference in two proportions, 529, 531–32

in hypothesis test for population

slope, 610

interpreting, 503, 537

in two-way ANOVA, S4-15–16,

S4-17, S4-18

for Kruskal–Wallis Test, S2-17–18

in multiple regression, S3-11, S3-12

rejection of null hypothesis based on,

553

rejection region approach compared

to, 560 – 61

for sign test, S2-4 –7

statistical significance based on, 514,

518, 520, 556, 567

two possible conclusions of hypothesis test based on, 503– 4

for two-sample rank-sum test,

S2-10 –11

for Wilcoxon signed-rank test,

S2-14 –15



Q

Quantitative data, visual displays exploring features of, 24 –36

describing shape, 26, 33–35, 36

pictures of quantitative data, 29–33

strengths and weaknesses, 35–36

Quantitative variable(s), 16, 17, 337,

444 – 45

boxplot of, 33–35

defined, 16, 17, 151

numerical summaries of, 36 – 47

parameters for, 336

examples of, 337, 341– 43

questions to ask about, 18

raw data from, 16

stratified sampling and, 85

summary features of, 26

visual summaries of, 22 –24

Quantitative variables, relationships

between, 151–91



correlation, 165–71

causation and, 176 –77

misleading, reasons for, 171–76

example of, 151

linear patterns described with regression line, 157– 65

scatterplots, patterns in, 152 –57

Quartiles, 42, 46

finding, 42 – 43

lower, 2, 3, 42

upper, 2, 3, 42

Questions

survey, 95–103

closed, 101–2

open, 101, 102 –3

for variable types, 17–18

Quickie polls, 91

Quota sampling, 110



R

r, 165– 80

r², 169–70, 171

R², 169, 607–9, S3-8

Random assignment, 5, 6

Random circumstances, 229–31

assigning probabilities to outcomes

of, 231

conditions for valid probabilities for

possible outcomes of, 238

Random-digit dialing, 88– 89, 90

Random digits, table of, 80 – 82

Randomization, 126 –27

by third party, preventing bias with,

S5-12

Randomized block design, 131, 132

Randomized experiment(s), 5– 6, 118,

124

causation and, 5– 6, 118, 124, 176 –77,

706 –7

defined, 6

designing, 124 –33

difference in two population means

in, 366

difficulties and disasters in, 136 – 41

participants in, 125

randomization as key to, 126 –27

Random numbers, 80 – 81

Random sample/sampling, 94, 706

defined, 4

probability sampling plans, 80

simple, 73, 80 – 83

in action, example of, 103– 6

stratified, 84 – 85, 86, 89

using table of random digits, 80 – 82

Random variable(s), 279–329, 345

Bernoulli, 295

binomial. See Binomial random

variable(s)

classes of, 280 – 81. See also Continuous random variables; Discrete

random variables



defined, 280, S1-2

families of, 280, 281– 82

hypergeometric, 295, S1-2 –7

independent, 315–20

linear combination of, 315–20

mean value (expected value) of,

288–91, 298–99, S1-6

normal. See Normal random variable

Poisson, S1-7–11

sums, differences, and combinations

of, 314 –20

uniform, 301

Range, 26, 41– 42, 43, 54

Rank of observation, S2-9

Rank-sum statistic, S2-9–10

Rank-sum test, two-sample, S2-7–12

Rank test, 687

Kruskal–Wallis Test, 687– 88, S2-16 –19

Mood’s Median Test, 688– 89

Rate, 3

base, 3

Raw data, 13–15

from categorical variable, 15

notation for, 36

from quantitative variables, 16

Regression, 157–65, 599–633

case study of, 623–25

constant variance in, 603

estimating mean y at specified x,

617–19

inference about slope of linear,

609–13

misleading, reasons for, 171–76

curvilinear data, 175–76

inappropriately combined groups,

173–74

large sample size, 613

outliers, 172 –73, 178, 179

models, 600 – 605

inferences made with, checking

conditions for, 619–23

multiple linear regression, S3-3–9

regression line for population, 601,

602 –5

regression line for sample, 601–2, 605

multiple, 603, S3-2 –19

95% prediction interval, 614 –16, 617

simple, 158, 599, S3-2

standard deviation for, 605–9

estimating, 606 –7

proportion of variation explained

by x, 607–9

statistical use of word, 165

Regression analysis, 157

purpose of, 171

Regression coefficients, S3-3

estimating, using sample data, S3-5

testing null and alternative hypotheses for, S3-9–10



Regression equation, 157– 65, 599, S3-2

computer results for, 171

defined, 151, 157

example of, 180

extrapolation from, 171–72

writing, 159

Regression line, 157– 65, 600 – 601

defined, 158

equation for, 160 – 62

interpreting, 160 – 61

least squares line as estimate of,

163– 64

for population, 601, 602 –5

deviations from, 603–5

for sample, 601–2, 605

deviations from, 602

slope of, 601, 603, 609–13

sum of squared error (SSE) and, 170,

606, S3-5, S3-8

Rejection region, 523–24

for chi-square tests, 643– 45

for F-test, 675

p-value approach compared to,

560 – 61

for t-tests, 559– 61, 564, 568, 569–70

for z-tests, 523–24

Relationship between categorical variables, 197

Relative frequency, 20

on histograms, 30

simulation approach for estimating

long-run, 259– 61

Relative frequency distribution, 20

Relative frequency probability, 232 –35,

237

estimating, from observed categorical

data, 234 –35

methods of determining, 233–34

proportions and percentages as, 234

simulation approach for estimating

long-run, 259– 61

Relative risk, 198–99, 201, 203, 204

Repeated-measures designs, 130

Replacement

sampling with, 249–50

sampling without, 249–50

Reporting of research results, fair,

S5-16 –20

Representative sample, 706, 720, S5-19.

See also Sampling

Research studies, 117–24

ethics in. See Ethics

evaluating significance in, 585– 86

fair reporting of results, S5-16 –20

types of, 117–18. See also Experiment(s); Observational studies

variables, roles played by, 119–23

who is measured, 119

Residual(s), 162 – 63, 602

plots, 619–23



in sample regression equation, S3-5,

S3-9

Residual sum of squares, 606

Residuals versus fits, S3-15–16

Resistance of nonparametric tests to

outliers, S2-2

Resistant statistic, 43

Response bias, 75

possible sources of, in surveys, 95–99

Response rate, 91–92

Response variable(s), 18, 119, 152

in case-control studies, 134, 135

for categorical variables, 21

change in explanatory variable

caused by, 177

in contingency table, 194, 195, 196

in regression models, 600

roles in research studies, 119, 120,

121, 122, 123

transforming, for analysis of variance,

689

Restrictive sample, 706

Retrospective studies, 134, 136, 140 – 41

Rights as research subject, S5-6

Risk, 198–204

baseline, 3, 198, 203

control of societal, using statistics for,

709, 713–15

defined, 3, 198

misleading statistics about, 201–3

odds ratio, 199–200, 201

percent increase/decrease in, 199

relative, 198–99, 201, 203, 204

Robust

inference procedures for regression

as, 621

nonparametric methods as, S2-2

Row percentages, 194, 195, 196, 197,

636, 639

Rule for Concluding Cause and Effect,

137, 176, 706 –7

Rule for Sample Means, 358– 62, 375,

449

Rule for Sample Proportions, 349–53,

375, 382, 414, 417, 515, 516

Rule for using data for inference, 72, 89,

125, 137, 404, 706

Rules for finding probabilities, 243–51



S

Sagan, Carl, 236

Sample(s). See also Paired data (paired

samples)

cluster, 85– 87

convenience (haphazard), 93–94, 138

defined, 73, 404

dependent, 363

estimating effect size for one and two,

581



extending results inappropriately

from, 137–38

independent. See Independent

samples

large, approximate 95% confidence

intervals for, 460 – 61

possible, 349, 358, 360

random. See Random sample/

sampling

regression line for, 601–2, 605

deviations from, 602

representative, 706, 720, S5-19

restrictive, 706

self-selected, 4, 93

simple random, 73, 80 – 83

in action, example of, 103– 6

stratified, 84 – 85, 86, 89

systematic, 87– 88

volunteer, 4, 93

Sample data, 14 –15

Sample estimate (estimate), 332, 333,

404, 405, 408, 444. See also

Statistic

Sample mean(s), 26, 37, 336, 357– 68.

See also Mean(s)

estimating population mean from,

341

finding pattern in, 379– 81

histogram of, 345– 46

Normal Curve Approximation Rule

for Sample Means, 358– 62, 375,

449

outliers and, 40

sampling distribution for one,

357– 62, 383

difference in two sample means,

365– 68, 383

of paired differences, 362 – 65, 383

standard deviation of, 359

difference in two sample means,

367– 68

of paired differences, 364 – 65

sample size and, 361– 62

standard error of, 361, 448– 49

difference in sample means,

367– 68, 449–50, 467

of paired differences, 364 – 65,

448– 49

standardized z-statistic for, 370

in two-way ANOVA, notation for, S4-6

Sample proportion(s) (p̂), 348–54, 382,

404

estimating population proportion

from single, 339, 353–54

95% margin of error for, 76, 418,

419–23

Normal Curve Approximation Rule

for, 349–53, 375, 382, 414, 417,

515, 516

notation for, 336


