Index
Codes of ethics, S5-2, S5-3, S5-17
Coefﬁcient(s)
correlation. See Correlation
of determination, multiple, S3-8
regression, S3-3
estimating, using sample data, S3-5
testing null and alternative hypotheses for, S3-9–10
Coherent probabilities, 236
Coincidences, 264 – 67
Column percentages, 194, 195, 196, 197,
636, 639
Complement, 239, 243, 252
rule to ﬁnd probability of, 243– 44,
250, S1-8
Complementary events, 239, 240, 243
Completely randomized design, 131,
132
Complexity of survey questions, 97
Computer, reading results for regression
on, 170 –71
Computer-assisted information collection, S5-10 –11
Computer-assisted self-interviews,
98–99
Computer-assisted telephone interviewing system (CATI), 89
Concepts in survey questions, illdeﬁned, 101
Conclusions, making stronger or weaker
than justiﬁed, S5-19
Conditional percentages, 194, 196
Conditional probability, 241– 42, 243,
246
deﬁned, 242
determining, 247– 49, 251, 252
p-value, 501, 502
tree diagram and, 256 –58
type 1 or type 2 errors, 507– 8
Conﬁdence interval(s), 76 –79, 206 –7,
332 –33, 401–93, S5-16
in action, example of, 480 – 82
approximate 95%, 460 – 61, 472
deﬁned, 460
for difference in population means,
472
for large samples, 460 – 61
for proportion p, 420 –21
case studies, 429–31, 478–79
curiosity and, 401–2
decisions and, 428–29
deﬁned, 332, 401, 403, 405, 443
for difference between two population means, 466 –77
equal variance assumption of,
473–76
general format for, 467
interpreting, 470 –72
pooled, 473–75, 477
pooled standard error for, 473–76
pooled vs. unpooled, 475–76
unpooled, 466 –73, 477
valid situations for, 468–70
for difference between two population proportions, 423–28
conditions for, 424 –28
decision-making and, 428–29
formula, 424
notation for, 424
estimation situations, examples of
different, 444 – 45
examples of, 401–2
general format for determining, 444,
452
general formula for, 408
hypothesis testing versus, choosing,
577–78
independent samples, 446 – 47
information learned from, 577–78
interpreting, 78–79
multiplier in. See Multiplier in
conﬁdence interval
for one population mean (t-interval),
452 – 66
calculating, 452 –53, 455
conditions required for using,
453–56
interpreting, 457–58
t* multiplier, determining, 450 –52,
453, 460
understanding, 459
valid situations for, 454
for one proportion, 409–23
approximate 95%, 420 –21
common settings, 411
computing, 412 –14
conditions required for using,
414 –16
conservative 95%, 421
developing 95%, 417–18
formula for, 409, 414
intuitive explanation of 95%
conﬁdence, 418
width of, 409–10
z* multiplier, determining, 412 –14
for other levels of conﬁdence, 418–19
for population mean E(Y), in regression, 617, 618, S3-13–14
for population means, in one-way
analysis of variance, 684
for population mean of paired differences (paired t-interval), 461– 66
for population slope, in regression,
611–12
prediction interval compared to, 614
procedures, summary of, 483
for regression coefﬁcient, S3-10
signiﬁcance testing and, 574 –76
one-sided tests, 575–76
tests with two-sided alternatives,
574 –75
standard errors, 447–50
summary of formulas for, 431–32
as supporting analysis for chi-square
test, 645
understanding any, 478
Conﬁdence level, 405, 406 –7, 418–19
in action, example of, 480 – 82
deﬁned, 405, 406, 443
family, 677
interpreting, 406 –7
as interval estimate, 405
margin of error associated with,
for 95% conﬁdence level, 418,
419–23
multiplier and, 408, 412 –14
Conﬁdentiality
in medical research, S5-6
of survey, 98–99
Confounding variable(s), 5, 120 –23,
706 –7
case-control studies and reduction of
potential, 135
causation and, 133–37
deﬁned, 120
interpreting observed associations
and, 177
lurking variables, 120 –22, 123
relationships between categorical
variables and, 204 – 6
retrospective studies and, 140
ruling out potential, 708
Simpson’s Paradox and, 205– 6
Confusion of the inverse, 261– 63
Conservative margin of error, 76, 420,
421–23
Conservative 95% conﬁdence interval
for proportion p, 421
Constant variance in regression, 603
Consumer Reports survey, 91
Contingency table, 194, 635. See also
Two-way (contingency) table
assessing statistical signiﬁcance of,
206 –16, 654
chi-square test for, 208–16, 635– 46
relationship between categorical variables displayed in, 194 –97
Continuity correction, 314
Continuous random variables, 281, 282,
300 –311
deﬁned, 280, 300
normal random variable, 302 –11
cumulative probability for, 305–9
deﬁned, 302
independent, linear combinations
of, 316 –18
percentile ranking, ﬁnding, 310 –11
probability relationships for, 306 –7
standard, 304
standardized score (z-scores),
54 –55, 303–9
symmetry property of, 307
Continuous random variables (continued)
probability density function for,
300 –302, 346
sign test and, S2-3
uniform random variable, 301
Continuous variable, 16
Contributions to chi-square, 645
Control groups, 127–28
Control of societal risks, using statistics
for, 709, 713–15
Convenience sample, 93–94, 138
Correlation, 165– 80
causation vs., 176 –77
deﬁned, 151, 165
direction of relationship and, 165–71,
178–79
examples of, 166 – 69
formula for, 166
interpreting correlation coefﬁcient,
166 –70
misleading, reasons for, 171–76
curvilinear data, 174 –75
extrapolation, 171–72
inappropriately combined groups,
173–74
outliers, 172 –73, 178, 179
multiple squared, S3-8
negative, 166, 167, 178
perfect, 166
positive, 166, 167, 178
squared (r²), 169–70, 171
strength of relationship and, 165–71,
178–79
testing hypotheses about correlation
coefﬁcient, 612
of zero, 166
Correlation coefﬁcient. See Correlation
Critical value. See also Rejection region
deﬁned, 559
for chi-square tests, 643– 45
for F-tests, 675–76
for t-tests, 559– 60, 564, 568, 569–70
Cumulative distribution function (cdf )
of discrete random variables,
286
Cumulative probability(ies), 286,
304 –14
for any normal random variable,
305– 6
for binomial random variable, approximating, 312 –14
percentile rank and, 310 –11
for z-scores, ﬁnding, 304 –5
Curvilinear data, impact on correlation or regression results,
174 –75
Curvilinear patterns in scatterplots,
154 –55
Curvilinear relationships, 154, 600
D
Data, 2, 13–15. See also Paired data
(paired samples); Sampling
assurance of data quality, S5-9–13
experimenter effects and personal
bias and, S5-12 –13
U.S. federal statistical agencies
guidelines, S5-9–12
categorical, estimating probabilities
from observed, 234 –35
deﬁned, 3
ﬁve-number summary of, 2 –3, 25, 26,
43– 46
quantitative, visual displays exploring
features of, 24 –36
types of, 14 –15
population data, 14 –15
raw data, 13–15, 16, 36
sample data, 14 –15
using, for inference, 705– 8
Dataset, 14
bimodal, 33, 34
skewed, 32, 33, 40, 41
unimodal, 33
Data-snooping, S5-15
Dawes, Robyn, 262
Deception in research, S5-3– 4. See also
Ethics
Decision-making
appropriate statistical analyses,
S5-14 –16
conﬁdence intervals to guide, 428–29
personal, 709–12
policy decisions, statistical studies to
guide, 713–15
Degrees of freedom (df ), 212, 370 –71,
372, 450
for chi-square distribution, 641, 642
for chi-square statistic, 641, 643
denominator, 674, 675, 676, S3-11
for F-distribution, 674 –75
for goodness-of-ﬁt test, 653
for multiple regression, S3-11
numerator, 674, 675, 676, S3-11
for one-sample t-statistic, 555, 556
for one-way ANOVA, 674 –75, 685
rejection region approach for t-tests
and, 559, 560, 564
for t-distribution, 450, 451, 453
for two-sample t-statistic, 566 – 67,
569
for two-way ANOVA, S4-12
Welch’s approximation, 468, 567
Deliberate bias, 95–96
Denominator degrees of freedom, 674,
675, 676, S3-11
Density function, probability, 300 –302,
346
Dependent events, 240 – 41, 243
general multiplication rule for probability of, 245, 246, 251
Dependent samples, 363
Dependent variable, 119, 152, 600, S3-2.
See also Response variable(s)
Descriptive statistics, 15, 71
Design, research study
observational study, 133–36
randomized experiment, 124 –33
terminology and examples, 131–33
Desire to please, survey respondents
and, 96 –97
Determination, multiple coefﬁcient of,
S3-8
Deterministic relationship, 151, 160
Deviation(s) from regression line
in multiple linear regression model,
S3-3
in population, 603–5
in sample, 602
Diaconis, Persi, 264
Difference between two population proportions, 340
chi-square test vs. z-test for, 647– 49
conﬁdence interval for, 423–28
hypothesis testing, 340, 527–33
Difference in two population means,
342 – 43
conﬁdence intervals for, 466 –77
hypothesis testing, 342 – 43, 552,
565–74
Difference of random variables, 315
independent normal random variables, 316 –18
Direction of relationship, correlation
and, 165–71, 178–79
Discrete random variables, 280, 282,
283–93, S1-2 –15
binomial random variables, 281,
294 –300, 349
binomial experiments and, 294 –95,
339, 348– 49
cumulative probabilities of, approximating, 312 –14
expected value (mean) of, 298–99,
S1-6
independent, combining, 318–20
Poisson approximation for, S1-11
probabilities for, 295–98
standard deviation of, 298–99
variance of, 299
cumulative distribution function
(cdf ) of, 286
deﬁned, 280, S1-2
expected value for, 288–91
hypergeometric, 295, S1-2 –7
hypotheses for, S2-7
multinomial distribution, S1-11–13
Poisson, S1-7–11
probabilities for complex events
based on, 287– 88
probability distribution of, 283– 88,
S1-3
probability notation for, 283
sign test and, S2-3
standard deviation of, 291–92
variance of, 291–92
Disjoint events, 240, 243
Distribution
binomial, 296, S1-12
normal approximation to, 311–14
chi-square, 641
F, 670, 674 –75, S3-11
frequency, 20
hypergeometric, 650, S1-2 –7
multinomial, S1-11–13
normal, 49, 50, 55, 302
rejection region on, 524
sampling distribution as approximately, 347, 355, 364, 366 – 67
standard, 304 – 6, 370, 371
z-scores to solve probability problem about, 305–9
of a quantitative variable, 26
Poisson, S1-7–11
probability. See Probability
distribution
relative frequency, 20
sampling. See Sampling distribution
t (Student’s t), 370 –73, 450 –52
Dotplot, 2, 25, 29
creating, 31, 33
interpreting, 29–30
strengths and weaknesses, 35–36
Double-blind experiment, 128, 129,
S5-12
Double dummy, 128–29
Dunn, E.V., 425
E
e, value of, S1-8
Ecological validity, 139– 40, S5-19
Effect modiﬁer, 147
Effect size, 580 – 85
comparing, across studies, 583– 84
estimating, for one and two samples,
581
interpreting, 582 – 83
power and, 584 – 85
t-statistic and, 581– 82
Emotions, measuring, 101
Empirical Rule, 52 –57, 279, 303, 448
in action, example of, 56 –57
applied to z-scores, 54 –55
deﬁned, 55
95% conﬁdence interval and, 417–18
range and standard deviation and, 54
Equal variance assumption, 473–76, 567
Error(s), 504 –10
least squares line and, 163
margin of. See Margin of error
mean square (MSE), 681, S4-12
outliers caused by, 47, 48
prediction, 162 – 63
SSE (sum of squared error), 163, 164,
170, 606, 608, 609, 680 – 82, S3-5,
S3-8, S4-11–12
standard. See Standard error(s)
types 1 and 2 in hypothesis testing,
506 –10, 564n, 720
consequences of, in personal
decision-making, 710
and their probabilities, 506 –10
trade-off between, 510
Error term (ε), S4-4
Estimated y, 160
Estimate (sample estimate, sample
statistic, point estimate), 332,
333, 404, 405, 408, 444. See also
Statistic
Estimation. See also Conﬁdence
interval(s)
language and notation of, 403– 4
margin of error in, 75–79, 405
for 95% conﬁdence, 418, 419–23
Estimation situations, examples of different, 336 –37, 444 – 45
Ethical Guidelines for Statistical Practice
(American Statistical Association), S5-2, S5-17
Ethical Principles of Psychologists and
Code of Conduct (APA)
on animal research, S5-9
on deception in research, S5-3– 4
on informed consent, S5-4
Ethics, S5-2 –22
in animal research, S5-8–9
appropriate statistical analyses,
S5-14 –16
assurance of data quality, S5-9–13
experimenter effects and personal
bias and, S5-12 –13
U.S. federal statistical agencies
guidelines, S5-9–12
case study, S5-19–20
codes of, S5-2, S5-3, S5-17
fair reporting of results, S5-16 –20
multiple hypothesis tests and
selective reporting, S5-17–18
sample size and statistical signiﬁcance, S5-17
stronger or weaker conclusions
than justiﬁed, S5-19
observational study vs. randomized
experiment, 118
treatment of human participants,
S5-2 – 8, S5-12
example of unethical experiment,
S5-3– 4
informed consent and, S5-4 – 8
Event(s)
complementary, 239, 243
deﬁned, 238, 243
dependent, 240 – 41, 243, 245, 246,
251
independent, 240 – 41, 243, 246 – 47,
249, 251, 252
mutually exclusive, 240, 243, 244 – 45,
249, 250, 252
simple, 238, 238–39, 242
Exact p-values in tests for population
proportions, 521–23
Excel commands
for binomial probabilities, 297
for hypergeometric probabilities,
S1-5
for minimum value, 46
normal distribution function, 55
for percentiles, 46
for Poisson probabilities, S1-9
for p-value, 212, 517, 518, 558
of chi-square test, 642, 655
for F-test, 675
for paired t-test, 563
for quartiles, 46
for regression equation, 162
for standard deviation and variance,
52
for t* multiplier, 452
for t-value corresponding to speciﬁed
area, 373
Expected counts, 208–9
for goodness-of-ﬁt test, 653
for two-way tables, 638– 40, 646
interpreting and calculating,
639– 40
Expected value, 289
of binomial random variables,
298–99, S1-6
calculating, 289–91
of discrete random variables, 288–91
binomial random variables,
298–99, S1-6
notation and formula for, 289
for population, 292 –93
Experiment(s), 118, 124. See also Ethics
binomial, 294 –95, 339, 348– 49,
521–23
designing, 124 –33
difﬁculties and disasters in, 136 – 41
double-blind, 128, 129, S5-12
multinomial, S1-12
randomized. See Randomized
experiment(s)
Experimental unit, 119
number in blocks, 133
Experimenter effects, 139, S5-12 –13
Explanatory variable(s), 18, 119, 152
in case-control studies, 134, 135, 136
for categorical variables, 21
change in, caused by response variable, 177
in contingency table, 194, 195, 196
proportion of variation explained by
(R²), 169, 607–9, S3-8
in regression models, 600
Explanatory variable(s) (continued)
roles in research studies, 119, 120,
121, 122, 123
in two-way ANOVA of, 689–90, 692
Exponential function, S1-8
Extension of results inappropriately to
population, 137–38
Extrapolation, 171–72
F
Factors in two-way ANOVA, S4-5
Factor A, S4-5, S4-7, S4-8, S4-10
population mean for, level i, S4-5
Factor B, S4-5, S4-7, S4-8, S4-10
population mean for, level j, S4-5
Fair reporting of research results,
S5-16 –20
False negative, 505, 506 –10. See also
type 2 error
False positive, 505, 506 –10. See also
type 1 error
Family conﬁdence level, 677
Family of random variables, 280, 281
Family type 1 error rate, 677
F-distribution, 670, 674 –75
family of, 674 –75
in multiple regression, S3-11
Federal Register Notices, S5-9
Federal Statistical Organizations’ Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Disseminated
Information, S5-10
Financial pressures on research results,
S5-19
Fisher’s Exact Test, 650 –52
Fisher’s multiple comparison procedure, 677–78
Fit, 615
residuals versus ﬁts, S3-15–16
Five-number summary, 2 –3, 25, 26,
42 – 43
outlier detection and, 43– 46
Frequency, 20
relative, 20, 30, 259– 61
Frequency distribution, 20
F-test for comparing means, 669–79,
685
assumptions and necessary conditions for, 672 –74
deﬁned, 670
family of F-distributions, 674 –75
F-statistic, 670, 672, 675–76, 679, 680,
681, S3-10, S4-13
multiple comparisons, 676 –79
notation for summary statistics, 672
p-value, 670, 675–76
steps in, 685
two-way ANOVA and, 692 –93,
S4-12 –13
examples and interpretation of
results, S4-14 –19
Full model for two-way ANOVA, S4-7–9
deﬁned, S4-7
for each observation, S4-7– 8
Fundamental Rule for Using Data for
Inference, 72, 89, 125, 137, 404,
706
conﬁdence interval for population
proportion and, 414 –15
G
Gallup, George, 95
Gallup Organization, 719
Galton, Francis, 165
Gambler’s fallacy, 267
Gehring, William, 267
Generalizability, 139– 40
General Social Survey (GSS), 91–92
General two-way ANOVA model, S4-6 –7
Goodness-of-ﬁt test, chi-square, 652 –56
Groups
inappropriately combining, 173–74
mean square for (MS Groups), 680,
682
H
Haphazard (convenience) sample,
93–94, 138
Hawthorne effect, 139
Histogram, 29
applicability of Empirical Rule and
shape of, 56, 57
creating, 30 –31, 33
interpreting, 29–30
normal curve superimposed on, 50
of residuals, 621
of sample means, 345– 46
strengths and weaknesses, 35, 36
Homogeneity, chi-square test for, 638
Human participants, ethical treatment
of, S5-2 – 8, S5-12
example of unethical experiment,
S5-3– 4
informed consent and, S5-4 – 8
Hypergeometric distribution, 650,
S1-2 –7
Hypergeometric probability distribution, S1-3–5
Hypergeometric random variable, 295,
S1-2 –7
deﬁned, S1-2
mean, variance, and standard deviation for, S1-6 –7
probability distribution function for,
S1-3–5
survey and poll results as, S1-3
Hypothesis. See Alternative hypothesis;
Null hypothesis
Hypothesis testing, 206, 333, 495–597.
See also Signiﬁcance testing
conﬁdence interval versus, choosing,
577–78
about correlation coefﬁcient, 612
about difference in two population
means, 342 – 43, 552, 565–74
about difference in two population
proportions, 340, 527–33
p-value for, 529, 531–32
errors in, types of, 504 –10, 564n, 710,
720
formulating hypothesis statements,
496 –99
information learned from, 577–78
logic of, 495, 499–500
multiple, 586
nonparametric tests. See Nonparametric tests of hypotheses
about one categorical variable,
652 –56
chi-square goodness-of-ﬁt test,
652 –56
about one population mean, 551,
553– 61
paired data, 341– 42, 552, 561– 64
steps in, 554 –59
one-sided hypothesis test, 498, 511,
575–76
about one population proportion,
511–27
exact p-values, 521–23
power and sample size for, 524 –26
p-value for z-test, computing,
516 –21
rejection region approach, 523–24
z-statistic, details for calculating,
515–16
z-test for, 511–14
about population mean of paired differences, 341– 42, 552, 561– 64
for population slope, 609–10
power of hypothesis test, 509–10,
524 –26, 584 – 85
probability question at center of,
499–500
reaching conclusion in, 501– 4
p-value, computing, 502
rejecting null hypothesis, 503
two possible conclusions, 503– 4
statistical signiﬁcance, 333, 533–36
real importance vs., 6, 536
sample size and, 533–35, 613
steps in, 496, 511, 553
summary of procedures for, 586 – 87
two-sided hypothesis test, 498, 511,
520 –21, 526, 574 –75
in two-way ANOVA, S4-9–20
balanced versus unbalanced design, S4-9–10, S4-11
degrees of freedom, S4-12
estimating main effect and interaction effects, S4-11
examples and interpretation of
results, S4-14 –19
F-tests, 692 –93, S4-12 –13
hypotheses, S4-10
nonsigniﬁcant results, cautions
about, S4-19–20
sum of squares, S4-11–12
Hypothesis-testing paradox, 583– 84
Hypothetical hundred thousand,
255–56
I
Independence, chi-square test for, 638
Independence of statistical agencies
from politics, S5-11–12
Independent events, 240 – 41, 243, 249
multiplication rule for probability of,
246 – 47, 251
probability of series of, 252
Independent random variables, linear
combination of, 315–20
independent binomial random variables, 318–20
independent normal random variables, 316 –18
variance and standard deviation of,
316
Independent samples, 342, 354, 362,
365, 446
conﬁdence intervals
for difference between two means,
466 –77
for difference between two proportions, 423–28
deﬁned, 342
hypothesis tests
difference between two means,
342 – 43, 552, 565–74
difference between two proportions, 340, 527–33
methods of generating, 366, 446 – 47
standard error of difference between
two sample means, 449–50
for two-sample t-test, 566
Independent variable, 119. See also Explanatory variable(s)
Inference(s)
choosing appropriate inference procedure, 576 – 80
appropriate parameter, 578–79
conﬁdence interval versus hypothesis testing, 577–78
examples of, 579– 80
about multiple regression models,
S3-9–13
checking conditions for, S3-14 –15
about simple regression. See
Regression
about slope of linear regression,
609–13
statistical, 229, 332 –33, 401. See also
Conﬁdence interval(s); Hypothesis testing
deﬁned, 332, 333
preparing for, 368–73
principal techniques in, 332 –33,
401
sampling distribution and, 347
using data for, 705– 8
cautions concerning, 551
Fundamental Rule for, 72, 89, 125,
137, 404, 414 –15, 706
Inferential statistics, 71–72, 206 –7
Inﬂuential observations, 172
Informed consent in medical research,
S5-5– 8
checklist, S5-7– 8
Injury, research-related, S5-6
Institutional Review Board (IRB), S5-5
Interacting variables, 138–39
Interaction effect in two-way ANOVA,
689–90, 692, S4-3, S4-7, S4-8
interpreting, S4-9
testing for, S4-9–20
Interaction plot, 690, 691, 693, S4-8–9,
S4-15, S4-17, S4-19–20
Intercept
in computer results for regression,
171
of least squares line, formula for, 164
of regression line
for population, 603
for sample, 601
of straight line, 159, 160, 161, 601
Interquartile range (IQR), 26, 41– 42, 43
Interval estimate, 405, 408. See also
Conﬁdence interval(s)
Intervals
conﬁdence. See Conﬁdence
interval(s)
for histograms, 30
notation for probability in, 300 –302
prediction, 613–16, 617, 618, S3-13–14
for stem-and-leaf plot, 31
Interviewers, sample survey vs. census,
74
Interviewing system, computer-assisted
telephone (CATI), 89
Inverse, confusion of the, 261– 63
IRB (Institutional Review Board), S5-5
J
Joint probability distribution function,
S1-12
K
Kruskal–Wallis Test for comparing medians, 687– 88, S2-16 –19
null and alternative hypotheses for,
S2-16
p-value for, S2-17–18
test statistic for, S2-17, S2-18
L
Large effect, 582
Law of Large Numbers, 373–74
Law of small numbers, 267
Least squares, 163
Least-squares criterion, 601, S3-5
Least squares line, 163– 64, 601
curvilinear data and, 175
deﬁned, 601
example of, 180
inappropriately combined groups
and, 173
outliers and, 172
sum of squared errors and, 163, 164
Level of signiﬁcance (α level), 501, 503
probability of type 1 error and, 508,
510
probability of type 2 error and, 509,
510
p-values close to .05 and, 651–52
Linear combination of random variables, 315–20
independent random variables,
315–20
independent binomial random
variable, 318–20
independent normal random variables, 316 –18
variance and standard deviation of,
316
mean of, 315
Linear multiple regression model. See
Multiple regression
Linear relationship(s), 153, 599, 600.
See also Regression
correlation describing, 165–71
described with regression line,
157– 65
on scatterplot, 153–54
Literary Digest poll of 1936, 94 –95
Location of data (dataset), 26, 36
describing, 37–39
picturing, with boxplot, 33–35
Log transformation on the y’s, 623
Lower quartile, 2, 3, 42
Lurking variable, 120 –22, 123
M
McCabe, George, 265
Magnitude of statistically signiﬁcant
effect, 536
Mail surveys, 91, 92
Main effect in two-way analysis of variance, 690, 692, S4-3
for Factors A and B, S4-7, S4-8
interpreting, S4-9
testing for, S4-9–20
Mann–Whitney test (Mann–Whitney–
Wilcoxon test), S2-7–12
Margin of error, 75–79, 405, 419–23, 720
conservative, 76, 79, 420, 421–23
deﬁned, 4, 76, 420
desired, sample size for, 79
for 95% conﬁdence, 418, 419–23
general format for, 420
for proportion, 418, 419–23
sample size and desired, 79
Matched-pair design, 130, 131, 132
case-control study compared to, 135
Matched pairs, data collected as, 363
Mean(s), 26
of binomial random variable, 298–99,
S1-6
conﬁdence intervals for, 443–93
estimation situations, 444 – 45
general format for, 408, 444
for one population mean, 452 – 61
for population mean of paired differences, 461– 66
conﬁdence intervals for difference
between two, 466 –77
equal variance assumption of,
473–76
general format for, 467
interpreting, 470 –72
pooled standard error for, 473–76
valid situations for, 468–70
deﬁned for sample, 37
determining for sample, 37–39
effect-size measure for comparing
two, 581
effect-size measure used for single,
580 – 81
estimating mean y at speciﬁed x in
regression, 617–19
F-test for comparing, 669–79
for hypergeometric random variable,
S1-6 –7
hypothesis testing about, 551–97
difference between two population
means, 342 – 43, 552, 565–74
one population mean, 551, 553– 61
population mean of paired differences, 341– 42, 552, 561– 64
inﬂuence of outliers on, 40
inﬂuence of shape on, 40 – 41
of linear combination of random variables, 315
in multiple linear regression model,
S3-3
Normal Curve Approximation Rule
for Sample Means, 358– 62, 375,
449
notation for, 336
for Poisson random variable, S1-9
population. See Population mean(s)
of rank-sum statistic, S2-10
sample. See Sample mean(s)
of sampling distribution, 347, 348
of difference in two sample means,
367– 68
of difference in two sample proportions, 355–57
of sample mean of paired differences, 364 – 65
sampling distribution of the, 359
in simple regression model for population, 604
standard error of the, 361
sample mean, 364 – 65, 448–50, 467
standardized statistic for, 370
of standard normal random variable,
304
of Wilcoxon signed-rank test, S2-14
Mean square error (MSE) in ANOVA, 681
two-way ANOVA, S4-12
Mean square for groups (MS Groups),
680, 682
Mean value. See Expected value
Measurement error, outliers caused by,
47, 48
Measurement variable, 16
Median(s), 2, 26, 42 – 43
deﬁned, 3, 37
determining, 37–39
hypotheses about, 686
testing, 555
inﬂuence of outliers on, 40
inﬂuence of shape on, 40 – 41
population, comparing, 686 – 89
Kruskal–Wallis Test for, 687– 88,
S2-16 –19
Mood’s Median Test, 688– 89
for representing “central” value, 376
Medical research, S5-4
informed consent in, S5-5– 8
Medium effect, 582
Memory, relying on, 140
Meta-analysis, 583
Midrank, S2-9
Milgram, Stanley, S5-3
Minitab, commands in
“Assume Equal Variances” option,
474, 477
bar graph of categorical variable, 24
binomial probabilities, 297
boxplot, 35
chi-square statistic for goodness-of-ﬁt, 656
chi-square test for two-way table,
212, 215, 646
conﬁdence intervals
for difference between two independent means, 426
for E(Y) in regression, 619
for mean, 460
for mean of paired differences, 466
for proportion, 416
correlation, ﬁnding, 170
determining percent falling into categories of categorical variable, 21
dotplot, 31, 33
Fisher’s Exact Test (version 14 and
higher), 562
histogram, 33
hypergeometric probabilities, computing, S1-5
interaction plot, drawing, 693
for Kruskal–Wallis Test, S2-19
nonparametric procedures for comparing several medians, 689
normal curve probabilities and percentiles, 311
normal curve superimposed onto
histogram, 50
one-sample t-test, 564
one-way analysis of variance, 679
paired t-test, 564
pie chart of categorical variable, 24
plotting and storing residuals, 623
Poisson probabilities, ﬁnding, S1-9
prediction intervals and conﬁdence
intervals, creating, S3-14
prediction intervals for y, 619
quartile estimates, 42
random sample, picking, 82, 83
regression equation, ﬁnding, 171
regression line, ﬁnding, 162
residuals versus ﬁts, graphing, S3-16
sample regression equation, estimating, S3-7
sample size for speciﬁed power or
power for speciﬁed sample size,
585
scatterplot, 156
for sign test, S2-6
simulation to estimate probabilities,
260 – 61
stemplot, 33
summary statistics for quantitative
variable, 47
testing hypotheses about proportion,
526
testing signiﬁcance of relationship
between categorical variables,
211, 216, 646
for two-sample rank-sum test, S2-12
two-sample t procedure, pooled version, 474 –75
two-sample t-test to compare means,
573
two-sample z-test for difference in
proportions, 532
for two-way ANOVA, S4-13
two-way table for two categorical
variables, 21
for Wilcoxon signed-rank test, S2-16
Mode, 33
Mood’s Median Test, 688– 89
Mosteller, Fred, 264
MSE (mean square error), 681, S4-12
MS Groups (mean square for groups),
680, 682
Multinomial distribution, S1-11–13
Multinomial experiment, S1-12
Multinomial probability distribution,
S1-12
Multiple coefﬁcient of determination,
S3-8
Multiple comparisons, 676 –79
Multiple regression, 603, S3-2 –19
checking conditions for inference,
S3-14 –15
deﬁned, S3-2
inference procedures in, S3-9–14
multiple linear regression model,
S3-3–9
equation, S3-3– 4
estimating standard deviation,
S3-7
general form of, S3-2
population version, S3-9
proportion of variation explained
by explanatory variables (R²),
S3-8
sample regression equation,
S3-5–7
sample version, S3-9
Multiple squared correlation, S3-8
Multiple-testing phenomenon, 586
selective reporting and, S5-17–18
Multiplication rule for probability,
245– 47, 249, 250 –51
Multiplier in conﬁdence interval,
412 –14, 460 – 61, 472
conﬁdence level and, 408, 412 –14
95%, 412
margin of error in sample proportion
and, 420
t*, 450 –52, 453, 460, 472, 611, 616,
684, S3-10, S3-13
width of conﬁdence interval and, 410
z*, 412 –14, 424, 425, 450, 460 – 61, 472
Multistage sampling plan, 89
Mutually exclusive events, 240, 243
addition rule for probability of,
244 – 45, 249, 250
ﬁnding probability of none of collection of, 252
ﬁnding probability of one of collection of, 252
N
National Opinion Research Center
(NORC), 91–92
National Research Council, S5-11
Committee on National Statistics,
S5-11
Negative association, 152, 153
Negative correlation, 166, 167, 178
No association, 152
Nonlinear (curvilinear) relationship on
scatterplot, 154 –55
Nonparametric tests of hypotheses, 551,
S2-2 –23
Kruskal–Wallis Test, 687– 88, S2-16 –19
Mood’s Median Test, 688– 89
resistance to outliers, S2-2
robustness of, S2-2
sign test, 555, S2-2, S2-3–7
two-sample rank-sum test, S2-7–12
Wilcoxon signed-rank test, S2-12 –16
Nonresponse bias, 4, 74, 91, 92
Nonsigniﬁcant results, interpreting, 215,
S4-19–20
Normal approximation to binomial distribution, 311–14
Normal Curve Approximation Rule for
Sample Means, 358– 62, 375, 449
conditions for, 358–59
deﬁned, 359
examples of scenarios, 359– 60
Normal Curve Approximation Rule for
Sample Proportions, 349–53,
375, 382, 414, 417, 515, 516
Central Limit Theorem and, 375
conditions for, 350
deﬁned, 350
examples of scenarios, 350 –51
Normal curve (normal distribution), 49,
50, 55, 302
rejection region on, 524
sampling distribution as approximately, 347
conditions for, 355, 364, 366 – 67
standard, 304 – 6, 370, 371
z-scores to solve probability problem
about, 305–9
Normal population, standard, 368
Normal probability plot, 621
Normal random variable, 302 –11
cumulative probability for any, 305– 6
deﬁned, 302
independent, linear combinations of,
316 –18
percentile ranking, ﬁnding, 310 –11
probability relationships for, 306 –7
standard, 304
standardized scores (z-scores), 54 –55,
303–9
symmetry property of, 307
Norton, P.G., 425
Null hypothesis, 208, 497
assumption as possible truth, 499
deﬁned, 497
examples of, 497
inability to reject, 503– 4
inappropriateness of accepting,
503– 4, 534
for Kruskal–Wallis Test, S2-16
for one-sample t-test, 553, 554
for paired t-test, 562
for regression coefﬁcient, S3-9–10
rejection of, 503, 720
based on p-value, 553
rejection region leading to, 523–24
for sign test, S2-3
with discrete random variable, S2-7
for two-sample rank-sum test, S2-8
for two-sample t-test, 565
for two-way table, 208, 636 –38
as statement of homogeneity, 638
as statement of independence, 638
for Wilcoxon signed-rank test, S2-12
for z-test for one proportion, 513
for z-test of differences in two proportions, 528, 530
Null standard error, 501, 515, 528
for difference in two proportions, 529
for one proportion, 515
sample size and, 535
Null value, 333, 498–99, 501, 511, 512,
515, 552
Numerator degrees of freedom, 674,
675, 676, S3-11
Numerical summaries
of categorical variables, 19–20
of quantitative variables, 36 – 47
Numerical variable, 16
O
“Obedience and individual responsibility” experiment, Milgram’s, S5-3
Observation(s)
defined, 14
influential, 172
rank of, S2-9
Observational studies, 118, 123
case-control studies, 134–35, 136
causal conclusions from, 5, 118,
136–37, 706–7, 708
confounding variable, effect of, 204–6
defined, 5
designing, 133–36
difficulties and disasters in, 136–41
retrospective or prospective studies,
134, 136, 140–41
Observational unit, 14
Observed association, interpretations
of, 176–77
Observed counts, 208
for chi-square test, 638, 640, 647
Odds, 199, 201
Odds ratio, 199–200, 201
Office for Protection from Research
Risks, S5-5
One-sample t-test, 553–61
conditions for, 554–55
paired t-test, 561–64
p-value in, 555–56
steps in, 554–59
t-statistic for, 555, S2-3
One-sided (one-tailed) hypothesis test,
498, 511
confidence intervals and, 575–76
One-way analysis of variance, 669–85
analysis of variance table, 679–80,
682
comparing means with F-test, 669–79
assumptions and necessary conditions for, 672–74
defined, 670
family of F-distributions, 674–75
multiple comparisons, 676–79
notation for summary statistics,
672
p-value, 670, 675–76
steps in, 685
comparing two-way ANOVA and,
S4-3–4
measuring total variation, 681–82
measuring variation between groups,
680
measuring variation within groups,
680–81
model for, S4-4
95% confidence intervals for population means in, 684
Open questions, 101, 102–3
Ordering of survey questions, 97–98
Ordinal variable, 16, 193
Outcome variable, 21, 119. See also
Response variable(s)
Outliers, 26, 27–28, 29–30, 36, 43,
S5-15
boxplot to identify, 33, 34, 43–46
defined, 27
handling, 47–49
impact on correlation and regression
results, 172–73, 178, 179
influence on mean and median, 40
pictures of, 32, 34
possible reasons for, 47–48
on residual plot, 621–22
in regression, 156
resistance of nonparametric tests to,
S2-2
on scatterplot, 156, 621–22
sign test used in case of, 555
valid confidence interval estimate of
population mean and, 454
P
Paired data (paired samples), 445–47,
561
matched-pair design, 130, 131, 132,
135
one-sample t-test for, 561–64
Paired differences
confidence interval for population
mean of, 461–66
calculating, 464–65
conditions required for using,
462–64
interpreting, 464
defined, 341
notation for, 462
population mean for, 341–42
sample mean of
sampling distribution for, 362–65,
383
standard error of, 364–65, 448–49
Paired t-interval, 461–66
Paired t-test, 561–64
Parameter(s), 15, 332, 333, 404, 444
Big Five, 335–36
computing confidence intervals
for, 407–10
examples of, 336–43
hypothesis tests for, 496–504
determining appropriate, 578–79
as long-run probability, 415–16
notation for, 336
null value for, 333
sampling distribution of statistic
estimating. See Sampling
distribution
statistic vs., 335, 336
translating curiosity to questions
about, 334–43
Parametric test, S2-3
Participants (subjects), 119
ethical treatment of, S5-2–9, S5-12
animal, S5-8–9
human, S5-2–8
in randomized experiments, 125
Past as source of data, using, 140–41
Pearson product moment correlation.
See Correlation
Percentage, probabilities and, 258
Percentile, 46, 310
Percentile ranking, 310–11
Percent increase/decrease in risk, 199
Perfect correlation, 166
Personal bias, S5-12–13
Personal decision-making, 709–12
Personal probability, 235–36, 237
Pie charts, 22, 24
Pilot survey, 102
Placebo, 5, 6, 128
Placebo effect, 128
Plots of the residuals, 619–23
Point estimate, 404, 405. See also
Statistic
Poisson approximation for binomial
random variables, S1-11
Poisson distribution, S1-7–11
Poisson processes, S1-10
Poisson random variable X, S1-7–11
examples of, S1-7
mean, variance, and standard deviation for, S1-9
possible values for, S1-8
probabilities for, finding, S1-7–9
Policy decisions, using statistical studies
in, 713–15
Political affiliation, influence on survey
responses of, 103
Political pressures on research results,
S5-19
Politics, independence of statistical
agencies from, S5-11–12
Polls. See Sample survey
Pooled confidence interval for the difference between two population
means, 474, 477
Pooled sample variance (sₚ²), 571
Pooled standard deviation (sₚ), 473, 571
in ANOVA, 681
Pooled standard error, 571, 572
for difference between two means,
473–76
Pooled two-sample t-test, 567, 571–74
Pooled variance, 473
Population
data, 14–15
defined, 4, 72, 73, 403–4
extending results inappropriately to,
137–38
mean for, 52, 341–43. See also Expected value
regression line for, 601, 602–5
standard deviation for, 52, 292–93
Population average, stratified sampling
and accuracy of, 85
Population data, 14–15
Population intercept (β₀), 603
Population mean(s), 52, 292–93, 336,
341–43. See also Expected value
confidence interval for
difference between two population means, 466–77
in multiple regression model,
S3-13–14
one population mean (t-interval),
452–61
paired differences (paired
t-interval), 461–66
in simple regression, 617, 618
estimating, from sample mean, 341
for Factor A, level i, S4-5
for Factor B, level j, S4-5
overall, in two-way ANOVA, S4-5
for paired differences, 341–42
testing hypotheses about, 551–97
difference between two population
means, 342–43, 552, 565–74
one population mean, 551, 553–61
population mean of paired differences, 341–42, 552, 561–64
Population median(s)
comparing, 687–89
Kruskal–Wallis Test for, 687–88,
S2-16–19
Mood’s Median Test for, 688–89
hypotheses about, 686
sign test to test hypotheses about
value of, S2-3–7
Population parameter. See Parameter(s)
Population proportion, 338–39, 404
confidence interval for, 77, 409–23
approximate 95%, 420–21
common settings, 411
computing, 412–14
conditions required for using formula, 414–16
conservative 95%, 421
constructing 95%, 417–18
difference between two population
proportions, 423–28
formula for, 409, 414
width of, 409–10
z* multiplier, determining, 412–14
estimating, from single sample proportion, 339, 353–54
hypothesis testing of, 511–27
difference between two population
proportions, 340, 527–33
exact p-values, 521–23
power and sample size for, 524–26
p-value for z-test, computing,
516–21, 529, 531–32
rejection region approach, 523–24
z-statistic, details for calculating,
515–16
z-test for, 511–14
notation for, 336
true, power and, 510, 525
Population size, accuracy of survey
and, 79
Population slope (β₁), 603
confidence interval for, 611–12
hypothesis test for, 609–10
Population standard deviation (σ), 52,
292–93, 336
Positive association, 152–53
Positive correlation, 166, 167, 178
Power of test, 509–10
effect size and, 584–85
sample size and, 509–10, 524–26, 584,
585
Practical significance, 8, 214–15, 536,
586
Predicted value ŷ, 160, 162–63, S3-5,
S3-9, S3-15
Prediction error, 162–63
Prediction interval, 613–16, 617, 618,
S3-13–14
Predictions, regression equation to
make, 157
Predictor variable, 600, S3-2
Principles and Practices for a Federal
Statistical Agency (NRC Committee on National Statistics), S5-11
Probability(ies), 228–77
assigning, to simple events, 238–39,
242
basic rules for finding, 243–51
addition rule for “either/or”
(Rule 2), 244–45, 250
complement rule for “not the
event” (Rule 1), 243–44, 250,
S1-8
conditional probability (Rule 4),
247–49, 251, 252
multiplication rule for “and”
(Rule 3), 245–47, 249, 250–51
sampling without replacement,
249–50, 295
sampling with replacement,
249–50
for binomial random variables,
295–98
coherent, 236
complementary events, 239, 240, 243
for complex events based on discrete
random variables, 287–88
conditional, 241–42, 243, 246,
247–49, 251, 252, 256–58, 501,
502, 507–8
consequences of two types of errors
and, 507–8
for continuous random variables, 282
cumulative, 286, 304–6, 310–11,
312–14
defined, 231, 232
definitions and relationships, 238–43
dependent events, 240–41, 243, 245,
246, 251
for discrete random variables, 282
distribution, 283–86. See also Probability distribution
flawed intuitive judgments about,
261–68
coincidences, 264–67
confusion of the inverse, 261–63
gambler’s fallacy, 267
specific people vs. random individuals, 263
independent events, 240–41, 243,
246–47, 249, 251, 252
interpretations of, 231–37
personal probability, 235–36, 237
relative frequency, 232–35, 237
mutually exclusive events, 240, 243,
244–45, 249, 250, 252
for normal random variables, 306–7
percentages and, 258
philosophical issue about, 236–37
Poisson process and, S1-10
for Poisson random variables, finding, S1-7–9
proportions and, 258
random circumstances and, 229–31
simulation to estimate, 259–61
standard normal, 304–6
strategies for finding complicated,
251–58
hints and advice, 252
hypothetical hundred thousand,
255–56
steps for, 253–55
tree diagrams, 256–59
for Student’s t-distribution, 372–73
subjective, 236
of type 1 error (α), 507, 508, 510
of type 2 error (β), 509, 510
Probability density function, 300–302,
346
Probability distribution, 346. See
also Distribution; Sampling
distribution
chi-square distribution, 641
of discrete random variables, 283–86,
287–88
expected value of variable and, 289
F-distribution, 670, 674–75, S3-11
hypergeometric, S1-3–5
multinomial, S1-12
Student’s t-distribution, 370–72
Probability distribution function (pdf),
283
of binomial random variable, 296
of discrete random variable, 283–88,
S1-3
graphing, 285–86
for hypergeometric random variable,
S1-3–5
joint, S1-12
for Poisson random variable, S1-8
Probability sampling plans, 80
Professionalism, S5-17
Proportion(s). See also Population
proportion; Sample proportion(s) (p̂)
confidence intervals for. See under
Confidence interval(s)
probabilities and, 258
as relative frequency probabilities,
234
sample surveys to estimate, 75–76
Proportion of variation explained by
x (R²), 169, 607–9, S3-8
Prospective studies, 134, 136, 140
Psychology experiments, S5-3–4
Public Health Service Policy on Humane
Care and Use of Laboratory Animals, S5-9
p-value, 210, 215, 521, S2-2
close to 0.05, 651–52
computing, 502, 514, 553
defined, 501, 502, 508
determining
for chi-square test for two-way
tables, 210–11, 213, 215, 216,
641–43
exact p-values in tests for population proportions, 521–23
for F-test, 670, 675–76
for one sample t-test for a mean,
555–56
for paired t-test, 563–64
for two-sample t-test, 566–67, 569
for z-test for a proportion, 516–21
for z-test of difference in two proportions, 529, 531–32
in hypothesis test for population
slope, 610
interpreting, 503, 537
in two-way ANOVA, S4-15–16,
S4-17, S4-18
for Kruskal–Wallis Test, S2-17–18
in multiple regression, S3-11, S3-12
rejection of null hypothesis based on,
553
rejection region approach compared
to, 560–61
for sign test, S2-4–7
statistical significance based on, 514,
518, 520, 556, 567
two possible conclusions of hypothesis test based on, 503–4
for two-sample rank-sum test,
S2-10–11
for Wilcoxon signed-rank test,
S2-14–15
Q
Quantitative data, visual displays exploring features of, 24–36
describing shape, 26, 33–35, 36
pictures of quantitative data, 29–33
strengths and weaknesses, 35–36
Quantitative variable(s), 16, 17, 337,
444–45
boxplot of, 33–35
defined, 16, 17, 151
numerical summaries of, 36–47
parameters for, 336
examples of, 337, 341–43
questions to ask about, 18
raw data from, 16
stratified sampling and, 85
summary features of, 26
visual summaries of, 22–24
Quantitative variables, relationships
between, 151–91
correlation, 165–71
causation and, 176–77
misleading, reasons for, 171–76
example of, 151
linear patterns described with regression line, 157–65
scatterplots, patterns in, 152–57
Quartiles, 42, 46
finding, 42–43
lower, 2, 3, 42
upper, 2, 3, 42
Questions
survey, 95–103
closed, 101–2
open, 101, 102 –3
for variable types, 17–18
Quickie polls, 91
Quota sampling, 110
R
r, 165–80
r², 169–70, 171
R², 169, 607–9, S3-8
Random assignment, 5, 6
Random circumstances, 229–31
assigning probabilities to outcomes
of, 231
conditions for valid probabilities for
possible outcomes of, 238
Random-digit dialing, 88–89, 90
Random digits, table of, 80–82
Randomization, 126–27
by third party, preventing bias with,
S5-12
Randomized block design, 131, 132
Randomized experiment(s), 5–6, 118,
124
causation and, 5–6, 118, 124, 176–77,
706–7
defined, 6
designing, 124–33
difference in two population means
in, 366
difficulties and disasters in, 136–41
participants in, 125
randomization as key to, 126–27
Random numbers, 80–81
Random sample/sampling, 94, 706
defined, 4
probability sampling plans, 80
simple, 73, 80–83
in action, example of, 103–6
stratified, 84–85, 86, 89
using table of random digits, 80–82
Random variable(s), 279–329, 345
Bernoulli, 295
binomial. See Binomial random
variable(s)
classes of, 280–81. See also Continuous random variables; Discrete
random variables
defined, 280, S1-2
families of, 280, 281–82
hypergeometric, 295, S1-2–7
independent, 315–20
linear combination of, 315–20
mean value (expected value) of,
288–91, 298–99, S1-6
normal. See Normal random variable
Poisson, S1-7–11
sums, differences, and combinations
of, 314–20
uniform, 301
Range, 26, 41–42, 43, 54
Rank of observation, S2-9
Rank-sum statistic, S2-9–10
Rank-sum test, two-sample, S2-7–12
Rank test, 687
Kruskal–Wallis Test, 687–88, S2-16–19
Mood’s Median Test, 688–89
Rate, 3
base, 3
Raw data, 13–15
from categorical variable, 15
notation for, 36
from quantitative variables, 16
Regression, 157–65, 599–633
case study of, 623–25
constant variance in, 603
estimating mean y at specified x,
617–19
inference about slope of linear,
609–13
misleading, reasons for, 171–76
curvilinear data, 175–76
inappropriately combined groups,
173–74
large sample size, 613
outliers, 172–73, 178, 179
models, 600–605
inferences made with, checking
conditions for, 619–23
multiple linear regression, S3-3–9
regression line for population, 601,
602–5
regression line for sample, 601,
601–2, 605
multiple, 603, S3-2–19
95% prediction interval, 614–16, 617
simple, 158, 599, S3-2
standard deviation for, 605–9
estimating, 606–7
proportion of variation explained
by x, 607–9
statistical use of word, 165
Regression analysis, 157
purpose of, 171
Regression coefficients, S3-3
estimating, using sample data, S3-5
testing null and alternative hypotheses for, S3-9–10
Regression equation, 157–65, 599, S3-2
computer results for, 171
defined, 151, 157
example of, 180
extrapolation from, 171–72
writing, 159
Regression line, 157–65, 600–601
defined, 158
equation for, 160–62
interpreting, 160–61
least squares line as estimate of,
163–64
for population, 601, 602–5
deviations from, 603–5
for sample, 601–2, 605
deviations from, 602
slope of, 601, 603, 609–13
sum of squared error (SSE) and, 170,
606, S3-5, S3-8
Rejection region, 523–24
for chi-square tests, 643–45
for F-test, 675
p-value approach compared to,
560–61
for t-tests, 559–61, 564, 568, 569–70
for z-tests, 523–24
Relationship between categorical variables, 197
Relative frequency, 20
on histograms, 30
simulation approach for estimating
long-run, 259–61
Relative frequency distribution, 20
Relative frequency probability, 232–35,
237
estimating, from observed categorical
data, 234–35
methods of determining, 233–34
proportions and percentages as, 234
simulation approach for estimating
long-run, 259–61
Relative risk, 198–99, 201, 203, 204
Repeated-measures designs, 130
Replacement
sampling with, 249–50
sampling without, 249–50
Reporting of research results, fair,
S5-16–20
Representative sample, 706, 720, S5-19.
See also Sampling
Research studies, 117–24
ethics in. See Ethics
evaluating significance in, 585–86
fair reporting of results, S5-16–20
types of, 117–18. See also Experiment(s); Observational studies
variables, roles played by, 119–23
who is measured, 119
Residual(s), 162–63, 602
plots, 619–23
in sample regression equation, S3-5,
S3-9
Residual sum of squares, 606
Residuals versus fits, S3-15–16
Resistance of nonparametric tests to
outliers, S2-2
Resistant statistic, 43
Response bias, 75
possible sources of, in surveys, 95–99
Response rate, 91–92
Response variable(s), 18, 119, 152
in case-control studies, 134, 135
for categorical variables, 21
change in explanatory variable
caused by, 177
in contingency table, 194, 195, 196
in regression models, 600
roles in research studies, 119, 120,
121, 122, 123
transforming, for analysis of variance,
689
Restrictive sample, 706
Retrospective studies, 134, 136, 140–41
Rights as research subject, S5-6
Risk, 198–204
baseline, 3, 198, 203
control of societal, using statistics for,
709, 713–15
defined, 3, 198
misleading statistics about, 201–3
odds ratio, 199–200, 201
percent increase/decrease in, 199
relative, 198–99, 201, 203, 204
Robust
inference procedures for regression
as, 621
nonparametric methods as, S2-2
Row percentages, 194, 195, 196, 197,
636, 639
Rule for Concluding Cause and Effect,
137, 176, 706–7
Rule for Sample Means, 358–62, 375,
449
Rule for Sample Proportions, 349–53,
375, 382, 414, 417, 515, 516
Rule for using data for inference, 72, 89,
125, 137, 404, 706
Rules for finding probabilities, 243–51
S
Sagan, Carl, 236
Sample(s). See also Paired data (paired
samples)
cluster, 85–87
convenience (haphazard), 93–94, 138
defined, 73, 404
dependent, 363
estimating effect size for one and two,
581
extending results inappropriately
from, 137–38
independent. See Independent
samples
large, approximate 95% confidence
intervals for, 460–61
possible, 349, 358, 360
random. See Random sample/
sampling
regression line for, 601, 601–2, 605
deviations from, 602
representative, 706, 720, S5-19
restrictive, 706
self-selected, 4, 93
simple random, 73, 80–83
in action, example of, 103–6
stratified, 84–85, 86, 89
systematic, 87–88
volunteer, 4, 93
Sample data, 14–15
Sample estimate (estimate), 332, 333,
404, 405, 408, 444. See also
Statistic
Sample mean(s), 26, 37, 336, 357–68.
See also Mean(s)
estimating population mean from,
341
finding pattern in, 379–81
histogram of, 345–46
Normal Curve Approximation Rule
for Sample Means, 358–62, 375,
449
outliers and, 40
sampling distribution for one,
357–62, 383
difference in two sample means,
365–68, 383
of paired differences, 362–65, 383
standard deviation of, 359
difference in two sample means,
367–68
of paired differences, 364–65
sample size and, 361–62
standard error of, 361, 448–49
difference in sample means,
367–68, 449–50, 467
of paired differences, 364–65,
448–49
standardized z-statistic for, 370
in two-way ANOVA, notation for, S4-6
Sample proportion(s) (p̂), 348–54, 382,
404
estimating population proportion
from single, 339, 353–54
95% margin of error for, 76, 418,
419–23
Normal Curve Approximation Rule
for, 349–53, 375, 382, 414, 417,
515, 516
notation for, 336