
A.12 The Box–Cox Method for Transformations


Many statistical packages provide routines that graph RSS(λ) versus λ, or (n/2)log(RSS(λ)) versus λ, as shown in Figure 8.7 for the highway accident data. Equation (A.40) shows that the confidence interval for λ includes all values of λ for which the log-likelihood is within 1.92 units of its maximum, or between the two vertical lines in the figure.
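The profile log-likelihood behind this interval is, up to constants, −(n/2)log(RSS(λ)), where RSS(λ) is the residual sum of squares from regressing ψ_M(Y, λ) on the regressors. A minimal sketch in Python, with simulated data and with `psi_M` as my stand-in for the modified power transformation (8.5), evaluates this profile over a grid and reads off the 1.92-unit interval:

```python
import numpy as np

def psi_M(y, lam):
    # Stand-in for the modified power transformation (8.5): the power family
    # scaled by the geometric mean so RSS(lambda) is comparable across lambda.
    gm = np.exp(np.mean(np.log(y)))          # geometric mean; y > 0 assumed
    if abs(lam) < 1e-8:
        return gm * np.log(y)
    return gm ** (1.0 - lam) * (y ** lam - 1.0) / lam

def rss(lam, X, y):
    # residual sum of squares from OLS of psi_M(y, lam) on the columns of X
    z = psi_M(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return r @ r

# simulated data in which log(y) is exactly linear in x, so lambda-hat near 0
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(1.0, 5.0, n)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.2, n))
X = np.column_stack([np.ones(n), x])

grid = np.linspace(-1.0, 2.0, 301)
loglik = np.array([-(n / 2.0) * np.log(rss(lam, X, y)) for lam in grid])
lam_hat = grid[np.argmax(loglik)]
ci = grid[loglik >= loglik.max() - 1.92]     # the 1.92-unit cutoff from (A.40)
```

Plotting `loglik` against `grid` would reproduce a graph of the kind shown in Figure 8.7, with the interval endpoints at `ci.min()` and `ci.max()`.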

A.12.2 Multivariate Case

Although the material in this section uses more mathematical statistics than most of this book, it is included because the details of computing the multivariate extension of Box–Cox transformations are not published elsewhere. The basic idea was proposed by Velilla (1993).

Suppose X is a set of p variables we wish to transform and define

\psi_M(X, \lambda) = (\psi_M(X_1, \lambda_1), \ldots, \psi_M(X_p, \lambda_p))

We have used the modified power transformations (8.5) for each element of X, but the same general idea can be applied using other transformations such as the Yeo–Johnson family introduced in Section 8.4. In analogy to the univariate case, we assume that for some λ, we will have

\psi_M(X, \lambda) \sim N(\mu, V)

where V is an unknown positive definite symmetric matrix that needs to be estimated. If x_i is the observed value of X for the ith observation, then the likelihood function is given by

L(\mu, V, \lambda | X) = \prod_{i=1}^{n} \frac{1}{(2\pi |V|)^{1/2}} \exp\left[ -\frac{1}{2} (\psi_M(x_i, \lambda) - \mu)' V^{-1} (\psi_M(x_i, \lambda) - \mu) \right]   (A.41)

where |V| is the determinant.³ After rearranging terms, the log-likelihood is given by

\log(L(\mu, V, \lambda | X)) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(|V|) - \frac{1}{2} \sum_{i=1}^{n} (\psi_M(x_i, \lambda) - \mu)' V^{-1} (\psi_M(x_i, \lambda) - \mu)   (A.42)

³The determinant is defined in any linear algebra textbook.


If we fix λ, then (A.42) is the standard log-likelihood for the multivariate normal distribution. The values of μ and V that maximize (A.42) are the sample mean and the sample covariance matrix, the latter with divisor n rather than n − 1,

m(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \psi_M(x_i, \lambda)

V(\lambda) = \frac{1}{n} \sum_{i=1}^{n} (\psi_M(x_i, \lambda) - m(\lambda))(\psi_M(x_i, \lambda) - m(\lambda))'

Substituting these estimates into (A.42) gives the profile log-likelihood for λ,

\log(L(m(\lambda), V(\lambda), \lambda | X)) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(|V(\lambda)|) - \frac{n}{2}   (A.43)

This equation is maximized by minimizing the determinant of V(λ) over values of λ. This is a numerical problem for which there is no closed-form solution, but it can be solved using a general-purpose function minimizer. Standard theory for maximum likelihood estimates can provide tests concerning λ and standard errors for the elements of λ. To test the hypothesis that λ = λ₀ against a general alternative, compute

G^2 = 2\left[ \log(L(m(\hat{\lambda}), V(\hat{\lambda}), \hat{\lambda})) - \log(L(m(\lambda_0), V(\lambda_0), \lambda_0)) \right]

and compare G² with a chi-squared distribution with p df. The standard error of λ̂ is obtained from the inverse of the expected information matrix evaluated at λ̂. The expected information for λ̂ is just the matrix of second derivatives of (A.43) with respect to λ evaluated at λ̂. Many optimization routines, such as optim in R, will return the matrix of estimated second derivatives if requested; all that is required is inverting this matrix, and then the square roots of the diagonal elements are the estimated standard errors.
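As a concrete illustration of the whole procedure, the sketch below uses Python with numpy and scipy's minimizer standing in for optim; `psi_M` is my stand-in for the modified power transformation (8.5) and the data are simulated. It minimizes log|V(λ)|, computes G² for the hypothesis of no transformation, and pulls approximate standard errors from the quasi-Newton inverse-Hessian estimate:

```python
import numpy as np
from scipy.optimize import minimize

def psi_M(y, lam):
    # stand-in for the modified power transformation (8.5), geometric-mean scaled
    gm = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-8:
        return gm * np.log(y)
    return gm ** (1.0 - lam) * (y ** lam - 1.0) / lam

def logdet_V(lams, X):
    # log|V(lambda)|, with V the MLE covariance (divisor n) of the transformed
    # columns; by (A.43), maximizing the profile log-likelihood is the same
    # as minimizing this quantity.
    Z = np.column_stack([psi_M(X[:, j], lam) for j, lam in enumerate(lams)])
    Zc = Z - Z.mean(axis=0)
    _, logdet = np.linalg.slogdet(Zc.T @ Zc / len(Z))
    return logdet

# simulated positive data: column 1 is lognormal (so lambda_1 near 0),
# column 2 is roughly a squared normal (so lambda_2 near 1/2)
rng = np.random.default_rng(1)
raw = rng.multivariate_normal([2.0, 3.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X = np.column_stack([np.exp(raw[:, 0]), raw[:, 1] ** 2 + 1.0])
n = X.shape[0]

res = minimize(logdet_V, x0=np.ones(2), args=(X,), method="BFGS")
lam_hat = res.x

# G^2 test of lambda = (1, 1), i.e., no transformation, against a general
# alternative; -log L(lambda) = (n/2) logdet_V(lambda) + constant
G2 = n * (logdet_V(np.ones(2), X) - res.fun)

# approximate standard errors: BFGS's accumulated inverse Hessian of
# logdet_V, rescaled to the log-likelihood; a quasi-Newton approximation
se = np.sqrt(np.diag(res.hess_inv) * 2.0 / n)
```

An exact Hessian (e.g., `optim(..., hessian = TRUE)` in R, or a finite-difference Hessian at `lam_hat`) would normally replace the BFGS approximation for published standard errors.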

A.13 Case Deletion in Linear Regression

Suppose X is the n × p′ matrix of regressors with linearly independent columns. We use the subscript "(i)" to mean "without case i," so that X_{(i)} is an (n − 1) × p′ matrix. We can compute (X'_{(i)} X_{(i)})^{-1} from the remarkable formula

(X'_{(i)} X_{(i)})^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1} x_i x_i' (X'X)^{-1}}{1 - h_{ii}}   (A.44)


where h_{ii} = x_i'(X'X)^{-1} x_i is the ith leverage value, a diagonal value from the hat matrix. This formula was used by Gauss (1821); a history of it and many variations are given by Henderson and Searle (1981). It can be applied to give all the results one would want relating multiple linear regression with and without the ith case. For example,

\hat{\beta}_{(i)} = \hat{\beta} - \frac{(X'X)^{-1} x_i \hat{e}_i}{1 - h_{ii}}   (A.45)

Writing r_i = \hat{e}_i / (\hat{\sigma} \sqrt{1 - h_{ii}}), the estimate of variance is

\hat{\sigma}^2_{(i)} = \hat{\sigma}^2 \, \frac{n - p' - r_i^2}{n - p' - 1}   (A.46)

and the studentized residual t_i is

t_i = r_i \left( \frac{n - p' - 1}{n - p' - r_i^2} \right)^{1/2}   (A.47)
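These identities are easy to check numerically. The following sketch (Python with numpy; the data are simulated and the variable names are my own) fits a small regression, applies (A.44)–(A.47), and confirms each against a direct refit without case i:

```python
import numpy as np

# simulated regression with n = 30 cases and p' = 3 columns (intercept + 2)
rng = np.random.default_rng(2)
n, p1 = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
sigma2 = e @ e / (n - p1)               # sigma-hat squared

i = 5                                   # delete case i
xi, ei = X[i], e[i]
hii = xi @ XtX_inv @ xi                 # leverage h_ii

# (A.44): updating formula for the deleted-case inverse
upd = XtX_inv + np.outer(XtX_inv @ xi, xi @ XtX_inv) / (1.0 - hii)
Xd, yd = np.delete(X, i, axis=0), np.delete(y, i)
assert np.allclose(upd, np.linalg.inv(Xd.T @ Xd))

# (A.45): deleted-case coefficient estimate
beta_i = beta - XtX_inv @ xi * ei / (1.0 - hii)
assert np.allclose(beta_i, np.linalg.lstsq(Xd, yd, rcond=None)[0])

# (A.46) and (A.47): deleted-case variance and studentized residual
ri = ei / np.sqrt(sigma2 * (1.0 - hii))
sigma2_i = sigma2 * (n - p1 - ri ** 2) / (n - p1 - 1)
ed = yd - Xd @ beta_i
assert np.allclose(sigma2_i, ed @ ed / (n - 1 - p1))
ti = ri * np.sqrt((n - p1 - 1) / (n - p1 - ri ** 2))
```

The point of the updating formulas is that none of the quantities after the refit check actually require refitting: everything comes from the full-data fit plus h_ii.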

The diagnostic statistics examined in this book were first thought to be practical because simple updating formulas gave the case-deleted statistics without recomputing estimates. Advances in computing over the last 30 years have made the burden of recomputing without a case much less onerous, and so diagnostic methods equivalent to those discussed here can be applied to problems other than linear regression where the updating formulas are not available.

References

In a few instances, the URL given for an article refers to the website http://dx.doi.org/, used to resolve a digital object identifier (DOI) and send you to the correct website. This may lead you to a page requesting payment before viewing an article. Many research libraries subscribe to journals and may use a different method to resolve a DOI so you can get to articles for free. Ask your librarian, or see http://doi.org.

An on-line version of this bibliography with clickable links is available on the website for this book, http://z.umn.edu/alr4ed.

Agresti, A. (2007). An Introduction to Categorical Data Analysis. 2nd ed. Wiley, Hoboken, NJ.
Agresti, A. (2013). Categorical Data Analysis. 3rd ed. Wiley, Hoboken, NJ.
Allison, P. D. (2001). Missing Data. Quantitative Applications in the Social Sciences. Sage, Thousand Oaks, CA.
Allison, T. and Cicchetti, D. V. (1976). Sleep in mammals: Ecological and constitutional correlates. Science, 194, 732–734. URL: http://www.jstor.org/stable/1743947.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21. URL: http://www.jstor.org/stable/2682899.
Atkinson, A. C. (1985). Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Clarendon Press, Oxford.
Baes, C. and Kellogg, H. (1953). Effects of dissolved sulphur on the surface tension of liquid copper. Journal of Metals, 5, 643–648.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. 3rd ed. Wiley, Hoboken, NJ.
Bates, D. and Watts, D. (1988). Nonlinear Regression Analysis and Its Applications. Wiley, Hoboken, NJ.
Beckman, R. J. and Cook, R. D. (1983). Outliers. Technometrics, 25, 119–149. URL: http://www.jstor.org/stable/1268541.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57, 289–300. URL: http://www.jstor.org/stable/2346101.

Applied Linear Regression, Fourth Edition. Sanford Weisberg. © 2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.

Bertalanffy, L. (1938). A quantitative theory of organic growth (inquiries on growth laws II). Human Biology, 10, 181–213. URL: http://www.jstor.org/stable/41447359.
Berzuini, C., Dawid, P., and Bernardinelli, L. (eds.) (2012). Causality: Statistical Perspectives and Applications. Wiley, Hoboken, NJ.
Bleske-Rechek, A. and Fritsch, A. (2011). Student consensus on ratemyprofessors.com. Practical Assessment, Research & Evaluation, 16. (Online; last accessed August 1, 2013), URL: http://pareonline.net/getvn.asp?v=16&n=18.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. Wiley, New York.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford.
Box, G., Jenkins, G., and Reinsel, G. (2008). Time Series Analysis: Forecasting and Control. 4th ed. Wiley, Hoboken, NJ.
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26, 211–252. URL: http://www.jstor.org/stable/2984418.
Bretz, F., Hothorn, T., and Westfall, P. (2010). Multiple Comparisons Using R. Chapman & Hall/CRC, Boca Raton, FL.
Breusch, T. S. and Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47, 1287–1294. URL: http://www.jstor.org/stable/1911963.
Brillinger, D. (1983). A generalized linear model with "Gaussian" regressor variables. A Festschrift for Erich L. Lehmann in Honor of His Sixty-Fifth Birthday, 97–114.
Brown, P. (1993). Measurement, Regression, and Calibration. Oxford Scientific Publications, Oxford.
Burnside, O. C., Wilson, R. G., Weisberg, S., and Hubbard, K. G. (1996). Seed longevity of 41 weed species buried 17 years in Eastern and Western Nebraska. Weed Science, 44, 74–86. URL: http://www.jstor.org/stable/4045786.
Burt, C. (1966). The genetic determination of differences in intelligence: A study of monozygotic twins reared together and apart. British Journal of Psychology, 57, 137–153. URL: http://dx.doi.org/10.1111/j.2044-8295.1966.tb01014.x.
Carpenter, J. and Kenward, M. (2012). Multiple Imputation and Its Application. Wiley, Hoboken, NJ. (Online; last accessed August 1, 2013), URL: http://missingdata.lshtm.ac.uk.
Casella, G. and Berger, R. (2001). Statistical Inference. Duxbury Press, Pacific Grove, CA.
Centers for Disease Control (2013). Youth risk behavior surveillance system. (Online; last accessed August 1, 2013), URL: http://www.cdc.gov/HealthyYouth/yrbs/index.htm.
Chen, C.-F. (1983). Score tests for regression models. Journal of the American Statistical Association, 78, 158–161. URL: http://www.jstor.org/stable/2287123.
Christensen, R. (2011). Plane Answers to Complex Questions: The Theory of Linear Models. 4th ed. Springer, New York.


Clapham, A. (1934). English Romanesque Architecture after the Conquest. Clarendon Press, Oxford.
Clark, R., Henderson, H., Hoggard, G., Ellison, R., and Young, B. (1987). The ability of biochemical and haematological tests to predict recovery in periparturient recumbent cows. New Zealand Veterinary Journal, 35, 126–133. URL: http://dx.doi.org/10.1080/00480169.1987.35410.
Clausius, R. (1850). Über die bewegende Kraft der Wärme und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen. Annalen der Physik, 79, 500–524.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836. URL: http://www.jstor.org/stable/2286407.
Cochran, W. (1977). Sampling Techniques. 3rd ed. Wiley, Hoboken, NJ.
Collett, D. (2003). Modelling Binary Data. 2nd ed. Chapman & Hall, Boca Raton, FL.
Colorado Climate Center (2012). Colorado climate center monthly data access. (Online; last accessed August 1, 2013), URL: http://ccc.atmos.colostate.edu/cgi-bin/monthlydata.pl.
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19, 15–18. URL: http://www.jstor.org/stable/1268249.
Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society. Series B (Methodological), 48, 133–169. URL: http://www.jstor.org/stable/2345711.
Cook, R. D. (1998). Regression Graphics: Ideas for Studying Regressions Through Graphics. Wiley, Hoboken, NJ.
Cook, R. D. and Prescott, P. (1981). On the accuracy of Bonferroni significance levels for detecting outliers in linear models. Technometrics, 23, 59–63. URL: http://www.jstor.org/stable/1267976.
Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. Chapman & Hall/CRC, Boca Raton, FL. (Online; last accessed August 1, 2013), URL: http://conservancy.umn.edu/handle/37076.
Cook, R. D. and Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1–10. URL: http://www.jstor.org/stable/2335938.
Cook, R. D. and Weisberg, S. (1994). Transforming a response variable for linearity. Biometrika, 81, 731–737. URL: http://www.jstor.org/stable/2337076.
Cook, R. D. and Weisberg, S. (1999a). Applied Regression Including Computing and Graphics. Wiley, New York.
Cook, R. D. and Weisberg, S. (1999b). Graphs in statistical analysis: Is the medium the message? The American Statistician, 53, 29–37. URL: http://www.jstor.org/stable/2685649.
Cook, R. D. and Witmer, J. A. (1985). A note on parameter-effects curvature. Journal of the American Statistical Association, 80, 872–878. URL: http://www.jstor.org/stable/2288546.
Cox, D. (1958). Planning of Experiments. Wiley, Hoboken, NJ.
Cunningham, R. and Heathcote, C. (1989). Estimating a non-Gaussian regression model with multicollinearity. Australian & New Zealand Journal of Statistics, 31, 12–17.


Dalal, S. R., Fowlkes, E. B., and Hoadley, B. (1989). Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association, 84, 945–957. URL: http://www.jstor.org/stable/2290069.
Daniel, C. and Wood, F. (1980). Fitting Equations to Data: Computer Analysis of Multifactor Data. Wiley, Hoboken, NJ.
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.
Dawson, R. (1995). The "unusual episode" data revisited. Journal of Statistics Education, 3. URL: http://www.amstat.org/publications/JSE/v3n3/datasets.dawson.html.
de Boor, C. (1978). A Practical Guide to Splines. Springer, New York.
Derrick, A. (1992). Development of the measure-correlate-predict strategy for site assessment. In Proceedings of the 14th BWEA Conference. 259–265.
Dodson, S. (1992). Predicting crustacean zooplankton species richness. Limnology and Oceanography, 37, 848–856. URL: http://www.jstor.org/stable/2837943.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1–26. URL: http://www.jstor.org/stable/2958830.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC, Boca Raton, FL.
Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. The Annals of Mathematical Statistics, 34, 447–456. URL: http://www.jstor.org/stable/2238390.
Eicker, F. (1967). Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, Berkeley, 59–82.
Ezekiel, M. and Fox, K. (1959). Methods of Correlation and Regression Analysis: Linear and Curvilinear. Wiley, Hoboken, NJ.
Fair Isaac Corporation (2013). myfico. (Online; last accessed August 1, 2013), URL: http://www.myfico.com/CreditEducation/WhatsInYourScore.aspx.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360. URL: http://www.jstor.org/stable/3085904.
Federal Highway Administration (2001). Highway statistics 2001. (Online; last accessed August 1, 2012), URL: http://www.fhwa.dot.gov/ohim/hs01/index.htm.
Finkelstein, M. O. (1980). The judicial reception of multiple regression studies in race and sex discrimination cases. Columbia Law Review, 80, 737–754. URL: http://www.jstor.org/stable/1122138.
Fisher, R. and Mackenzie, W. (1923). Studies in crop variation. II. The manurial response of different potato varieties. The Journal of Agricultural Science, 13, 311–320. URL: http://digital.library.adelaide.edu.au/dspace/bitstream/2440/15179/1/32.pdf.
Fitzmaurice, G., Laird, N., and Ware, J. (2011). Applied Longitudinal Analysis. 2nd ed. Wiley, Hoboken, NJ.
Forbes, J. D. (1857). XIV.—Further experiments and remarks on the measurement of heights by the boiling point of water. Transactions of the Royal Society of Edinburgh, 21, 235–243. URL: http://journals.cambridge.org/article_S0080456800032075.


Fox, J. (2003). Effect displays in R for generalised linear models. Journal of Statistical Software, 8, 1–27. URL: http://www.jstatsoft.org/v08/i15.
Fox, J. and Weisberg, S. (2011). An R Companion to Applied Regression. 2nd ed. Sage, Thousand Oaks, CA. URL: http://z.umn.edu/carbook.
Fraley, C., Raftery, A. E., Gneiting, T., Sloughter, J., and Berrocal, V. J. (2011). Probabilistic weather forecasting in R. R Journal, 3, 55–63. (Online; last accessed August 1, 2013), URL: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Fraley~et~al.pdf.
Freedman, D. and Lane, D. (1983). A nonstochastic interpretation of reported significance levels. Journal of Business & Economic Statistics, 1, 292–298. URL: http://www.jstor.org/stable/1391660.
Freedman, D. A. (1983). A note on screening regression equations. The American Statistician, 37, 152–155. URL: http://www.jstor.org/stable/2685877.
Freeman, M. and Tukey, J. (1950). Transformations related to the angular and the square root. The Annals of Mathematical Statistics, 21, 607–611.
Furnival, G. M. and Wilson, R. W. (1974). Regressions by leaps and bounds. Technometrics, 16, 499–511. URL: http://www.jstor.org/stable/1267601.
Galton, F. (1877). Typical laws of heredity. Proceedings of the Royal Institution, 8, 282–301. (Online; last accessed August 1, 2013), URL: http://galton.org/essays/18701879/galton-1877-roy-soc-typical-laws-heredity.pdf.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263. URL: http://www.jstor.org/stable/2841583.
Gauss, C. (1821). Anzeige: Theoria combinationis observationum erroribus minimis obnoxiae: Pars prior (theory of the combination of observations which leads to the smallest errors). Göttingische gelehrte Anzeigen, 33, 321–327. Reprinted by the Society for Industrial and Applied Mathematics, 1987, URL: http://epubs.siam.org/doi/pdf/10.1137/1.9781611971248.fm.
Gilstein, C. Z. and Leamer, E. E. (1983). The set of weighted regression estimates. Journal of the American Statistical Association, 78, 942–948. URL: http://www.jstor.org/stable/2288208.
Gnanadesikan, R. (1997). Methods for Statistical Data Analysis of Multivariate Observations. 2nd ed. Wiley, Hoboken, NJ.
Goldstein, H. (2010). Multilevel Statistical Models. 4th ed. Wiley, Hoboken, NJ.
Golub, G. and Van Loan, C. (1996). Matrix Computations. 3rd ed. Johns Hopkins University Press, Baltimore, MD.
Gould, S. (1966). Allometry and size in ontogeny and phylogeny. Biological Reviews, 41, 587–638. URL: http://dx.doi.org/10.1111/j.1469-185X.1966.tb01624.x.
Gould, S. J. (1973). The shape of things to come. Systematic Zoology, 22, 401–404. URL: http://www.jstor.org/stable/2412947.
Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, vol. 58. Chapman & Hall/CRC, Boca Raton, FL.
Greene, W. (2003). Econometric Analysis. 5th ed. Prentice Hall, Upper Saddle River, NJ.


Haddon, M. (2010). Modelling and Quantitative Methods in Fisheries. Chapman & Hall/CRC, Boca Raton, FL.
Hahn, A. (ed.) (1979). Development and Evolution of Brain Size. Academic Press, New York.
Hald, A. (1960). Statistical Theory with Engineering Applications. Wiley, Hoboken, NJ.
Hall, P. and Li, K.-C. (1993). On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, 21, 867–889. URL: http://www.jstor.org/stable/2242265.
Härdle, W. (1990). Applied Nonparametric Regression, vol. 26. Cambridge University Press, Cambridge, MA.
Hart, C. W. M. (1943). The Hawthorne experiments. The Canadian Journal of Economics and Political Science/Revue Canadienne d'Economique et de Science politique, 9, 150–163. URL: http://www.jstor.org/stable/137416.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning. 2nd ed. Springer, New York.
Hawkins, D. M. (1980). Identification of Outliers. Chapman & Hall/CRC, Boca Raton, FL.
Hawkins, D. M., Bradu, D., and Kass, G. V. (1984). Location of several outliers in multiple-regression data using elemental sets. Technometrics, 26, 197–208. URL: http://www.jstor.org/stable/1267545.
Henderson, H. V. and Searle, S. R. (1981). On deriving the inverse of a sum of matrices. SIAM Review, 23, 53–60. URL: http://www.jstor.org/stable/2029838.
Hernandez, F. and Johnson, R. A. (1980). The large-sample behavior of transformations to normality. Journal of the American Statistical Association, 75, 855–861. URL: http://www.jstor.org/stable/2287172.
Hilbe, J. M. (2011). Negative Binomial Regression. Cambridge University Press, Cambridge.
Hinkley, D. (1985). Transformation diagnostics for linear models. Biometrika, 72, 487–496. URL: http://www.jstor.org/stable/2336721.
Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician, 32, 17–22. URL: http://www.jstor.org/stable/2683469.
Hocking, R. (1985). The Analysis of Linear Models. Brooks Cole, Monterey, CA.
Hocking, R. (2003). Methods and Applications of Linear Models: Regression and the Analysis of Variance. Wiley, Hoboken, NJ.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382–401. URL: http://www.jstor.org/stable/2676803.
Hosmer, D. W., Lemeshow, S., and May, S. (2008). Applied Survival Analysis. 2nd ed. Wiley, Hoboken, NJ.
Hosmer, D. W., Lemeshow, S., and Sturdivant, R. (2013). Applied Logistic Regression. 3rd ed. Wiley, Hoboken, NJ.
Huber, P. (1967). The behavior of maximum likelihood estimates under non-standard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 221–233.


Huber, P. and Ronchetti, E. M. (2009). Robust Statistics. 2nd ed. Wiley, Hoboken, NJ.
Hurvich, C. M. and Tsai, C.-L. (1990). The impact of model selection on inference in linear regression. The American Statistician, 44, 214–217. URL: http://www.jstor.org/stable/2685338.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Med, 2, e124. URL: http://dx.doi.org/10.1371%2Fjournal.pmed.0020124.
Jevons, W. S. (1868). On the condition of the metallic currency of the United Kingdom, with reference to the question of international coinage. Journal of the Statistical Society of London, 31, 426–464. URL: http://www.jstor.org/stable/2338797.
Johns, M. W. (1991). A new method for measuring daytime sleepiness: The Epworth sleepiness scale. Sleep, 16, 540–545. URL: http://www.ncbi.nlm.nih.gov/pubmed/1798888.
Johnson, K. (1996). Unfortunate Emigrants: Narratives of the Donner Party. Utah State University Press, Logan, UT.
Johnson, M. P. and Raven, P. H. (1973). Species number and endemism: The Galápagos archipelago revisited. Science, 179, 893–895. URL: http://www.jstor.org/stable/1735348.
Joiner, B. L. (1981). Lurking variables: Some examples. The American Statistician, 35, 227–233. URL: http://www.jstor.org/stable/2683295.
Kennedy, W. and Gentle, J. (1980). Statistical Computing, vol. 33. CRC, Boca Raton, FL.
LeBeau, M. (2004). Evaluation of the Intraspecific Effects of a 15-Inch Minimum Size Limit on Walleye Populations in Northern Wisconsin. PhD thesis, University of Minnesota.
Lehrer, J. (2010). The truth wears off. The New Yorker, 13. (Online; last accessed August 1, 2012), URL: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all.
Lenth, R. V. (2006–2009). Java applets for power and sample size [computer software]. (Online; last accessed August 1, 2013), URL: http://www.stat.uiowa.edu/~rlenth/Power.
Lenth, R. V. (2013). lsmeans: Least-squares means. R package version 1.06-05, URL: http://CRAN.R-project.org/package=lsmeans.
Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. The Annals of Statistics, 17, 1009–1052. URL: http://www.jstor.org/stable/2241708.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. 2nd ed. Wiley, Hoboken, NJ.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Loader, C. (2004). Smoothing: Local regression techniques. In Handbook of Computational Statistics: Concepts and Methods. Springer, New York, 539–563.
Lohr, S. (2009). Sampling: Design and Analysis. 2nd ed. Duxbury Press, Pacific Grove, CA.
Long, J. S. and Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54, 217–224. URL: http://www.jstor.org/stable/2685594.