Appendix 4b. Least Squares Solution via Elementary Calculus
STATISTICAL RELATIONSHIPS
FURTHER READING
Most elementary statistics textbooks cover the concepts of correlation and regression. Good
introductions are provided by Moore and McCabe (1993) and Freedman et al. (1978). Stanton
(2001) provides an interesting history of the development of these concepts.
D. Freedman, R. Pisani, and R. Purves, Statistics. (New York: Norton, 1978).
D. Moore and G. McCabe, Introduction to the Practice of Statistics, 2nd ed. (New York: Freeman,
1993).
J. Stanton, “Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors,” Journal of Statistics Education 9 (2001), 1–12.
DATASETS USED IN THIS CHAPTER
convergence.html
co2.html
iris.html
smoking.html
temperature.html
PROBLEMS
1. Explain the meaning of the following terms or concepts:
• Scatterplot
• Correlation coefficient
• Covariance
• Coefficient of determination
• Standard error of estimate
• Residual
• Autocorrelation in temporal data
2. What is the difference between correlation and regression?
3. Differentiate between a functional relationship and a statistical relationship.
4. In Figure 4-4 we showed the relationship between levels of GDP and GDP growth for different countries, some members of the OECD and some not. These data are on the website
for the book (convergence.html).
a. Produce separate scatterplots of GDP against GDP growth for OECD nations and for
non-OECD nations.
b. Calculate the correlation coefficients for the data in these two scatterplots.
c. Interpret your results.
5. The smoking and cancer data are on the website (smoking.html). Using the format shown
in Table 4-6, calculate the correlation coefficients between the different forms of cancer and
smoking. Check your results against the matrix of correlation coefficients in the text.
6. Measurements of sound levels and distance from the centerline of an urban expressway are as follows:

Observation    Distance (ft.)    Sound level (dB)
     1               45                 83
     2               63                 81
     3              160                 66
     4              225                 68
     5              305                 69
     6              390                 66
     7              461                 58
     8              515                 57
     9              605                 55
    10              625                 61

a. Draw a scatterplot of the data with distance as the independent variable on the X-axis.
b. Try to plot a best-fit line representing the relationship between the two variables.
c. Using the format of Table 4-8, estimate the slope and intercept of the regression equation.
d. Now fit the real regression line to the scatterplot. How good was your answer to question b?
e. Interpret the meaning of the regression slope coefficient and the intercept for this problem.
f. Suppose the distance variable was measured in hundreds of feet rather than feet. How would the regression equation change? (Hint: No new calculations are required.)
g. Which observation is most closely predicted by the regression line? Least closely?
h. Calculate the coefficient of determination for this problem. What does this value mean?
i. Calculate the standard deviation of the dependent variable and the standard deviation of the independent variable. Use these values, together with the regression slope coefficient, to estimate the correlation coefficient between your variables. Check this value by finding the square root of the coefficient of determination.
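The hand calculations in Problem 6 can be checked by machine. The sketch below (our own code, not from the text) computes the least-squares slope, intercept, and coefficient of determination directly from the definitional formulas, using the ten observations given in the problem:

```python
# A machine check (our own sketch, not from the text) of the least-squares
# calculations in Problem 6, using the definitional formulas and the ten
# observations given in the problem.
x = [45, 63, 160, 225, 305, 390, 461, 515, 605, 625]   # distance (ft.)
y = [83, 81, 66, 68, 69, 66, 58, 57, 55, 61]           # sound level (dB)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope b = Sxy / Sxx; intercept a = y_bar - b * x_bar.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

# Coefficient of determination r^2 = 1 - SSE / SST.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sse / sst

print(f"slope = {b:.4f} dB per ft., intercept = {a:.2f} dB, r^2 = {r2:.3f}")
```

The negative slope confirms what the scatterplot suggests: sound level falls as distance from the expressway grows.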
7. Get the carbon dioxide and atmospheric temperature data (warming.html) from the website
for this book.
a. Which of these variables should be dependent and which independent?
b. Calculate the regression equation for these variables.
c. Interpret the meaning of the regression intercept and slope coefficients.
d. Calculate the standard error of estimate and the coefficient of determination.
e. Find the residuals from this problem. Are there any outliers?
8. In deriving the formula for the coefficient of determination, we stated that Σ ŷi ei = 0. Show
why this equality holds.
9. Obtain the Dow Jones stock market data from the book website (dow.html). Calculate the
first-order autocorrelation for these data. (Don’t try this by hand!)
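Problem 9 warns against hand computation, and rightly so. Below is a sketch of the first-order (lag-1) autocorrelation calculation from its definition; the Dow Jones series itself is on the book website, so a placeholder trending series stands in for it here:

```python
import random

# A sketch for Problem 9: first-order (lag-1) autocorrelation computed
# from its definition. The Dow Jones series lives on the book website,
# so a placeholder trending series stands in for it here.

def lag1_autocorrelation(z):
    """Correlation between a series and itself lagged one period."""
    n = len(z)
    mean = sum(z) / n
    num = sum((z[t] - mean) * (z[t - 1] - mean) for t in range(1, n))
    den = sum((zt - mean) ** 2 for zt in z)
    return num / den

random.seed(0)
# A smooth upward trend plus small noise is strongly autocorrelated.
series = [100 + t + random.gauss(0, 1) for t in range(50)]
print(round(lag1_autocorrelation(series), 3))
```

Any spreadsheet or statistics package will produce the same quantity; the point is only that each observation is compared with its immediate predecessor.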
III
INFERENTIAL STATISTICS
5
Random Variables
and Probability Distributions
Chapters 2, 3, and 4 presented a series of techniques for describing and summarizing
data. Looking at data, that is, visualizing it in some way, and exploring data with the
aid of summary measures are critical first steps in more sophisticated statistical investigation and inference. In this chapter we develop the link between descriptive and
inferential statistics. That link rests squarely on the concepts of random variables
and probability distributions. Although most people are familiar with notions of probability in games of chance—calling heads or tails in a coin toss and winning or losing
a bet—the fundamental concepts of probability are sometimes difficult to comprehend.
Definitions of statistical experiments, statistical independence, and mutual exclusivity
seem far removed from the context of these games of chance. However, probability
theory lies at the core of statistical inference because all statistical judgments are necessarily probabilistic.
Central to the discussion in this chapter are arguments concerning probability,
random variables, and probability distributions. In Section 5.1 the fundamental concepts of probability are briefly reviewed. In Section 5.2 a random variable is formally
defined and related by example to the process of population sampling. We differentiate between two classes of random variables: those that are discrete and those that
are continuous. Sections 5.3 and 5.4 are devoted to the specification of probability
distribution models for these two classes of random variables. In Section 5.5 we extend our knowledge of random variables and probability distributions to include bivariate random variables and bivariate probability distributions. Bivariate probability
distributions describe the joint distribution of two variables. Section 5.6 offers a brief
summary of the chapter.
5.1. Elementary Probability Theory
Most people have some idea of what probability means. Everyday conversation contains numerous references to it: The chance of rain this holiday weekend is 50%. The
odds are small that a river will flood in a particular year. The likelihood of contracting
West Nile virus in a US state is positively related to the probability of contracting the
virus in nearby states. All these statements invoke a notion of probability, though a
somewhat vague one. In this section, we discuss some fundamentals of probability
theory and we show how to rigorously define the term probability as it is used in conventional statistical inference. We also briefly explain the methods used to calculate
the probability of an event occurring. These tasks are critical to the discussion that
follows because the theory of probability provides the logical foundation for statistical inference, the process whereby information from a sample dataset is used to estimate the characteristics of a larger population.
Statistical Experiments, Sample Spaces, and Events
The basic concepts of probability theory rest upon the notion of a statistical experiment, or random trial.
DEFINITION: STATISTICAL EXPERIMENT
A statistical experiment, or random trial, is a process or activity in which one
outcome from a set of possible outcomes occurs. Which outcome occurs is not
known with certainty before the experiment takes place.
For example, consider the process of sampling from a population. Each time we select
a single member of a population and record the value of a variable, we are performing a statistical experiment. The values of the variable for different members of the
population represent the possible outcomes in the experiment. Which member of the
population is selected, and therefore what value of the variable of interest is recorded,
is unknown until the experiment is performed. Repeated trials of this experiment would
yield a sample of observations from the population.
DEFINITIONS: ELEMENTARY OUTCOMES AND SAMPLE SPACE
Each different outcome of an experiment is known as an elementary outcome,
and the set of all elementary outcomes constitutes the sample space.
Let us examine the statistical experiment of selecting a card from a playing deck. Each
of the 52 individual cards in the deck represents an elementary outcome. Together the
52 outcomes comprise the sample space of the experiment.
In some cases the elementary outcomes of a statistical experiment can be defined in different ways. Suppose we define an experiment as the toss of two coins. We
could define the elementary outcomes as getting no heads, getting one head, and getting two heads. Alternatively, we might define the elementary outcomes as the complete set of combinations of heads and tails possible from a single toss of the two coins.
In this case we might define the outcomes as {HH}, {HT}, {TH}, and {TT}, all of
which have an equal probability of occurring.
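The two ways of defining outcomes for the two-coin experiment can be enumerated in a few lines (a sketch; the H/T labels are our own encoding):

```python
from collections import Counter
from itertools import product

# Enumerating the two-coin experiment described above. Each ordered
# pair of faces is an elementary outcome, and all four are equally likely.
sample_space = [''.join(toss) for toss in product('HT', repeat=2)]
print(sample_space)   # ['HH', 'HT', 'TH', 'TT']

# Counting heads instead collapses the four outcomes into three
# events that are NOT equally likely: one head can occur two ways.
heads_counts = Counter(outcome.count('H') for outcome in sample_space)
print(dict(heads_counts))
```

The count shows why the second definition of outcomes is less convenient: "one head" bundles two elementary outcomes, so the three head-count outcomes do not share a common probability.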
After the elementary outcomes of an experiment have been defined, it is possible
to define collections of elementary outcomes as events.
DEFINITION: EVENT
An event is a subset of the sample space of an experiment, a collection of elementary outcomes.
The particular subset used to define an event can include one or more elementary
outcomes. Returning to the card-drawing experiment, we might define event A as
drawing a spade, event B as drawing an ace, and event C as drawing the ace of spades.
Event A includes 13 elementary outcomes, each of the 13 spades in the playing deck.
Similarly, event B includes four elementary outcomes, and event C includes one elementary outcome. A null event, denoted ∅, is an event that contains no elementary
outcomes: it cannot occur.
Computing Probabilities in Statistical Experiments
The outcome of a statistical experiment will be one, and only one, of the elementary
outcomes of the sample space. To determine the probability that a specific event occurs in a single trial of an experiment, it is necessary to define the likelihood, or the
probability, of each elementary outcome Ei of the experiment occurring. Leaving
aside for the moment the thorny question of how these values are obtained, let us
define three postulates that formalize certain properties of probabilities.
Denote the n elementary outcomes of an experiment as the set S = {E1, E2, . . . , En} that defines the sample space, and let the assigned probabilities of these outcomes be {P(E1), P(E2), . . . , P(En)}.
POSTULATE 1
0 ≤ P(Ei) ≤ 1  for i = 1, 2, . . . , n   (5-1)
For every elementary outcome in the sample space, the assigned probability must be
a non-negative number between zero and one inclusive. There are several important
arguments in this seemingly simple postulate. First, probabilities are non-negative.
Second, if we say an outcome, for example, outcome j, is impossible, we are saying
that P(Ej ) = 0. If we say an outcome, for example, outcome k, is certain to occur, then
we are saying that P(Ek ) = 1.
POSTULATE 2
P(A) = Σ_{i∈A} P(Ei)   (5-2)
The second postulate states that the probability of any event A occurring is the sum
of the probabilities assigned to the individual elementary outcomes that constitute
event A.
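Postulate 2 translates directly into code. The fair-die probability assignment below is our own illustration, not an example from the text:

```python
# Postulate 2 in code: once each elementary outcome Ei has a probability
# P(Ei), the probability of an event is the sum over its members. The
# fair-die assignment below is our own illustration, not from the text.
P = {face: 1 / 6 for face in range(1, 7)}   # P(Ei) for each face

def event_probability(event, P):
    """P(A) = sum of P(Ei) over every elementary outcome Ei in A."""
    return sum(P[e] for e in event)

even = {2, 4, 6}                 # event A: "roll an even number"
print(event_probability(even, P))
```

Note that the same function applied to the whole sample space returns one, anticipating Postulate 3.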
POSTULATE 3
P(S) = 1
P(∅) = 0
(5-3)
The third postulate states that the probability associated with the entire sample space
must equal one, and the probability of a null event must be zero. Although these two
statements are intuitive, it is possible to deduce a number of important results from them.
First, it is easily shown that the sum of the probabilities of all the elementary
outcomes must equal one:
Σ_{i=1}^{n} P(Ei) = 1   (5-4)
This follows directly from Postulates 2 and 3. Since the sample space contains all the elementary outcomes, that is, S = {E1, E2, . . . , En}, and from Postulate 2 we know that P(S) = Σ_{i=1}^{n} P(Ei), it follows from Postulate 3 that Σ_{i=1}^{n} P(Ei) = 1.
Second, since any event A must contain a subset of the elementary outcomes of
S, and since from Postulate 1 the probability of an elementary outcome occurring is
between zero and one, it follows that for any event A
0 ≤ P(A) ≤ 1
(5-5)
In other words, every event has a probability of occurring that lies between zero
and one.
Third, if events A and B are mutually exclusive, meaning that no elementary
outcomes are common to both events, then from Postulate 3
P(A ∩ B) = 0
(5-6)
where ∩ denotes the intersection of A and B, the set of elementary outcomes that belong to both events. This follows from the fact that the intersection of two mutually
exclusive events must be the null set: A ∩ B = ∅. For example, counting the spots
after rolling a die gives a number 1, 2, . . . , 6. All of these numbers are mutually exclusive events. According to Equation 5-6, there is no chance of observing both a 2 and a 5 on the same roll.
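In set terms, the die-roll example can be sketched as follows (a fair die is assumed for the probability assignment):

```python
# Equation 5-6 in set terms: the events "observe a 2" and "observe a 5"
# on one die roll share no elementary outcomes, so their intersection
# is the null event and carries probability zero (a fair die assumed).
P = {face: 1 / 6 for face in range(1, 7)}
A = {2}
B = {5}

intersection = A & B                        # the null event, an empty set
print(intersection == set())                # True
print(sum(P[e] for e in intersection))      # 0, so P(A ∩ B) = 0
```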
Definitions of Probability
The three probability postulates tell us that probability is a measure that, for any elementary outcome or event, is a number between zero and one. But where do probability values come from, and how are they to be interpreted? It turns out that there are
different ways of determining probabilities and thus different ways of interpreting
what they represent. A simple distinction can be made between objective and subjective interpretations of probability. According to the objective view, different individuals will use the rules of probability to assign the same probabilities to the elementary
outcomes of a particular experiment. According to the subjective view, estimates of
the likelihood of a particular event occurring in the trial of an experiment will vary
from person to person depending on their individual assessment of the relevant evidence at that particular moment in time. Let us briefly consider these alternative ways
of understanding the meaning of probability.
There are two methods of finding probabilities according to the objective interpretation of probability. The first of these, the classical view of probability, is usually
associated with games of chance. For many such games, each elementary outcome
has the same probability of occurrence. In these games, we can use arguments of symmetry or geometry to deduce the probability of a given event occurring. Accordingly,
in an experiment with n equally likely outcomes, the probability of occurrence of any
one elementary outcome is 1/n. More generally, if event A is consistent with m elementary outcomes and the sample space comprises n equally likely outcomes,
P(A) = m/n
(5-7)
For example, the sample space S for the card selection experiment consists of n = 52
elementary outcomes. If we define event A to be drawing a diamond, then, with m =
13, the probability of event A occurring is 13/52 = 1/4. Similarly, the probability of
drawing an ace is 4/52 = 1/13, since m = 4 and n = 52.
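Equation 5-7 can be verified by brute-force enumeration of the card-drawing sample space (a sketch; the rank and suit labels are our own choices):

```python
from itertools import product

# Brute-force check of Equation 5-7 on the card-drawing experiment:
# enumerate all 52 elementary outcomes, count the m consistent with an
# event, and divide by n. Rank and suit labels are our own choices.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))          # n = 52 equally likely outcomes

n = len(deck)
m_diamond = sum(1 for rank, suit in deck if suit == 'diamonds')
m_ace = sum(1 for rank, suit in deck if rank == 'A')

print(m_diamond / n)    # 13/52 = 0.25
print(m_ace / n)        # 4/52, about 0.077
```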
If Equation 5-7 cannot be used to calculate event probabilities, then other, more
time-consuming methods must be used. First, it is necessary to identify and count all
of the elementary outcomes in the sample space. Then, the subset of the sample space
that comprises the event space must be specified. Finally, the addition of the probabilities of the elementary outcomes in the event space will yield the event probability.
The simplification of this operation by using Equation 5-7 is obvious, particularly
when there may be thousands or even millions of elementary outcomes in the sample
space and/or event space. In fact, it may even be difficult to generate a list of the
elementary outcomes in a sample space. In such cases we often use counting rules to
compute both m and n. These counting rules are shown in Appendix 5a.
The second objective view of probability is the relative frequency interpretation,
which is typically associated with prominent statisticians such as Pearson and Fisher.
Unlike the deductive logic applied to find probabilities in the classical model, supporters of the relative frequency position adopt an inductive approach to probability
determination, whereby the probability of an event is given by the relative number of
times that event occurs in a large number of trials. This definition can be understood
more formally in the following way.
DEFINITION: RELATIVE FREQUENCY INTERPRETATION
OF PROBABILITY
Let event A be defined in some experiment. If the experiment is repeated N
times and event A occurs in n of those trials, then the relative frequency of
event A is n/N. The probability of event A occurring is the limiting value of this relative frequency, P(A) = n/N as N → ∞, that is, as the number of trials of the experiment approaches infinity (becomes extremely large).
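The relative frequency interpretation can be illustrated by simulation (a sketch assuming a fair die, so the limiting value should be the classical 1/6):

```python
import random

# Simulating the relative frequency interpretation: the proportion of
# die rolls showing a six should settle near the classical value 1/6
# as the number of trials N grows. A fair die is assumed.
random.seed(42)   # fixed seed so the run is reproducible

def relative_frequency(face, N):
    """Fraction of N fair-die rolls on which `face` comes up."""
    hits = sum(1 for _ in range(N) if random.randint(1, 6) == face)
    return hits / N

for N in (100, 10_000, 1_000_000):
    print(N, relative_frequency(6, N))
```

With 100 rolls the proportion can wander well away from 1/6; by a million rolls it is pinned down to the third decimal place, which is exactly what the N → ∞ definition promises.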