7.3 Estimation of a Mean, Variance, and Proportion
The following is true for any distribution in the population as long as $E X_i = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ exist:

$$E\bar{X} = \mu, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}. \tag{7.2}$$

The above equations are a direct consequence of independence in a sample and imply that $\bar{X}$ is an unbiased and consistent estimator of $\mu$.
If, in addition, we assume normality, $X_i \sim N(\mu, \sigma^2)$, then the sampling distribution of $\bar{X}$ is known exactly,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right),$$

and the relations in (7.2) are apparent.
Chebyshev’s Inequality and Strong Law of Large Numbers*. There are two general results in probability that theoretically justify the use of the sample mean $\bar{X}$ to estimate the population mean $\mu$. These are Chebyshev’s inequality and the strong law of large numbers (SLLN). We will discuss these results without mathematical rigor.
The Chebyshev inequality states that when $X_1, X_2, \dots, X_n$ are i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, the probability that $\bar{X}_n$ deviates from $\mu$ by $\epsilon$ or more is small,

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2},$$

for any $\epsilon > 0$. The inequality is a direct consequence of (5.8) with $(\bar{X}_n - \mu)^2$ in place of $X$ and $\epsilon^2$ in place of $a$.
To translate this into specific numbers, choose $\epsilon$ small, say $\epsilon = 0.000001$, and assume that the $X_i$s have a variance of 1. The Chebyshev inequality states that for $n$ larger than the solution of $1 - 1/(n \times 0.000001^2) = 0.9999$, the distance between $\bar{X}_n$ and $\mu$ will be smaller than 0.000001 with a probability of 99.99%. Admittedly, $n$ here is an experimentally unfeasible number; however, for any small $\epsilon$, finite $\sigma^2$, and “confidence” close to 1, such an $n$ is finite.
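This arithmetic is easy to check directly; a minimal sketch, under the just-stated assumptions $\sigma^2 = 1$ and $\epsilon = 0.000001$:

eps0 = 0.000001;                % chosen accuracy epsilon
conf = 0.9999;                  % desired "confidence"
sig2 = 1;                       % assumed population variance
n = sig2/((1 - conf)*eps0^2)    % n = 1e+16, experimentally unfeasible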
The laws of large numbers state that, as a numerical sequence, $\bar{X}_n$ converges to $\mu$. Care is needed here. The sequence $\bar{X}_n$ is not a sequence of numbers but a sequence of random variables, which are functions defined on sample spaces $S$. Thus, a direct application of a “calculus” type of convergence is not appropriate. However, for any fixed realization from the sample space $S$, the sequence $\bar{X}_n$ becomes numerical, and a traditional convergence can be stated. Thus, a correct statement for the so-called SLLN is

$$P(\bar{X}_n \to \mu) = 1,$$

that is, conceived as an event, $\bar{X}_n \to \mu$ happens with probability 1; such convergence is called almost sure.
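The SLLN is easy to visualize by simulation. A minimal sketch (the exponential population with mean µ = 2 is an arbitrary assumption, not tied to the text): the running means $\bar{X}_n$ settle near µ as n grows.

mu = 2; n = 10000;
X = exprnd(mu, 1, n);          % one realization of X_1, ..., X_n
runmean = cumsum(X)./(1:n);    % sequence Xbar_1, Xbar_2, ..., Xbar_n
plot(1:n, runmean)
hold on
plot([1 n], [mu mu], 'r--')    % reference line at the population mean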

7.3.2 Point Estimation of Variance
We will obtain an intuition starting, once again, with a finite population: $y_1, \dots, y_N$. The population variance is $\sigma^2 = \frac{1}{N}\sum_{i=1}^N (y_i - \mu)^2$, where $\mu = \frac{1}{N}\sum_{i=1}^N y_i$ is the population mean.
If a sample $X_1, X_2, \dots, X_n$ is observed, an estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2,$$

for $\mu$ known, or

$$\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2,$$

for $\mu$ not known and estimated by $\bar{X}$.
In the expression for $s^2$ we divide by $n-1$ instead of the “expected” $n$ in order to ensure the unbiasedness of $s^2$, $E s^2 = \sigma^2$. The proof is easy and does not require any distributional assumptions, except that the population variance $\sigma^2$ is finite.
Note that by the definition of variance, $E(X_i - \mu)^2 = \sigma^2$ and $E(\bar{X} - \mu)^2 = \sigma^2/n$. Write

$$\begin{aligned}
(n-1)s^2 &= \sum_{i=1}^n (X_i - \bar{X})^2 \\
&= \sum_{i=1}^n \left[(X_i - \mu) - (\bar{X} - \mu)\right]^2 \\
&= \sum_{i=1}^n (X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^n (X_i - \mu) + n(\bar{X} - \mu)^2 \\
&= \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \quad \text{since } \sum_{i=1}^n (X_i - \mu) = n(\bar{X} - \mu).
\end{aligned}$$

Then,

$$\begin{aligned}
E(s^2) &= \frac{1}{n-1}\, E\,(n-1)s^2 \\
&= \frac{1}{n-1}\, E\left[\sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right] \\
&= \frac{1}{n-1}\left(n\sigma^2 - n\,\frac{\sigma^2}{n}\right) \\
&= \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2.
\end{aligned}$$
When, in addition, the population is normal $N(\mu, \sigma^2)$, then

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},$$

i.e., the statistic $\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\sigma^2}$ has a $\chi^2$ distribution with $n-1$ degrees of freedom (see Eq. 6.3 and the related discussion).
For a sample from a normal distribution, the unbiasedness of $s^2$ is an easy consequence of the representation $s^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$ and $E\chi^2_{n-1} = n-1$. The variance of $s^2$ is

$$\mathrm{Var}\, s^2 = \left(\frac{\sigma^2}{n-1}\right)^2 \times \mathrm{Var}\,\chi^2_{n-1} = \frac{2\sigma^4}{n-1} \tag{7.3}$$

since $\mathrm{Var}\,\chi^2_{n-1} = 2(n-1)$. Unlike the unbiasedness result, $E s^2 = \sigma^2$, which does not require a normality assumption, the result in (7.3) is valid only when the observations come from a normal distribution.
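Both moments are easy to verify by simulation; a minimal sketch, using the same normal setup as in the figure below (σ = 5, n = 8):

M = 100000; n = 8; sigma = 5;
s2 = var(sigma * randn(n, M));   % M sample variances, one per column
mean(s2)    % approximately sigma^2 = 25
var(s2)     % approximately 2*sigma^4/(n-1) = 2*625/7 = 178.57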
Figure 7.3 indicates that the empirical distribution of normalized sample variances is close to a $\chi^2$ distribution. We generated M = 100000 samples of size n = 8 from a normal $N(0, 5^2)$ distribution and found the sample variance $s^2$ for each sample. The sample variances were multiplied by n − 1 = 7 and divided by $\sigma^2 = 25$. The histogram of these rescaled sample variances is plotted, and the density of a $\chi^2$ distribution with 7 degrees of freedom is superimposed in red. The code generating Fig. 7.3 is given next.
M = 100000; n = 8;          % number of samples, sample size
X = 5 * randn([n, M]);      % M columns, each a N(0,25) sample of size n
ch2 = (n-1) * var(X)/25;    % normalized sample variances (n-1)s^2/sigma^2
histn(ch2,0,0.4,30)         % normalized histogram (book's script histn.m)
hold on
plot( (0:0.1:30), chi2pdf((0:0.1:30), n-1), 'r-')

Fig. 7.3 Histogram of normalized sample variances $(n-1)s^2/\sigma^2$ obtained from M = 100000 independent samples from $N(0, 5^2)$, each of size n = 8. The density of a $\chi^2$ distribution with 7 degrees of freedom is superimposed in red.

The code is quite efficient since a for-end loop is avoided. The simulated
object X is an n × M matrix consisting of M columns (samples) of length n. The
operator var(X) acts on columns of X producing M sample variances.
Several Robust Estimators of the Standard Deviation*. Suppose
that a sample X 1 , . . . , X n is observed and normality is not assumed. We discuss two estimators of the standard deviation that are calibrated by the normal distribution but quite robust with respect to outliers and deviations from
normality.
Gini’s mean difference is defined as

$$G = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} |X_i - X_j|.$$

The statistic $G\,\frac{\sqrt{\pi}}{2}$ is an estimator of the standard deviation and is more robust to outliers than the standard statistic $s$.
Another proposal, by Croux and Rousseeuw (1992), involves absolute differences as in Gini’s mean difference estimator but uses a kth-order statistic rather than the average. The estimator of $\sigma$ is

$$Q = 2.2219\,\{|X_i - X_j|;\ i < j\}_{(k)}, \quad \text{where } k = \binom{\lfloor n/2 \rfloor + 1}{2}.$$

The constant 2.2219 calibrates the estimator so that if the sample comes from a standard normal distribution, then Q estimates 1. In calculating Q, all $\binom{n}{2}$ differences $|X_i - X_j|$, $i < j$, are ordered, and the kth in rank is selected and multiplied by 2.2219. This choice of k requires an additional multiplicative correction factor, n/(n + 1.4) for n odd, or n/(n + 3.8) for n even.
MATLAB scripts ginimd.m and crouxrouss.m evaluate these estimators. The algorithm is naïve and uses a double loop to evaluate G and Q. The evaluation breaks down for sample sizes of more than 500 because of memory problems. A smarter algorithm that avoids looping is implemented in the versions ginimd2.m and crouxrouss2.m. In these versions, the sample size can go up to 6000.
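Those scripts are not listed here; the following is a minimal sketch of one possible vectorized implementation (a hypothetical helper, not the book's ginimd2.m or crouxrouss2.m, and the odd/even correction factor for Q is omitted):

function [G, Q] = robust_sd_sketch(x)
% Vectorized estimators of sigma based on pairwise absolute differences
x = x(:); n = length(x);
D = abs(bsxfun(@minus, x, x'));   % n-by-n matrix of |X_i - X_j|
d = D(triu(true(n), 1));          % the n(n-1)/2 differences with i < j
G = sqrt(pi)/2 * mean(d);         % Gini-based estimator of sigma
d = sort(d);
k = nchoosek(floor(n/2) + 1, 2);  % rank of the selected order statistic
Q = 2.2219 * d(k);                % Croux-Rousseeuw estimator of sigma
end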
In the following MATLAB session we demonstrate the performance of robust estimators of the standard deviation. We generate 1000 standard normal random variates and replace one value with a clear outlier, $X_{1000} = 20$, to explore the influence of this outlier on the estimators of the standard deviation. Note that s is quite sensitive; the outlier inflates the estimator by almost 20%. The robust estimators are affected as well, but not as much as s.
x = randn(1, 1000);   % 1000 standard normal variates
x(1000) = 20;         % plant a clear outlier
std(x)
% ans = 1.1999        % s, inflated by almost 20%
s1 = ginimd2(x)
% s1 = 1.0555
s2 = crouxrouss2(x)
% s2 = 1.0287
iqr(x)/1.349
% ans = 1.0172        % the IQR of a standard normal is 1.349

There are many other robust estimators of the variance/standard deviation. Good references containing extensive material on robust estimation are Wilcox (2005) and Staudte and Sheather (1990).

7.3.3 Point Estimation of Population Proportion
It is natural to estimate the population proportion p by a sample proportion.
The sample proportion is the MLE and moment-matching estimator for p.
Sample proportions use a binomial distribution as the theoretical model.
Let $X \sim Bin(n, p)$, where the parameter $p$ is unknown. The MLE of $p$ based on a single observation $X$ is obtained by maximizing the likelihood

$$\binom{n}{X} p^X (1-p)^{n-X},$$

or the log-likelihood

$$\text{factor free of } p + X \log(p) + (n - X)\log(1 - p).$$
The maximum is obtained by solving

$$\frac{d}{dp}\left(\text{factor free of } p + X \log(p) + (n - X)\log(1 - p)\right) = 0,$$

$$\frac{X}{p} - \frac{n - X}{1 - p} = 0,$$

which after some algebra gives the solution $\hat{p}_{mle} = \frac{X}{n}$.
In Example 7.6 we argued that the exact distribution of $X/n$ is a rescaled binomial and that the statistic is unbiased, with a variance converging to 0 as the sample size increases. These two properties make $X/n$ a consistent estimator.
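A quick numeric check that the likelihood indeed peaks at X/n (a minimal sketch with assumed numbers, n = 20 trials and X = 7 successes):

n = 20; X = 7;                             % assumed data
loglik = @(p) X*log(p) + (n-X)*log(1-p);   % log-likelihood up to a constant
pgrid = 0.001:0.001:0.999;
[~, imax] = max(loglik(pgrid));
pgrid(imax)                                % 0.3500, equal to X/n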

7.4 Confidence Intervals
Whenever the sampling distribution of a point estimator $\hat{\theta}_n$ is continuous, necessarily $P(\hat{\theta}_n = \theta) = 0$. In other words, the probability that the estimator exactly matches the parameter it estimates is 0. Instead of a point estimator, one may report two estimators, $L = L(X_1, \dots, X_n)$ and $U = U(X_1, \dots, X_n)$, so that the interval $[L, U]$ covers $\theta$ with a probability of $1 - \alpha$, for small $\alpha$. In this case, the interval $[L, U]$ is called a $(1-\alpha)100\%$ confidence interval for $\theta$.
For the construction of a confidence interval for a parameter, one needs to know the sampling distribution of the associated point estimator. The lower and upper interval bounds L and U depend on the quantiles of this distribution. We will derive confidence intervals for the normal mean, the normal variance, the population proportion, and the Poisson rate. Many other confidence intervals, including those for differences, ratios, and some functions of statistics, are tightly connected to testing methodology and will be discussed in subsequent chapters.
Note that when the population is normal and $X_1, \dots, X_n$ is observed, the exact sampling distributions of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

and

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \times \frac{1}{\sqrt{\frac{(n-1)s^2}{\sigma^2}\big/(n-1)}}$$

are standard normal and Student $t_{n-1}$, respectively. The expression for $t$ is shown as a product to emphasize the construction of a t-distribution from a standard normal and a $\chi^2$, as on p. 208.
When the population is not normal but n is large, both statistics Z and t
have an approximate standard normal distribution due to the CLT.
We saw that the point estimator for the population proportion (of “successes”) is the sample proportion $\hat{p} = X/n$, where $X$ is the number of successes in $n$ trials. The statistic $X/n$ is based on a binomial sampling scheme in which $X$ has exactly a binomial $Bin(n, p)$ distribution. Using this exact distribution would lead to confidence intervals in which the bounds and confidence levels are discretized. The normal approximation to the binomial (the CLT in the form of de Moivre’s approximation) leads to

$$\hat{p} \overset{approx}{\sim} N\left(p, \frac{p(1-p)}{n}\right), \tag{7.4}$$

and confidence intervals for the population proportion $p$ are based on normal quantiles.
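As an illustration of (7.4), a minimal sketch of such a normal-quantile interval, with assumed numbers X = 55 successes in n = 200 trials and α = 0.05 (this only anticipates the systematic treatment of proportion intervals later):

X = 55; n = 200; alpha = 0.05;   % assumed data
phat = X/n;                      % sample proportion
z = norminv(1 - alpha/2);        % normal quantile
se = sqrt(phat*(1-phat)/n);      % estimated standard error of phat
[phat - z*se, phat + z*se]       % approximate 95% confidence interval for p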

7.4.1 Confidence Intervals for the Normal Mean
Let $X_1, \dots, X_n$ be a sample from a $N(\mu, \sigma^2)$ distribution, where the parameter $\mu$ is to be estimated and $\sigma^2$ is known.
Starting from the identity

$$P(-z_{1-\alpha/2} \leq Z \leq z_{1-\alpha/2}) = 1 - \alpha$$

and the fact that $\bar{X}$ has a $N(\mu, \frac{\sigma^2}{n})$ distribution, we can write

$$P\left(-z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} + \mu \leq \bar{X} \leq z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} + \mu\right) = 1 - \alpha;$$

see Fig. 7.4a for an illustration. Simple algebra gives

$$\bar{X} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \tag{7.5}$$

which is a $(1-\alpha)100\%$ confidence interval.
If $\sigma^2$ is not known, then a confidence interval with the sample standard deviation $s$ in place of $\sigma$ can be used. The $z$ quantiles are valid for large $n$, but for small $n$ ($n < 40$) we use $t_{n-1}$ quantiles, since the sampling distribution of $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ is $t_{n-1}$. Thus, for $\sigma^2$ unknown,

$$\bar{X} - t_{n-1,1-\alpha/2}\,\frac{s}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{n-1,1-\alpha/2}\,\frac{s}{\sqrt{n}} \tag{7.6}$$

is the confidence interval for $\mu$ of level $1 - \alpha$.

Fig. 7.4 (a) When $\sigma^2$ is known, $\bar{X}$ has a normal $N(\mu, \sigma^2/n)$ distribution and $P(\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \bar{X} \leq \mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}) = 1 - \alpha$, leading to the confidence interval (7.5). (b) If $\sigma^2$ is not known and $s^2$ is used instead, then $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ is $t_{n-1}$, leading to the confidence interval in (7.6).

Below is a summary of the above-stated intervals.

The $(1-\alpha)100\%$ confidence interval for an unknown normal mean $\mu$ on the basis of a sample of size $n$ is

$$\left[\bar{X} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$$

when the variance $\sigma^2$ is known, and

$$\left[\bar{X} - t_{n-1,1-\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\,\frac{s}{\sqrt{n}}\right]$$

when the variance $\sigma^2$ is not known and $s^2$ is used instead.
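In MATLAB, both intervals are one-liners; a minimal sketch (assumptions: a simulated sample, α = 0.05, and a known σ = 4 for the z interval):

x = 10 + 4*randn(1, 40); alpha = 0.05; n = length(x);
xbar = mean(x); s = std(x); sigma = 4;    % sigma assumed known for ci_z
z = norminv(1 - alpha/2);                 % normal quantile
t = tinv(1 - alpha/2, n-1);               % t quantile with n-1 df
ci_z = [xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)]   % interval (7.5)
ci_t = [xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)]           % interval (7.6)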

Interpretation of Confidence Intervals. What does a “confidence of
95%” mean? A common misconception is that it means that the unknown mean
falls in the calculated interval with a probability of 0.95. Such a probability
statement is valid for credible sets in the Bayesian context, which will be discussed in Chap. 8.
The interpretation of the (1 − α) 100% confidence interval is as follows.
If a random sample from a normal population is selected a large number of times and the confidence interval for the population mean µ is calculated, the proportion of such intervals covering µ approaches 1 − α.
The following MATLAB code illustrates this. The code draws M = 10000 random samples of size n = 40 from a normal population with mean $\mu = 10$ and variance $\sigma^2 = 4^2$ and calculates a 95% confidence interval from each. It then records which intervals cover the mean $\mu$ (cover = 1) and, finally, finds their proportion, sum(covers)/M. The code was run consecutively several times and the following empirical confidences were obtained: 0.9461, 0.9484, 0.9469, 0.9487, 0.9502, 0.9482, 0.9502, 0.9482, 0.9530, 0.9517, 0.9503, 0.9514, 0.9496, 0.9515, etc., clearly scattering around 0.95. Figure 7.5a plots the behavior of the coverage proportion as the number of simulations ranges from 1 to 10000. Figure 7.5b plots the first 100 intervals in the simulation and their position with respect to $\mu = 10$. The confidence intervals in simulations 17, 37, 47, 58, 78, and 82 fail to cover $\mu$.
M = 10000;                          % simulate M times
n = 40;                             % sample size
alpha = 0.05;                       % 1-alpha = confidence
tquantile = tinv(1-alpha/2, n-1);
covers = [];
for i = 1:M
   X = 10 + 4*randn(1,n);           % sample with mean 10, var 16
   xbar = mean(X); s = std(X);
   LB = xbar - tquantile * s/sqrt(n);
   UB = xbar + tquantile * s/sqrt(n);
   % cover = 1 if the interval covers the population mean 10
   if UB < 10 || LB > 10
      cover = 0;
   else
      cover = 1;
   end
   covers = [covers cover];         % save cover history
end
sum(covers)/M    % proportion of intervals covering the mean

7.4.2 Confidence Interval for the Normal Variance
Earlier (p. 209) we argued that the sampling distribution of $\frac{(n-1)s^2}{\sigma^2}$ is $\chi^2$ with $n-1$ degrees of freedom. From the definition of $\chi^2_{n-1}$ quantiles,

$$1 - \alpha = P\left(\chi^2_{n-1,\alpha/2} \leq \chi^2_{n-1} \leq \chi^2_{n-1,1-\alpha/2}\right),$$

as in Fig. 7.6. Replacing $\chi^2_{n-1}$ with $\frac{(n-1)s^2}{\sigma^2}$, we get

$$1 - \alpha = P\left(\chi^2_{n-1,\alpha/2} \leq \frac{(n-1)s^2}{\sigma^2} \leq \chi^2_{n-1,1-\alpha/2}\right).$$

Fig. 7.5 (a) Proportion of intervals covering the mean plotted against the iteration number, as in plot(cumsum(covers)./(1:length(covers))). (b) The first 100 simulated intervals. The intervals 17, 37, 47, 58, 78, and 82 fail to cover the true mean.

Fig. 7.6 The confidence interval for the normal variance $\sigma^2$ is derived from $P(\chi^2_{n-1,\alpha/2} \leq (n-1)s^2/\sigma^2 \leq \chi^2_{n-1,1-\alpha/2}) = 1 - \alpha$.

Simple algebra with the above inequalities (taking the reciprocal of all three parts, being careful about the direction of the inequalities, and multiplying everything by $(n-1)s^2$) gives

$$\frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}.$$

The $(1-\alpha)100\%$ confidence interval for an unknown normal variance is

$$\left[\frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}\right]. \tag{7.7}$$

Remark. If the population mean $\mu$ is known, then $s^2$ is calculated as $\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$, and the $\chi^2$ quantiles gain one degree of freedom ($n$ instead of $n-1$). This makes the confidence interval a bit tighter.
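A minimal sketch of interval (7.7) in MATLAB (assumptions: a simulated N(0, 5²) sample of size 30 and α = 0.05):

x = 5*randn(1, 30); n = length(x); alpha = 0.05;
s2 = var(x);                                % sample variance
[(n-1)*s2/chi2inv(1 - alpha/2, n-1), ...
 (n-1)*s2/chi2inv(alpha/2, n-1)]            % 95% CI for sigma^2 = 25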
Example 7.8. Amanita muscaria. With its bright red, sometimes dinner-plate-sized caps, the fly agaric (Amanita muscaria) is one of the most striking of all mushrooms (Fig. 7.7a). The white warts that adorn the cap, the white gills, a well-developed ring, and the distinctive volva of concentric rings distinguish the fly agaric from all other red mushrooms. The spores of the mushroom print white, are elliptical, and have a (maximal) diameter in the range of 7 to 13 µm (Fig. 7.7b).

Fig. 7.7 Amanita muscaria and its spores. (a) Fly agaric, or Amanita muscaria. (b) Spores of Amanita muscaria.

Measurements of the diameter X of spores for n = 51 mushrooms are given in the following table:

10 11 12  9 10 11 13 12 10 11
11 13  9 10  9 10  8 12 10 11
 9 10  7 11  8  9 11 11 10 12
10  8  7 11 12 10  9 10 11 10
 8 10 10  8  9 10 13  9 12  9
 9