IV. PATTERNS IN SPACE AND TIME
14. Spatial Patterns and Relationships
Chapter 2 introduced different types of spatial data. In Chapter 3 we outlined a series
of descriptive statistics used to summarize spatial data, and we highlighted a number
of problems that complicate their use. As geographers we are particularly interested
in the spatial distribution of various phenomena over the surface of the earth and of
the processes that generate them. While many of the statistical concepts discussed so
far can be applied to spatial data, more specialized techniques have been developed
to explore spatial relationships. This chapter reviews some of these more specialized
techniques, focusing on analysis of the spatial distribution of particular objects and
spatial variation in the values assumed by a single variable. The spatial extent of our
analysis is limited: the techniques we explore are typically deployed to investigate the
properties of points lying on the plane, rather than points lying on the sphere or the
ellipsoid.
In Section 14.1 we begin our analysis of spatial data by looking at point patterns and two techniques used to examine them, namely, quadrat analysis and nearest neighbor analysis. In Section 14.2 we discuss spatial autocorrelation, a method
that describes the spatial distribution of values of a single variable. Section 14.3
extends this discussion, examining measures of local spatial association that are frequently used to detect geographical clusters or hot spots of particular activity. Section
14.4 identifies problems that spatial autocorrelation poses for inferential analysis
using regression models, and it shows how these problems might be identified and
solved. We provide a short introduction to geographically weighted regression in Section 14.5, and a brief conclusion in Section 14.6.
14.1. Point Pattern Analysis
Is there a spatial pattern or some form of geographical order in the location of cities,
industrial factories, trees in a forest, earthquake epicenters, outbreaks of a disease, or
the nests of a species of bird? Geographers and other scientists search for geographical patterns or spatial order across a broad range of phenomena, in the hope that this
will lead to a better understanding of the processes that produced such patterns. We
begin the search for pattern, or spatial relationship, by mapping the locations at which
particular objects are found. The resulting maps, following the discussion of spatial
data types in Chapter 2, are known as point pattern maps.
FIGURE 14-1. Spatial distribution of rainforest trees, Meliaceae (squares) and Caesalpiniaceae (triangles), in Paracou, French Guiana. Source: Revised from Forget et al. (1999).
Figures 14-1 and 14-2 provide two illustrations of point patterns. As usual, our
statistical analysis of point patterns begins with looking at the data. Figure 14-1 shows
the geographical distribution of two rainforest trees in a sample plot from Paracou,
French Guiana. The squares in the figure show the location of Meliaceae trees, while
the triangles show the location of Caesalpiniaceae trees. Biogeographers might examine point patterns such as these for evidence suggesting a particular mechanism of
seed dispersal. In fact, both these species of tree have seeds dispersed by rodents. Figure 14-2 reveals the geographical distribution of lung cancer cases across a portion
of Lancashire in northwest England. Epidemiologists and health geographers study
point patterns of health events to gather clues on potential sources of a disease and/or
mechanisms of transmission.
FIGURE 14-2. Spatial distribution of lung cancers in northwest Lancashire, 1974–1983 (map label: Incinerator). Source: Revised from Gatrell et al. (1995).
In general, when we examine a point map, we are looking to see whether the
spatial distribution, or the geographical arrangement, of the variable of interest displays
any sort of pattern. This is not always a straightforward exercise. In Chapter 3, we
showed how to find measures of central tendency and dispersion for sets of points distributed across two dimensions, but how might we describe the absolute or relative
locations of points within a study area? To make this task somewhat easier we typically look to see whether the observations in a point pattern, the locations of the points
themselves, tend to cluster together, whether they are more uniformly distributed, or
whether they appear to be arranged randomly. These different types of point patterns
are illustrated in Figure 14-3. Figure 14-3a displays a clustered arrangement of points,
where the objects of interest are found close to one another and where large areas of
the study region contain no points. Clustered patterns tend to result from a contagion
process where a particular location attracts a number of points. The dispersed point
pattern in Figure 14-3c is commonly thought to result from some form of competition in space where points repel one another. The arrangement of points in the dispersed pattern is quite regular over the study area, and for this reason this arrangement
is also referred to as a uniform spatial distribution.
FIGURE 14-3. Different types of point pattern: (a) clustered; (b) random; (c) uniform or dispersed.
Random point patterns, like those in Figure 14-3b, result from the operation
of an independent random process (or a process consistent with complete spatial randomness). An independent random process is one in which every location (or small
area) of a study region has an equal probability of receiving an event or a point, and
one for which the location of an event is independent of the location of all other
events. Random point patterns are of little interest to the geographer because they are
evidence that the underlying generating process has no spatial logic. For this reason,
when we examine point patterns, we are often engaged in identification and calculation of departures from complete spatial randomness.
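The independent random process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study region is taken to be a rectangle, and the function name `simulate_csr` and its parameters are hypothetical, not part of the original text.

```python
import random


def simulate_csr(n, width=1.0, height=1.0, seed=None):
    """Place n points independently and uniformly over a width x height rectangle.

    Every location has the same chance of receiving a point, and each point
    is drawn without reference to the others (complete spatial randomness).
    """
    rng = random.Random(seed)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


# Hypothetical example: 100 points on a 10 x 10 study region.
points = simulate_csr(100, width=10.0, height=10.0, seed=42)
```

A pattern simulated this way gives a baseline against which an observed pattern can be compared.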
Visual comparison of a point pattern map to one of the fundamental point pattern arrangements of Figure 14-3 provides clues to the spatial distribution of a variable of interest and thus to the nature of the process that generated it. More precise
investigation typically involves analysis of the frequency or the density of points across
a study region (quadrat analysis) or of the distance between adjacent points (nearest
neighbor analysis). We briefly review these methods next.
Quadrat Analysis
Quadrat analysis was initially developed by ecologists studying the spatial distribution of plants (see Greig-Smith, 1964). This technique focuses on changes in the
density of points across a study region. The method is operationalized by overlaying
a regular grid on the region of interest and then counting the number of points found
in each quadrat (cell) of the grid. The observed frequency distribution of points per
quadrat is then compared to a theoretical distribution with known properties. If the
observed and theoretical distributions are similar, then we typically infer that the observed distribution could have been generated by a process consistent with the theoretical distribution. Let us now examine this procedure more carefully.
Given a point pattern to analyze, determining the size of the study area is a critical decision. In cases where the process involved is a social one, political boundaries
of one form or another might make an appropriate frame for the study. In other cases,
particularly those involving point patterns generated by physical processes, fixing the
region of interest can be more difficult, often because the processes under consideration may be more or less continuous in space. Must the study region include all points
representing a variable of interest, or should some stray or remote points be excluded
if the analyst believes their inclusion alters the fundamental nature of the point distribution? These are difficult questions to answer. If study areas of different size can be
utilized without excessive time and cost, the researcher can learn a great deal about the
influence of particular sets of points on the overall point pattern. Some of these issues
are illustrated in Figure 14-4. On the one hand, enlarging the scale of the study area
might make what appears to be a uniform distribution at one scale (Figure 14-4a) appear to be a clustered point pattern at another scale (Figure 14-4b). On the other hand,
too small a study area boundary might truncate (or subset) what appears to be a clustered point distribution and turn it into a more regular spatial distribution (Figure 14-4c).
FIGURE 14-4. Study area boundaries. Panel labels: (a) east–west scale = 1 unit; (b) east–west scale = 10 units; (c) study area boundary; north–south scales of 1 unit and 10 units are also marked.
Once the study region is determined, the researcher has to decide the shape and
size of quadrats. It has become common to use square quadrats (grid squares), though
circular quadrats are also employed. The main advantage of square quadrats is that
they pack together and completely cover an area, whereas circular quadrats leave
some spaces uncovered unless they overlap. Such overlap leads to oversampling parts
of the study area and introduces complex sampling issues. Other regularly shaped
quadrats such as triangles and hexagons may also be used in place of squares. Figure
14-5 provides examples of a regular square quadrat census design and a circular
quadrat sample applied to crime data covering a portion of the city of Portland, Oregon (www.gis.ci.portland.or.us/maps/police). By definition, the square quadrat census shown in Figure 14-5a covers the entire study region, whereas the circular quadrat
sample of Figure 14-5b covers only a subset of the study area. The circular quadrats
in the sample are positioned by drawing coordinates at random from the study region.
As with all samples, the larger the number of observations, the more faithfully the
sample data will represent the underlying population.
One additional factor complicates quadrat analysis: the appropriate size of the
quadrats themselves. On the one hand, employing only a few, large quadrats means
averaging counts of points that may be found in smaller quadrats. Thus, larger quadrats
tend to smooth the heterogeneity that might exist over a study region. On the other
hand, using relatively small quadrats tends to exacerbate differences in measures of
point density across a spatial point pattern. Greig-Smith (1964) suggests that the optimal quadrat size is given as
$$\text{optimal quadrat size} = \frac{2A}{n}$$
where A is the area of the study region and n is the number of points, representing the
locations of the variable of interest.
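As a quick illustration of this rule of thumb, the sketch below computes the suggested quadrat size and, assuming square quadrats, the corresponding side length. The function name and the input values (a study region of 100 square units holding 50 points) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def optimal_quadrat_area(study_area, n_points):
    """Suggested quadrat size 2A/n: twice the average area per point."""
    return 2.0 * study_area / n_points


# Hypothetical values: A = 100 square units, n = 50 points.
area = optimal_quadrat_area(100.0, 50)  # quadrat area in square units
side = math.sqrt(area)                  # side length if quadrats are square
```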
FIGURE 14-5. Robbery events in southeast Portland (July 2006–June 2007): (a) quadrat census; (b) quadrat sample.
It should be clear from Figures 14-4 and 14-5 that a regular or uniform distribution would be characterized by a relatively similar number of points, or observations, within each quadrat of a study region. In turn, a clustered distribution would
be characterized by relatively few quadrats that contain large numbers of points and
many other quadrats that contain no points. Expressed somewhat differently, the variance in the number of points per quadrat across a study region will approach zero for
a uniform distribution and will approach infinity for a clustered distribution. How
much variation in the number of points per quadrat would we expect to see in a point
pattern that was generated by an independent random process? We discussed this
question in Chapter 5, where we showed that the Poisson probability distribution
provides the probability that a single quadrat contains a specified number of points in
an experiment where points are distributed over a surface in a process that is consistent with complete spatial randomness. The variance of a Poisson random variable,
in this case the number of points per quadrat, is equal to the expected value of the
Poisson random variable and that is given, in practice, by the average intensity of the
point pattern:
$$\lambda = \frac{n}{q}$$
where n is the number of points in the study area and q is the number of quadrats.
This suggests that one way of classifying point patterns as clustered, random,
or uniform is to divide the study area into quadrats and to examine the variance/mean
ratio of the observed frequency distribution of points per quadrat. A random pattern
will have a variance/mean ratio of one since it is described by the Poisson distribution
for which the variance [V(X)] and the mean [E(X)] are equal. Dispersed patterns will
have a variance/mean ratio that tends toward zero, and clustered patterns will have a
variance/mean ratio tending toward infinity. To summarize:
$s^2/\bar{X} \to \infty$: indicates a clustered point pattern
$s^2/\bar{X} = 1$: indicates a random point pattern
$s^2/\bar{X} \to 0$: indicates a uniform point pattern
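The classification above can be sketched in Python as follows. This is a minimal illustration, assuming square quadrats laid out as a regular grid; the function names, the grid parameters, and the two toy point sets are hypothetical and are not data from the text.

```python
from statistics import mean, variance


def quadrat_counts(points, x_min, y_min, cell, n_cols, n_rows):
    """Count the points falling in each cell of a regular grid of square quadrats."""
    counts = [0] * (n_cols * n_rows)
    for x, y in points:
        col = int((x - x_min) // cell)
        row = int((y - y_min) // cell)
        # Points outside the grid (or exactly on its upper edges) are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[row * n_cols + col] += 1
    return counts


def variance_mean_ratio(counts):
    """Sample variance (q - 1 in the denominator) divided by the sample mean."""
    return variance(counts) / mean(counts)


# Toy patterns on a 2 x 2 grid of unit quadrats, 8 points each.
clustered = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4),
             (0.5, 0.6), (0.6, 0.5), (0.7, 0.8), (0.8, 0.7)]
uniform = [(0.3, 0.3), (0.7, 0.7), (1.3, 0.3), (1.7, 0.7),
           (0.3, 1.3), (0.7, 1.7), (1.3, 1.3), (1.7, 1.7)]

vmr_clustered = variance_mean_ratio(quadrat_counts(clustered, 0.0, 0.0, 1.0, 2, 2))
vmr_uniform = variance_mean_ratio(quadrat_counts(uniform, 0.0, 0.0, 1.0, 2, 2))
```

In the first toy pattern all eight points share one quadrat, so the ratio is well above one; in the second each quadrat holds two points, so the variance and the ratio are zero.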
EXAMPLE 14-1. Quadrat Analysis. Let us examine how to perform quadrat analysis using the California earthquake data introduced in Chapter 2. Here we ignore the
size of earthquakes and focus solely on their location. Figure 14-6 maps the epicenters
of major earthquakes in California over the last 100 years or so. Quadrats of uniform
size are superimposed over the study region. The number of earthquake epicenters
within each quadrat represents the variable of interest. A quick glance at Figure 14-6
shows a cluster of earthquake epicenters extending from Mendocino on the northern
coast of California into the Pacific Ocean. There is another cluster around Mammoth
Lakes along the eastern Sierra Nevada. In Southern California, epicenters are more
widely distributed. Overall, the pattern appears clustered, with many quadrats containing no earthquake activity.
Next, we calculate the sample mean and variance of the number of earthquake
epicenters across the 35 quadrats that comprise the set of observations. Note that there
are 111 earthquakes distributed across Figure 14-6, although some are difficult to
identify because of the overplotting of some points. We find
$$\bar{X} = \frac{\sum_{i=1}^{q} X_i}{q} = 3.17$$
and
$$s^2 = \frac{\sum_{i=1}^{q} (X_i - \bar{X})^2}{q - 1} = 41.73$$
where q is the number of quadrats, $X_i$ is the number of points in quadrat i, and $\sum_{i=1}^{q} X_i = n$.
Thus, the variance/mean ratio for the point pattern in Figure 14-6 is 41.73/3.17 = 13.16.
FIGURE 14-6. Earthquake epicenters and quadrats.
The variance/mean ratio is much larger than one, indicating that the spatial distribution
of earthquake epicenters in our sample is clustered.
A more formal goodness-of-fit test of the distribution of earthquake epicenters
across the quadrats in Figure 14-6 can also be performed. If earthquake epicenters are
randomly distributed in space they can be considered to have been generated by a
Poisson process with a variance/mean ratio equal to one. We could then ask, what is
the likelihood that a sample point pattern with a variance/mean ratio equal to 13.16
might have been selected by chance from a population point pattern that was truly
random? Thomas and Huggett (1980) show that the sampling distribution of $s^2/\bar{X}$ about
a Poisson prediction of one is approximated by the Student’s t-distribution when the
number of quadrats is reasonably large, say at least 30. Thus, to answer the question
just posed, we calculate the following test statistic based on the Student’s t-distribution
$$t = \frac{(s^2/\bar{X}) - 1}{\sqrt{2/(q - 1)}} = \frac{13.16 - 1}{0.243} = 50.041$$
where the denominator of this equation represents the standard error of the
variance/mean ratio. From the tabulated values of the t-statistic (see Appendix Table
A-6), with q – 1 degrees of freedom, the probability that a test statistic equal to 50.041
could occur by chance is essentially zero. We can therefore state with a considerable
degree of confidence that the earthquake epicenters are not randomly distributed. Furthermore, because the variance/mean ratio is greater than one, so that the test statistic is positive, we can state that the variable
of interest is significantly clustered. A significantly negative test statistic, corresponding to a variance/mean ratio less than
one, would indicate a dispersed or uniform spatial distribution of points.
FIGURE 14-7. Uniform point patterns from quadrat analysis.
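The test statistic used in Example 14-1 can be reproduced with a short Python sketch. The function name `vmr_t_statistic` is hypothetical; the inputs are the variance/mean ratio (13.16) and the number of quadrats (35) reported in the example. Because the code does not round the standard error to 0.243 before dividing, its result differs slightly from the 50.041 quoted in the text.

```python
import math


def vmr_t_statistic(vmr, q):
    """Return (t, standard error) for a variance/mean ratio over q quadrats.

    The standard error of the ratio under a Poisson process is sqrt(2 / (q - 1)),
    and t measures how far the ratio lies from the Poisson value of one.
    """
    se = math.sqrt(2.0 / (q - 1))
    return (vmr - 1.0) / se, se


# Values reported in Example 14-1.
t, se = vmr_t_statistic(13.16, 35)
# A positive t corresponds to a ratio above one (clustering);
# a negative t corresponds to a ratio below one (dispersion).
```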
Nearest Neighbor Analysis
Quadrat analysis is insensitive to the spatial arrangement of points within quadrats.
Thus, markedly different point patterns can give rise to identical frequency distributions of the number of points per quadrat. This issue is illustrated in Figure 14-7 where
two point distributions are shown, each with 8 points in total evenly divided between
the four quadrats. Quadrat analysis would reveal that both patterns are uniform, indicating the need for an alternative technique that takes into account the relationships
between points themselves.
Nearest neighbor analysis provides this alternative technique to point pattern
analysis. Developed by Clark and Evans (1954), nearest neighbor analysis focuses on
the distances between points rather than on the density of points in a study region to
determine whether the observed point pattern is clustered, random or dispersed. To
begin, the distance $d_{ij}$ between each pair of points i and j in a point pattern is calculated using Pythagoras's theorem. For each point i = 1, 2, 3, . . . , n, the distance to the closest neighboring point is determined, that is, $\min_{j \neq i} d_{ij}$. The mean or average of these observed
nearest neighbor distances is denoted $\bar{d}_o$. Unfortunately, this statistic cannot be used to
compare point pattern maps because it is measured in the same units as the map. What
is needed is some standard against which the mean observed nearest neighbor distance can be compared. The obvious standard is the mean or expected distance between nearest neighbors in a random point pattern. The mean or expected nearest
neighbor distance for a random point pattern is given by
$$\bar{d}_e = \frac{1}{2\sqrt{n/A}}$$
where n is the number of points in the pattern and A is the area of the study region.
By convention, A is taken as the area of the smallest rectangle that encloses all the points.
The ratio of observed to expected nearest neighbor distances, $R = \bar{d}_o/\bar{d}_e$, is
known as the nearest neighbor index. The value of R can vary between 0 and 2.15.
A value of R close to zero, that is, when the observed nearest neighbor distances are
relatively small, indicates a clustered point pattern. A value of R close to 2.15 is found
when the observed nearest neighbor distances are relatively large, indicating a dispersed pattern. A value R = 1 is consistent with a random pattern.
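The calculation of R can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the study area is passed in directly rather than derived from an enclosing rectangle, and the two small point sets are invented to show the extremes of the index.

```python
import math


def nearest_neighbor_index(points, area):
    """Return (mean observed NN distance, expected NN distance, R)."""
    n = len(points)
    nn_distances = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest neighbor (j != i).
        nn_distances.append(
            min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        )
    d_obs = sum(nn_distances) / n
    d_exp = 1.0 / (2.0 * math.sqrt(n / area))  # expectation under randomness
    return d_obs, d_exp, d_obs / d_exp


# Hypothetical patterns in a study area of 4 square units.
regular = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
tight = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]

_, _, r_regular = nearest_neighbor_index(regular, 4.0)
_, _, r_tight = nearest_neighbor_index(tight, 4.0)
```

The evenly spaced points give an index well above one, while the tightly grouped points give an index close to zero.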
EXAMPLE 14-2. Nearest Neighbor Analysis. Figure 14-8 shows the location of
donut stores in a neighborhood of Hamilton, Canada. Retail geographers might be
interested in whether the donut stores are randomly located or whether the point pattern of donut stores is significantly different from random, perhaps indicating that
some nonrandom process might explain their relative location.
Table 14-1 lists the 10 donut stores and the distance of each store to its nearest neighbor in km. The mean observed nearest neighbor distance between donut
stores is
$$\bar{d}_o = \frac{\sum_{i=1}^{n} \min_{j \neq i} d_{ij}}{n} = \frac{12.134}{10} = 1.213 \text{ km}$$
The expected nearest neighbor distance, given a study area of (4 × 4) = 16 km², based
on the assumption that the point pattern exhibits complete spatial randomness is
$$\bar{d}_e = \frac{1}{2\sqrt{10/16}} = \frac{1}{1.581} = 0.633 \text{ km}$$
and thus the nearest neighbor index R = 1.213/0.633 = 1.916. This value of R is tending toward its upper limit, indicating that the point pattern is dispersed.
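The arithmetic of Example 14-2 can be checked with the short sketch below, which uses only the summary figures quoted in the example (10 stores, a total nearest neighbor distance of 12.134 km, and a study area of 16 km²). The variable names are illustrative. Because the code carries full precision through each step, its value of R differs in the third decimal place from the 1.916 obtained with rounded intermediate values.

```python
import math

# Summary figures quoted in Example 14-2.
n = 10              # number of donut stores
area = 16.0         # study area in square km
total_nn = 12.134   # sum of nearest neighbor distances in km

d_obs = total_nn / n                        # mean observed distance
d_exp = 1.0 / (2.0 * math.sqrt(n / area))   # expected distance under randomness
R = d_obs / d_exp                           # nearest neighbor index
```

Either way, R lies well above one and toward the upper limit of 2.15, which is consistent with a dispersed pattern.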
FIGURE 14-8. Donut stores in Hamilton, Canada (stores labeled A–J).