4 A single character: Species matching and species mismatching
Common ancestry
When each student in a philosophy class is required to submit an essay on the
meaning of life, and the essays that Smith and Jones submit are word-for-word identical, it is possible that they wrote their essays independently, and
it also is possible that the students plagiarized from a common source, say a
document on the Internet (Salmon 1984). Though the separate-origin and
the common-origin hypotheses are both logically consistent with the
observations, the similarity of the two essays is evidence that favors the
hypothesis of common origination. I hope it is clear how this interpretation
of the evidence can be represented in the likelihood framework. The
matching of the essays is extremely improbable if they were written independently but would be much less surprising if they were obtained from a
common source. The same line of reasoning is at work when we reason
from the similarity of two species to their common ancestry. Our present
task is to flesh out this likelihood argument in more detail.
Modus Darwin for dichotomous characters
Reichenbach’s (1956) work on the principle of the common cause provides
a sufficient condition for the common-ancestry hypothesis to have higher
likelihood than the separate ancestry hypothesis,5 relative to the observation that species X and Y both occupy the same character state. To start
with the simplest case, I assume that the similarity in question concerns a
dichotomous variable, whose two states are 0 and 1; the observation is
that X = i and Y = i (where i = 0 or i = 1). There are nine assumptions
that need to be stated. In each case, I’ll state the idea in English and then
I’ll formalize it in terms of one or two numbered propositions that are
expressed in terms of the variables used in Figure 4.6.
The first two assumptions assert that a descendant’s probability of
having a trait depends on the trait of its ancestor, not on how many other
descendants that ancestor has:
$$\Pr(X = i \mid Z = j) = \Pr(X = i \mid Z_1 = j), \quad \text{for } i, j = 0, 1. \tag{1}$$
5
I criticized Reichenbach’s (1956) principle in §3.8; it says, recall, that when two event types are
correlated, either the one causes the other, the other causes the one, or the two trace back to a
common cause. Although this principle is too strong, there is something of value in Reichenbach’s
argument. He proved that a common-cause model that is set up in a certain way deductively entails
that the joint effects will be correlated. I will use this result in the present chapter to address a
problem that differs from the one that Reichenbach considered. The problem here involves a
comparative question (does the common-cause hypothesis have higher likelihood than the separate-cause hypotheses?), and the explanandum I consider is the similarity of two event tokens, not the
correlation of two event types (Sober 1988).
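[Figure 4.6: two tree diagrams relating observed species to hypothetical ancestors. Under (CA), the observed species X and Y both descend from a single hypothetical ancestor Z. Under (SA), X descends from hypothetical ancestor Z1 and Y descends from hypothetical ancestor Z2.]
Figure 4.6 The common-ancestry and separate-ancestry hypotheses.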
$$\Pr(Y = i \mid Z = j) = \Pr(Y = i \mid Z_2 = j), \quad \text{for } i, j = 0, 1. \tag{2}$$
The next assumption is that an ancestor’s prior probability of being in
a given character state does not depend on how many descendants it has,
and if it has just one, its probability of being in a given state does not
depend on which descendant it has:
$$\Pr(Z = i) = \Pr(Z_1 = i) = \Pr(Z_2 = i), \quad \text{for } i = 0, 1. \tag{3}$$
Assumptions 4 and 5 say that descendants and their ancestors are positively
correlated.6
$$\Pr(X = i \mid Z = i) > \Pr(X = i \mid Z = j) \quad (\text{for } i \neq j). \tag{4}$$
$$\Pr(Y = i \mid Z = i) > \Pr(Y = i \mid Z = j) \quad (\text{for } i \neq j). \tag{5}$$
This restates the backwards inequality that is a consequence of the
Markov model presented in §3.5. Assumption 6 asserts that two lineages
stemming from a common ancestor evolve independently of each other:
$$\Pr(X = i \text{ and } Y = j \mid Z = k) = \Pr(X = i \mid Z = k)\,\Pr(Y = j \mid Z = k), \quad \text{for } i, j, k = 0, 1. \tag{6}$$
The seventh assumption says that the same independence relation holds
for the descendants of the two ancestors postulated by the separate-ancestry
hypothesis:
$$\Pr(X = i \text{ and } Y = j \mid Z_1 = k \;\&\; Z_2 = l) = \Pr(X = i \mid Z_1 = k)\,\Pr(Y = j \mid Z_2 = l), \quad \text{for } i, j, k, l = 0, 1. \tag{7}$$
6
Notice that assumptions (4) and (5) do not assert that the process leading from Z to X and the
process leading from Z to Y are probabilistically equivalent. Different processes can occur in the
two lineages.
Assumption 8 says that the two ancestors postulated by the separate-ancestry hypothesis have character states that are probabilistically independent of each other:
$$\Pr(Z_1 = i \;\&\; Z_2 = j) = \Pr(Z_1 = i)\,\Pr(Z_2 = j), \quad \text{for } i, j = 0, 1. \tag{8}$$
And the ninth assumption speaks for itself:
$$\text{All probabilities have values strictly between 0 and 1.} \tag{9}$$
This exclusion of 0s and 1s as possible probability values is a substantive
postulate; it is not a consequence of the axioms of probability.
These nine assumptions entail that the common-ancestry hypothesis
(CA) has higher likelihood than the separate ancestry hypothesis (SA),
relative to the observation that X = 1 and Y = 1. To see why, let
$$\begin{aligned}
\Pr(X = 1 \mid Z = 1) &= x \\
\Pr(X = 1 \mid Z = 0) &= a \\
\Pr(Y = 1 \mid Z = 1) &= y \\
\Pr(Y = 1 \mid Z = 0) &= b \\
\Pr(Z = 1) &= p.
\end{aligned}$$
Assumptions 4 and 5 now can be stated as x > a and y > b. It follows
that
$$\Pr(X = 1 \text{ and } Y = 1 \mid \mathrm{CA}) > \Pr(X = 1 \text{ and } Y = 1 \mid \mathrm{SA})$$
precisely when
$$pxy + (1 - p)ab > [px + (1 - p)a]\,[py + (1 - p)b],$$
which simplifies to
$$(x - a)(y - b) > 0. \tag{10}$$
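The simplification step can be spelled out by subtracting the right-hand side from the left. The following is a sketch of the algebra using only the symbols p, x, a, y, and b defined above; it is offered as a check on the stated result, not as a quotation of the original derivation.

$$\begin{aligned}
&pxy + (1 - p)ab - [px + (1 - p)a]\,[py + (1 - p)b] \\
&\quad = pxy + (1 - p)ab - p^2 xy - p(1 - p)(xb + ay) - (1 - p)^2 ab \\
&\quad = p(1 - p)\,(xy - xb - ay + ab) \\
&\quad = p(1 - p)\,(x - a)(y - b).
\end{aligned}$$

Because assumption 9 guarantees 0 < p < 1, the factor p(1 − p) is positive, so the difference between the two likelihoods has the same sign as (x − a)(y − b).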
The same conclusion would follow if the observations were X = 0 and
Y = 0. Assumptions 1–9 therefore suffice to show that an observed
matching favors the CA hypothesis over the SA hypothesis. By parity of
reasoning, a mismatch must favor SA over CA. This is because the probabilities of the different possible observations, given a single hypothesis,
must sum to unity:
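$$\begin{aligned}
&\Pr(X = 1 \;\&\; Y = 1 \mid \mathrm{CA}) + \Pr(X = 0 \;\&\; Y = 0 \mid \mathrm{CA}) \\
&\quad + \Pr(X = 1 \;\&\; Y = 0 \mid \mathrm{CA}) + \Pr(X = 0 \;\&\; Y = 1 \mid \mathrm{CA}) = 1 \\
&\Pr(X = 1 \;\&\; Y = 1 \mid \mathrm{SA}) + \Pr(X = 0 \;\&\; Y = 0 \mid \mathrm{SA}) \\
&\quad + \Pr(X = 1 \;\&\; Y = 0 \mid \mathrm{SA}) + \Pr(X = 0 \;\&\; Y = 1 \mid \mathrm{SA}) = 1.
\end{aligned}$$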
For dichotomous characters, matches favor CA over SA and therefore
mismatches favor SA over CA.
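A small numerical check may make this concrete. The sketch below computes the likelihood of each of the four possible observations under CA and under SA, using assumptions 1–9 as stated above. The parameter values (p = 0.3, x = 0.9, a = 0.2, y = 0.7, b = 0.4) and the function names are hypothetical choices made for illustration; any values with x > a, y > b, and all probabilities strictly between 0 and 1 should give the same pattern.

```python
from itertools import product


def likelihood_ca(obs_x, obs_y, p, x, a, y, b):
    """Pr(X = obs_x and Y = obs_y | CA): one ancestor Z, lineages independent given Z."""
    total = 0.0
    for z, prior in ((1, p), (0, 1 - p)):
        px1 = x if z == 1 else a          # Pr(X = 1 | Z = z)
        py1 = y if z == 1 else b          # Pr(Y = 1 | Z = z)
        px = px1 if obs_x == 1 else 1 - px1
        py = py1 if obs_y == 1 else 1 - py1
        total += prior * px * py
    return total


def likelihood_sa(obs_x, obs_y, p, x, a, y, b):
    """Pr(X = obs_x and Y = obs_y | SA): independent ancestors Z1 and Z2."""
    px1 = p * x + (1 - p) * a             # marginal Pr(X = 1)
    py1 = p * y + (1 - p) * b             # marginal Pr(Y = 1)
    px = px1 if obs_x == 1 else 1 - px1
    py = py1 if obs_y == 1 else 1 - py1
    return px * py


# Hypothetical parameter values chosen only for illustration.
params = dict(p=0.3, x=0.9, a=0.2, y=0.7, b=0.4)

favored = {}
for obs in product((0, 1), repeat=2):
    ca = likelihood_ca(*obs, **params)
    sa = likelihood_sa(*obs, **params)
    favored[obs] = "CA" if ca > sa else "SA"
    print(obs, round(ca, 4), round(sa, 4), favored[obs])
```

With these values both matches come out favoring CA and both mismatches come out favoring SA, and the four likelihoods under each hypothesis sum to one.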
If one of the nine conditions just described fails to be true for two
species and a trait they share, does it follow that this similarity is not
evidence favoring CA over SA? The answer depends on the assumption in
question. For example, consider assumption 6, which says that the descendants of a common ancestor evolve independently of each other.
Suppose two lineages evolve in continuing physical contact with each
other so that the characteristics present in one lineage influence which
traits evolve in the other. If so, assumption 6 is false. However, this does
not show that similarity fails to provide evidence for CA in this circumstance, only that the argument just given fails to apply. The situation with
respect to assumption 8, which says that ancestors have their character states
independently of each other, is a bit different.
If there is a perfect correlation between the separate ancestors postulated by the separate-ancestry hypothesis, CA and SA will be evidentially
indistinguishable. A similar judgment attaches to assumption 9, which
says that none of the probabilities involved have values of 0 or 1. Suppose
we violate this assumption by stipulating that the ancestors postulated by
the common-ancestry and the separate-ancestry hypotheses were in
character state 0 (or in character state 1) with probability 1. It then
follows that the two hypotheses have identical likelihoods; the observed
similarity of X and Y does not favor CA over SA, nor does it have the
opposite evidential significance. Within the likelihood framework, it is
essential to think of the evolutionary process probabilistically if we are to
see how similarity can be evidence for CA (Sober 1988).
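The two degenerate cases just described can be checked numerically. The sketch below represents the separate-ancestry hypothesis by a joint distribution over the states of Z1 and Z2, so that the independence of the two ancestors can be switched off. The helper names and parameter values are hypothetical, introduced only for this illustration.

```python
import math


def pr_both_one_ca(p, x, a, y, b):
    """Pr(X = 1 and Y = 1 | CA) with prior p = Pr(Z = 1)."""
    return p * x * y + (1 - p) * a * b


def pr_both_one_sa(joint, x, a, y, b):
    """Pr(X = 1 and Y = 1 | SA) for a joint distribution over (Z1, Z2)."""
    px = {1: x, 0: a}                     # Pr(X = 1 | Z1 = k)
    py = {1: y, 0: b}                     # Pr(Y = 1 | Z2 = l)
    return sum(w * px[k] * py[l] for (k, l), w in joint.items())


def independent_joint(p):
    """Ancestors have their states independently of each other."""
    marg = {1: p, 0: 1 - p}
    return {(k, l): marg[k] * marg[l] for k in (0, 1) for l in (0, 1)}


def perfectly_correlated_joint(p):
    """Z1 and Z2 are always in the same state."""
    return {(1, 1): p, (0, 0): 1 - p}


x, a, y, b = 0.9, 0.2, 0.7, 0.4           # hypothetical values with x > a and y > b

# Ordinary case: independent ancestors, nondegenerate prior.
ordinary = (pr_both_one_ca(0.3, x, a, y, b),
            pr_both_one_sa(independent_joint(0.3), x, a, y, b))

# Perfectly correlated separate ancestors.
correlated = (pr_both_one_ca(0.3, x, a, y, b),
              pr_both_one_sa(perfectly_correlated_joint(0.3), x, a, y, b))

# Ancestors in state 1 with probability 1.
certain = (pr_both_one_ca(1.0, x, a, y, b),
           pr_both_one_sa(independent_joint(1.0), x, a, y, b))

for name, (ca, sa) in (("ordinary", ordinary), ("correlated", correlated), ("certain", certain)):
    print(name, round(ca, 4), round(sa, 4), math.isclose(ca, sa))
```

In the ordinary case the two likelihoods differ; in the other two cases they coincide, so the observed match does not discriminate between the hypotheses.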
What about assumption 3: that an ancestor’s probability of having a
given trait is independent of whether that ancestor gives birth to two
descendants or just to one? This is plausible for many traits but not for
traits that affect the probability of speciation events. The literature on
species selection provides a number of hypothetical examples of such traits
(Sober 1984: 355–68; Coyne and Orr 2004: 442–5). For example,
Stanley (1979) describes a hypothetical clade of grasshoppers; some
species have wings while others are wingless. Wingless grasshoppers have
higher probabilities of producing peripheral isolates and thus of giving
birth to new daughter species by allopatric speciation. Characteristics of
this sort can be such that Pr(Z = wingless) ≠ Pr(Z1 = wingless), thus
violating assumption 3. But once again, the failure of this assumption in
Stanley’s example does not mean that winglessness fails to be evidence of
CA; rather, the proper conclusion is that the argument given above does
not apply.7 Assumption 3 also requires that the two ancestors postulated
by the separate-ancestry hypothesis have the same prior probability of
exhibiting a given character state; cancel that and the simple algebra that
leads to proposition 10 falls by the wayside.
A different situation arises in connection with assumptions 4 and 5,
which assert that each lineage is such that ancestor and descendant are
positively correlated. Notice how these premises serve to establish the
conclusion, proposition 10. The conclusion would still be true if there
were a negative correlation in both lineages (i.e., x < a and y < b). The idea
that ancestor and descendant are positively correlated is standard in
evolutionary biology; this is pretty much what Darwin meant by his
strong law of heredity: that like tends to beget like. Still, for traits in
which there is no influence of parents on offspring – in which offspring
traits are independent of the traits of parents – an observed similarity will
fail to discriminate between CA and SA. And if there were a positive
correlation in one lineage and a negative correlation in the other, the
similarity of X and Y would favor the SA hypothesis over that of CA.
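The four cases distinguished in this paragraph can be tabulated directly from the likelihoods of the observation X = 1 and Y = 1. The sketch below is only an illustration: the prior p = 0.4, the conditional probabilities in each case, and the function names are hypothetical values chosen to produce each sign pattern.

```python
def match_likelihoods(p, x, a, y, b):
    """Return (Pr(X=1 and Y=1 | CA), Pr(X=1 and Y=1 | SA)) under assumptions 1-3 and 6-8."""
    ca = p * x * y + (1 - p) * a * b
    sa = (p * x + (1 - p) * a) * (p * y + (1 - p) * b)
    return ca, sa


def verdict(p, x, a, y, b, tol=1e-12):
    """Which hypothesis the observed match favors, or 'neither' if the likelihoods tie."""
    ca, sa = match_likelihoods(p, x, a, y, b)
    if abs(ca - sa) <= tol:
        return "neither"
    return "CA" if ca > sa else "SA"


P = 0.4  # hypothetical prior Pr(Z = 1)

# Hypothetical conditional probabilities illustrating each sign pattern.
cases = {
    "positive in both lineages": dict(x=0.8, a=0.3, y=0.7, b=0.2),
    "negative in both lineages": dict(x=0.3, a=0.8, y=0.2, b=0.7),
    "positive in one, negative in the other": dict(x=0.8, a=0.3, y=0.2, b=0.7),
    "no correlation in one lineage": dict(x=0.5, a=0.5, y=0.7, b=0.2),
}

results = {name: verdict(P, **values) for name, values in cases.items()}
for name, outcome in results.items():
    print(f"{name}: match favors {outcome}")
```

The output tracks the sign of (x − a)(y − b): a positive product favors CA, a negative product favors SA, and a zero product leaves the match evidentially neutral.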
The idea of competitive exclusion provides a possible scenario in which
similarity favors SA over CA. Suppose that two species that stem from a
common ancestor will probably live in the same locale and will therefore
have a high probability of diverging in their character states; if X exhibits
character state 1, Y will probably exhibit character state 0. Suppose
further that if the two species originate separately, they probably will
live in separate locales and therefore will evolve independently of each
other. In this situation, the observation that X and Y are in the same
state favors SA over CA. This scenario involves a violation of assumption 6 – that descendants stemming from a common ancestor evolve
independently.
7
Here’s another possible example, this time concerning organismic, not species, genealogies: In a
crowded environment, an organism may have a higher probability of having a second offspring if
the first offspring is small.
Before working through the details of premises 1–9, you may have
been inclined to think that it is obvious that similarity is evidence for
common ancestry. Perhaps you also were disposed to ask the following
rhetorical question – who, besides a philosopher, would bother belaboring
the obvious? This dismissive complaint contains a grain of truth – it is a
favorite pastime of philosophers to ask what justification there is behind
the obvious. I hope the examination of premises 1–9 makes it clear why
there is no intrinsic reason why similarity must count as evidence for
common ancestry. Whether this is so depends on nontrivial empirical matters
of fact. Propositions 1–9 are not consequences of the axioms of probability. Neither are they necessary conditions for common ancestry to have
a higher likelihood than separate ancestry, and, for this reason, it would
be wrong to regard them as assumptions that modus Darwin requires;
however, these propositions suffice for similarity to be evidence for
common ancestry, and they have broad applicability.
Homology
My focus has been on how a similarity (or a dissimilarity) that characterizes
a pair of species provides evidence that discriminates between the common-ancestry and the separate-ancestry hypotheses. Isn’t this to ignore the
fundamental biological point that it is homologies that provide evidence for
common ancestry? There is a large literature on how the concept of
homology should be understood, but the question at hand in fact has a
simple answer. Homologies are usually taken to be similarities that are
present because of inheritance from a common ancestor; the wings of
sparrows and robins are homologies in this sense. A homoplasy, in contrast,
is a similarity that is not due to inheritance from a common ancestor but
instead arose because independent origination events occurred in separate
lineages; the wings of birds and bats are an example. So defined, the
concept of homology already has built into it the claim of common
ancestry. If our goal is to test the common-ancestry hypothesis against the
separate ancestry hypothesis by looking at data, then it would beg the
question to say that our data consist of ‘‘homologies’’ in this sense.8 What
counts as an observation in this problem must be knowable without one’s
already needing to have an opinion as to which of the competing
hypotheses is true (§2.14). This is why similarities are the right place to
begin.
8
Sober (1988) argues that if synapomorphies are to be evidence for one phylogenetic tree over
another, then the concept of a synapomorphy should not be defined to mean that the trait is a
homology.
Multistate and continuous characters
Conditions 1–9 suffice for similarities to favor CA over SA and for
dissimilarities to have the opposite evidential significance, when the trait
in question is dichotomous. Does the argument extend to the case of
discrete characters that come in more than two states?
There is a logical difference between dichotomous and n-state characters (where n > 2). For a dichotomous trait, if matches are evidence for
CA, it must be true that mismatches are evidence against. The situation
with respect to an n-state discrete character (where n > 2) is more
complicated. If two species are in the same character state, that still is
evidence favoring the CA hypothesis. However, if they differ in character
state, their difference may or may not be evidence against. It all depends
on the nature of the difference. In some cases, their different character
states can actually be evidence for CA. To see this possibility, consider the
number of chromosome pairs a diploid species might possess. If two
species have exactly the same number of chromosomes, this is evidence
that they have a common ancestor, by the argument of the previous
section. But suppose that one species has twenty-three pairs and the other
has twenty-four, as is the case for human beings and chimpanzees,
respectively. What is the evidential significance of this difference?
The way to think about this question is not to subtract 23 from 24 and
focus on the fact that the difference is a small number. Rather, we need to
think about this question from the point of view of the law of likelihood
(§1.3). Are the observed trait values more probable under the common-ancestry or the separate-ancestry hypothesis? The answer depends on the
processes that govern the evolution of chromosome number. Consider,
for example, the two transformation series depicted in Figure 4.7. The
transformation series for a character gives the probabilities of different
types of change from one character state to another. Figure 4.7a describes
an n-state character in which all changes have the same probability (u); it
is no harder for a lineage to change from T2 to T10 than it is for it to change
from T9 to T10.9 A different arrangement is depicted in Figure 4.7b; here
there is an ordering of the n states and the probability of changing from one
state to an adjacent state is u; no direct jumping to a nonadjacent state is
possible. With this transformation series, the probability of a lineage’s
changing from T2 to T6 is smaller than its probability of changing from T5
to T6.
9
Figure 4.7a depicts the Jukes and Cantor (1969) model of nucleotide evolution when n = 4; this
will be discussed in §4.8.
[Figure 4.7: panel (a) shows states Ti and Tj linked by a change with probability u; panel (b) shows states Ti and Ti+1 linked by a change with probability u.]
Figure 4.7 Two possible transformation series for a trait T that has n states (T1, T2, . . . , Tn).
Each describes the probabilities of changes in character state. In (a), all changes have the
same probability; in (b), there is an ordering of character states and the only changes that are
possible are changes to adjacent states.
In the transformation series shown in Figure 4.7a, every mismatch
between the species X and Y has the same evidential significance. Since
exact identity of character state is evidence favoring CA over SA, every
difference between the two species is evidence favoring SA over CA. The
transformation series depicted in Figure 4.7b is different. In this case,
two species exhibiting traits T2 and T10 differ more than two species that
exhibit traits T9 and T10. As before, an exact match in character state
favors CA over SA. A very large difference in character state between the two
species is evidence favoring SA over CA. A more modest difference might
have either evidential meaning, depending on further details.
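These claims can be explored with a small Markov-chain calculation. The sketch below rests on several assumptions that are not in the text: time is divided into discrete steps, each lineage runs for the same number of steps T under both hypotheses, ancestral states have a uniform prior, and the values N = 10, T = 5, and the two values of u are hypothetical. The function names are likewise invented for this illustration.

```python
def uniform_change_matrix(n, u):
    """Figure 4.7a style: every change to a different state has probability u per step."""
    return [[u if i != j else 1 - (n - 1) * u for j in range(n)] for i in range(n)]


def adjacent_change_matrix(n, u):
    """Figure 4.7b style: only changes to an adjacent state, each with probability u per step."""
    step = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                step[i][j] = u
        step[i][i] = 1 - sum(step[i])     # probability of no change this step
    return step


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, t):
    """t-step transition probabilities (t >= 1)."""
    result = m
    for _ in range(t - 1):
        result = mat_mul(result, m)
    return result


def favored(step, t, state_x, state_y):
    """Compare Pr(X, Y | CA) with Pr(X, Y | SA); states are numbered from 1."""
    n = len(step)
    pt = mat_pow(step, t)
    i, j = state_x - 1, state_y - 1
    prior = 1 / n                         # assumed uniform prior on ancestral states
    ca = sum(prior * pt[k][i] * pt[k][j] for k in range(n))
    sa = (sum(prior * pt[k][i] for k in range(n))
          * sum(prior * pt[k][j] for k in range(n)))
    return "CA" if ca > sa else "SA"


N, T = 10, 5                                  # hypothetical: ten states, five steps per lineage
series_a = uniform_change_matrix(N, 0.02)     # hypothetical u for Figure 4.7a
series_b = adjacent_change_matrix(N, 0.1)     # hypothetical u for Figure 4.7b

for label, series in (("4.7a", series_a), ("4.7b", series_b)):
    for pair in ((5, 5), (9, 10), (2, 10)):
        print(label, pair, favored(series, T, *pair))
```

Under these assumptions the first series treats every mismatch alike, whereas the second distinguishes the neighboring pair (T9, T10) from the distant pair (T2, T10). Different choices of T and u can move the boundary between "modest" and "large" differences, which is the sense in which the verdict depends on further details.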
With the distinction between these two transformation series in mind,
let’s return to the example of chromosome number. If every change in
chromosome number had the same probability as every other, then the
fact that humans have twenty-three chromosome pairs and chimps have
twenty-four would be evidence favoring SA over CA. However, if the
transformation series is the one shown in Figure 4.7b, the near identity of
chromosome number of chimps and humans might be evidence favoring
CA. In contrast, the fact that a subspecies of the fern Ophioglossum
reticulatum has 720 pairs (Stace 2000) whereas females in the ant
Myrmecia pilosula have a single pair while the males are haploid (Gould
1991) is evidence favoring SA over CA if we assume the kind of transformation series depicted in Figure 4.7b and if this difference in chromosome number is the largest one possible. This does not entail that the
two species have no common ancestor, only that their chromosome
number favors that conclusion (if the assumption about the transformation series is true); other traits that they share might favor CA over SA.
It is intuitive that the laws of motion given in Figure 4.7a entail that
every difference between species X and Y favors SA over CA. But what
explains the fact that large differences favor SA over CA while modest
differences can have the opposite significance when the transformation
series is the one given in Figure 4.7b? Suppose our observation is that X = 4
and Y = 5 where the character in question has ten states, and let’s
further suppose that the ancestors depicted in Figure 4.6 have a nonnegligible chance of producing descendants with those character states
only if the ancestors are within two units of their descendants. This is
tantamount to saying that u^3 is so small that it can safely be ignored in
comparing likelihoods. If the character we are considering has ten character
states, then the ancestor postulated by the common-ancestry hypothesis
can produce these descendants if and only if it is in one of states 3–6, while
the two ancestors postulated by the separate-ancestry hypothesis can
produce these descendants precisely when the first is in states 2–6 and the
second is in states 3–7. If the ten character states an ancestor might occupy
have equal probability,10 then the probability that the common ancestor is
in the range 3–6 is 0.4, while the probability that both of the separate
ancestors are in range of their respective descendants is (0.5)(0.5) = 0.25. So
CA has the higher likelihood with respect to the observation that X = 4
and Y = 5. But now suppose there is a wider gap between X and Y; for
example, suppose that X = 4 and Y = 8. Then the single ancestor postulated by the common-ancestry hypothesis must be in state 6 if it is to have a
chance of producing these two very different descendants, whereas the two
ancestors postulated by the separate-ancestry hypothesis can yield these
descendants if the first is in one of the states 2–6 while the second is in one
of the states 6–10. With equal probabilities on ancestral character states,
the ancestor postulated by the common-ancestry hypothesis has a 0.1
chance of being in a state that can generate the data, while the two
ancestors postulated by the separate-ancestry hypothesis again have a (0.5)(0.5) = 0.25
chance of doing so. Now it is SA that makes the observations
more probable. I hope this informal argument makes it intuitive why a
small difference between X and Y can favor CA over SA whereas a larger
difference will have the opposite evidential significance, if the transformation series is the one given in Figure 4.7b.
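The arithmetic of this informal argument can be reproduced in a few lines. The sketch below counts, for ten equiprobable ancestral states and a reach of two units, how likely the postulated ancestors are to be in a state from which the observed descendants are attainable. The function names are hypothetical, and, like the argument in the text, the calculation treats "being in range" as a stand-in for the likelihoods rather than computing them exactly.

```python
from fractions import Fraction


def reachable_ancestors(descendant, n_states=10, max_gap=2):
    """Ancestral states within max_gap units of the descendant's state."""
    lo = max(1, descendant - max_gap)
    hi = min(n_states, descendant + max_gap)
    return set(range(lo, hi + 1))


def chance_ancestors_in_range(x, y, n_states=10, max_gap=2):
    """Return (CA, SA) probabilities that the postulated ancestors can yield X = x and Y = y."""
    anc_x = reachable_ancestors(x, n_states, max_gap)
    anc_y = reachable_ancestors(y, n_states, max_gap)
    # CA: a single ancestor must be in range of both descendants.
    ca = Fraction(len(anc_x & anc_y), n_states)
    # SA: two independent ancestors, each in range of its own descendant.
    sa = Fraction(len(anc_x), n_states) * Fraction(len(anc_y), n_states)
    return ca, sa


near = chance_ancestors_in_range(4, 5)
far = chance_ancestors_in_range(4, 8)
print("X = 4, Y = 5:", near)
print("X = 4, Y = 8:", far)
```

For the nearby pair the common ancestor has four admissible states (3–6), giving 2/5 against 1/4; for the distant pair only state 6 is admissible, giving 1/10 against 1/4.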
The two models of evolution shown in Figure 4.7 do not exhaust the
possible transformation series that might be true of an n-state character. A
more complex and realistic model might allow there to be a different
probability for every type of change. And even when larger changes usually
have smaller probabilities than more modest changes, there can be exceptions. Consider, for example, the process of polyploidy, wherein chromosome number doubles or triples or quadruples. In modeling the evidential
significance of chromosome number, we’d have to take this process into
account as well as the process of adding or deleting a single chromosome
pair. Maybe it is easier to go from twenty to sixty chromosomes than it is to
go from twenty to fifty-nine. This means that if two species have twenty
and sixty pairs respectively, this might be evidence favoring CA over SA,
whereas the opposite conclusion might be correct if they exhibit twenty and
fifty-nine. Once again it is a mistake to focus on how much or how little
two species differ in their character state; what is fundamental is the processes at work in the evolution of their trait values.
10
This prior probability for ancestral character states is the equilibrium distribution for the process
depicted in Figure 4.7b; it is not an additional postulate.
These complications should not obscure the essential points. Exact identity
of character state is always evidence favoring CA over SA so long as
assumptions 1–9 from the previous section hold true. And regardless of
whether the character is dichotomous or multistate, some differences in
character state must count as evidence favoring SA over CA. Whether all
differences in character state constitute evidence favoring SA over CA depends
on the transformation series; this is true of the rules represented in Figure
4.7a, but it isn’t true of others. Notice that assumptions 1–9 say nothing
about the transformation series; this means that interpreting differences in
character state for n-state characters requires additional assumptions beyond
those that suffice for interpreting differences in dichotomous traits.
The transformation series represented in Figure 4.7b has no bias;
evolving from one state to another has the same probability as evolving in
the opposite direction. This transformation series is therefore appropriate
for modeling a drift process but not for modeling how selection leads a
lineage to evolve towards a single optimum. In any event, as the number
of character states in this transformation series is increased, and u is made
small, one approaches a model of the evolution of a continuous character
subject to drift. By viewing a continuous trait as the limit of an n-state
trait (where n is made large), you can see how the conclusions stated in
the previous paragraph apply to continuous characters.
We have just seen that n-state characters differ epistemologically from
dichotomous characters. But which are better to use in testing CA against
SA? No sweeping generalization can be expected in answer, but there is a
special case of this question that the principle of total evidence addresses.
Returning to the example of human and chimp chromosome number,
let’s define a dichotomous character W (for ‘‘weak’’) by saying that a
species has W precisely when it has twenty-four or more chromosome pairs.
Human beings lack W but chimps have W, so, by the argument of the
previous section, their difference with respect to this dichotomous character
is evidence against the hypothesis that they have a common ancestor. But
now let’s define a logically stronger feature S that comes in n states;
S1 means that a species has one chromosome pair; S2 means it has two, and
so on. The n-state character S is logically stronger than the dichotomous
character W because the trait value a species has for S logically entails its
value for W, but not conversely. Suppose the laws governing the evolution
of chromosome number have the consequence that humans having
twenty-three and chimps having twenty-four chromosome pairs is evidence in favor of their having a common ancestor. If so, the dichotomous
character state W and the n-state character S point in opposite directions.
Which should we take more seriously? The principle of total evidence says
that we should use the strong description (S), not the weak one (W).
The reader will have noticed a dismaying arbitrariness that enters into
the definition in this example of the dichotomous trait W. I defined the
cutoff between W and not-W as twenty-four, with the consequence that
humans and chimps differ with respect to W. But I could just as easily
have defined a dichotomous trait W* that applies to a species when it has
more than fifteen chromosome pairs, with the result that humans and
chimps both have W*. Differing with respect to W favors SA over CA,
but sharing W* has the opposite significance. So which dichotomous trait,
W or W*, should we use to evaluate the common-ancestry and separate-ancestry hypotheses? This arbitrariness does not arise if we move to the
count property S. However, the price of using this more informative
description of the two species is that we face a new epistemological
problem; we need a substantive theory of how the n-state character evolves
before we can say what the evidential significance is of the observed
difference between the two species.
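The arbitrariness is easy to exhibit. The sketch below applies the two cutoffs described above to the chromosome-pair counts given in the text (twenty-three for humans, twenty-four for chimps); the function names are hypothetical and serve only to make the comparison explicit.

```python
# Chromosome-pair counts as stated in the text.
PAIRS = {"human": 23, "chimp": 24}


def has_trait(pairs, cutoff, inclusive):
    """True if the species falls on the upper side of the cutoff."""
    return pairs >= cutoff if inclusive else pairs > cutoff


def dichotomous_verdict(cutoff, inclusive):
    """Whether the two species match or mismatch on the dichotomous trait."""
    states = {species: has_trait(n, cutoff, inclusive) for species, n in PAIRS.items()}
    return "match" if states["human"] == states["chimp"] else "mismatch"


w = dichotomous_verdict(24, inclusive=True)        # W: twenty-four or more pairs
w_star = dichotomous_verdict(15, inclusive=False)  # W*: more than fifteen pairs
print("W:", w)
print("W*:", w_star)
```

The same two counts yield a mismatch under W and a match under W*, so the evidential verdict from a dichotomous description turns on where the cutoff happens to be drawn.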
The paradox of the heap is a philosophical staple that traces back to
ancient Greece; there is no precise number of pebbles that separates a
heap from a nonheap, and no precise number of hairs that separates the
bald from the not bald. These familiar facts do not and should not deter
us from using those concepts, since, after all, there are plenty of clear cases
of heaps and bald people. The present problem is different; it is evidential,
not taxonomic. When a dichotomous character is imposed on an underlying reality of quantitative difference, the dichotomy can make intuitive
sense (as when we say that some people are bald while others are not) or it
can seem utterly arbitrary (as when I invented the character W). But
whether the dichotomy is familiar or not, other dichotomies laid on the
same underlying quantitative reality are logically possible, and so the
question arises of which we should use to describe the evidence. Shifting
from a dichotomous to a multistate or continuous character cuts this