2.3 Critiques of Meta-Analysis: When Are They Valid and When Are They Not?




analyses are not easy. The time required to search adequately for and code
studies is substantial (see Chapters 3–7). The analyses, though not requiring
an especially high level of statistical complexity, must be performed with
care and by someone with the basic skills of meta-­analysis (such as provided
in Chapters 8–11). Finally, the reporting of a meta-­analysis can be especially
difficult given that you are often trying to make broad, authoritative statements about a field (see Chapters 13–14). My intention is not to scare anyone
away from performing a meta-­analysis, but I think it is important to recognize some of the difficulty in this process. However, needing a large amount
of statistical expertise is not one of these difficulties for most meta-­analyses
you will want to perform.
2.3.2 Quantitative Analysis May Lack “Qualitative
Finesse” of Evaluating Literature
Some complain that meta-analyses lack the “qualitative finesse” of a narrative review, presumably meaning that they fail to draw creative, nuanced
conclusions about the literature. I understand this critique, and I agree that
some meta-­analysts can get too caught up in the analyses themselves at the
expense of carefully considering the studies. However, this tendency is not inherent to meta-analysis, and nothing precludes the meta-analyst from engaging in this careful consideration.
To place this critique in perspective, I think it is useful to consider
the general approaches of qualitative and quantitative analysis in primary
research. Qualitative research undoubtedly provides rich, nuanced information that has contributed substantially to understanding in nearly all areas of
social sciences. At the same time, scientific progress would be limited if we
did not also rely on quantitative methods and on methods of analyzing these
quantitative data. Few scientists would collect quantifiable data from dozens or hundreds of individuals, but would instead use a method of analysis
consisting of looking at the data and “somehow” drawing conclusions about
central tendency, variability, and co-­occurrences of individual differences.
In sum, there is substantial advantage to conducting primary research using
qualitative analyses, quantitative analyses, or a combination of the two.
Extending this value of qualitative and quantitative analyses in primary
research to the process of research synthesis, I do not see careful, nuanced
consideration of the literature and meta-analytic techniques as mutually
exclusive processes. Instead, I recommend that you rely on the advantages
of meta-­analysis in synthesizing vast amounts of information and aiding in
drawing probabilistic inferential conclusions, but also use your knowledge

Questions That Can and Cannot Be Answered through Meta-­Analysis 25

of your field where these quantitative analyses fall short. Furthermore, meta-analytic techniques provide results that are statistically justifiable (e.g., there
is an effect size of a certain range of magnitude; some type of studies provide
larger effect sizes than another type), but it is up to you to connect these findings to relevant theories in your field. In short, a good meta-­analytic review
requires both quantitative methodology and “qualitative finesse.”
2.3.3 The “Apples and Oranges” Problem
The critique known as the “apples and oranges problem” was first leveled
against Smith and Glass’s (1977) meta-analytic combination of studies using diverse methods of psychotherapy in treating a wide range of problems among diverse samples of people (see Sharpe, 1997). Critics charge that
including such a diverse range of studies in a meta-­analysis yields meaningless results.
I believe that this critique is applicable only to the extent that the meta-analyst wants to draw conclusions about apples or oranges; if you want to
draw conclusions only about a narrowly defined population of studies (e.g.,
apples), then it is problematic to include studies from a different population (e.g., oranges). However, if you wish to make conclusions about a broad
population of studies, such as all psychotherapy studies of all psychological disorders, then it is appropriate to combine a diverse range of studies.
To extend the analogy: combining apples and oranges is appropriate if you
want to draw conclusions about fruit; in fact, if you want to draw conclusions about fruit you should also include limes, bananas, figs, and berries!
Studies are rarely identical replications of one another, so including studies
that are diverse in methodology, measures, and sample within your meta-analysis has the advantage of improving the generalizability of your conclusions (Rosenthal & DiMatteo, 2001). So, the apples and oranges critique is
not so much a critique about meta-­analysis; rather, it just targets whether or
not the meta-­analyst has considered and sampled studies from an appropriate
level of analysis.
In considering this critique, it is useful to note the opportunities for
examining multiple levels of analysis through moderator analyses in meta-analysis (see Chapter 9). Evoking the fruit analogy one last time: A meta-analysis can include studies of all fruit and report results about fruit, but
then systematically compare apples, oranges, and other fruit through moderator analyses (i.e., do results involving apples and oranges differ?). Moderator analyses can go further still by comparing studies involving, for
example, McIntosh, Delicious, Fuji, and Granny Smith apples. The possibility



of including diverse studies in your meta-­analysis and then systematically
comparing these studies through moderator analyses means that the apples
and oranges problem is easily addressable.
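To make the idea of a moderator analysis concrete, the sketch below shows a minimal fixed-effect subgroup comparison. All effect sizes and sampling variances are invented for illustration (they are not drawn from any real meta-analysis), and the details of these methods are covered in Chapter 9.

```python
# A minimal sketch of a fixed-effect subgroup ("moderator") analysis.
# All effect sizes and sampling variances below are invented for
# illustration; they do not come from any real meta-analysis.

def pool(effects, variances):
    """Inverse-variance weighted mean effect and its variance."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return mean, 1.0 / sum(weights)

# Two hypothetical subgroups of studies: "apples" vs. "oranges".
apple_effects,  apple_vars  = [0.42, 0.35, 0.50], [0.020, 0.015, 0.030]
orange_effects, orange_vars = [0.18, 0.25, 0.10], [0.025, 0.020, 0.018]

mean_apple,  var_apple  = pool(apple_effects,  apple_vars)
mean_orange, var_orange = pool(orange_effects, orange_vars)
mean_all, _ = pool(apple_effects + orange_effects, apple_vars + orange_vars)

# Q-between: do the subgroup means differ more than sampling error allows?
# Compare against a chi-square with (number of groups - 1) = 1 df
# (critical value 3.84 at alpha = .05).
q_between = ((mean_apple - mean_all) ** 2 / var_apple
             + (mean_orange - mean_all) ** 2 / var_orange)
```

With these invented numbers, `q_between` comes out to about 4.0, exceeding the 3.84 critical value, so the “fruit type” moderator explains significant variability in this toy example; the same machinery extends to finer comparisons, such as among apple varieties.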
2.3.4 The “File Drawer” Problem
The “file drawer” problem is based on the possibility that the studies included
in a meta-­analysis are not representative of those that have been conducted
because studies that fail to find significant or expected results are hidden
away in researchers’ file drawers. Because I devote an entire chapter to this
problem, also called publication bias, later in this book (Chapter 11), I do
not treat this threat in detail here. Instead, I briefly note that this is indeed
a threat to meta-analysis, as it is to any literature review. Fortunately, meta-analyses typically use systematic and thorough methods of obtaining studies (Chapter 3) that minimize this threat, and meta-analytic techniques for
detecting and potentially correcting for this bias exist (Chapter 11).
2.3.5 Garbage In, Garbage Out
The critique of “garbage in, garbage out” is that the meta-­analysis of poor
quality primary studies only results in conclusions of poor quality. In many
respects this critique is a valid threat, though there are some exceptions.
First, we can consider what “poor quality” (i.e., garbage) really means. If
studies are described as being of poor quality because they are underpowered
(i.e., have low statistical power to detect the hypothesized effect), then meta-analysis can overcome this limitation by aggregating findings from multiple
underpowered studies to produce a single analysis that is more powerful. If
studies are considered to be of poor quality because they contain artifacts
such as using measures that are less reliable or less valid than is desired, or
if the primary study authors used certain inappropriate analytic techniques
(e.g., artificially dichotomizing continuous variables), then methods of correcting effect sizes might help overcome these problems (see Chapter 6). For
these types of “garbage,” then, meta-analyses might be able to produce high-quality findings.
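The two remedies just described can be sketched briefly in code. The sketch below pools five hypothetical underpowered studies using inverse-variance weights, and then applies the classic disattenuation correction for unreliability in both measures (one of the artifact corrections discussed in Chapter 6). Every number here is invented for illustration.

```python
# A minimal sketch of two ways meta-analysis can address "garbage"
# in the senses described above. All numbers are invented.
import math

# (1) Low statistical power: five small hypothetical studies, each
# individually non-significant (|z| < 1.96), pooled with
# inverse-variance weights into one more powerful estimate.
effects   = [0.30, 0.25, 0.35, 0.28, 0.32]      # e.g., standardized mean differences
variances = [0.040, 0.050, 0.045, 0.055, 0.042]  # sampling variances

weights = [1.0 / v for v in variances]
pooled_effect = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
z_pooled = pooled_effect / pooled_se             # exceeds 1.96 here

# (2) Measurement artifact: the classic correction disattenuating an
# observed correlation for unreliability in both measures.
r_observed, rel_x, rel_y = 0.30, 0.80, 0.75      # hypothetical reliabilities
r_corrected = r_observed / math.sqrt(rel_x * rel_y)
```

Each individual study’s z falls between roughly 1.1 and 1.7, yet the pooled z exceeds 3, illustrating how aggregation recovers the power that single small studies lack; the corrected correlation rises from .30 to about .39.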
There are other types of problems of study quality that meta-­analyses
cannot overcome. For instance, if all primary studies evaluating a particular treatment fail to assign participants randomly to conditions, do not use
double-blind procedures, or the like, then these threats to internal validity in
the primary studies will remain when you combine the results across studies in a meta-­analysis. Similarly, if the primary studies included in a meta-


analysis are all concurrent naturalistic designs, then there is no way that
meta-­analytic combination of these results can inform causality. In short, the
design limitations that consistently occur in the primary studies will also be
limitations when you meta-­analytically combine these studies.
Given this threat, some have recommended that meta-­analysts exclude
studies that are of poor quality, however that might be defined (see
Chapter 4). Although this exclusion does ensure that the conclusions you
reach have the same advantages afforded by good study designs as are available in the primary studies, I think that uncritically following this advice is
misguided for three reasons. First, for some research questions, there may
be so few primary studies that meet strict criteria for “quality” that it is not
very informative to combine or compare them; however, there may be many
more studies that contain some methodological flaws. In these situations, it seems that the progression of knowledge is unnecessarily delayed by
stubborn unwillingness to consider all available evidence. I believe that most
fields benefit more from an imperfect meta-­analysis than no meta-­analysis
at all, provided that you appropriately describe the limits of the conclusions
of your review. A second reason that dogmatically excluding poor-quality studies is a poor choice is that this practice assumes that certain
imperfections of primary studies result in biased effects, yet does not test
this assumption. This leads to the third reason: Meta-­analyses can evaluate
whether systematic differences in effect sizes emerge from certain methodological features. If you code the relevant features of primary studies that are
considered “quality” within your particular field (see Chapter 4), you can
then evaluate whether these features systematically relate to differences in
the results (effect sizes) found among studies through moderator analyses
(Chapter 9). Having done this, you can (1) make statements about how the
differences in specific aspects of quality impact the effect sizes that are found,
which can guide future design of primary studies; (2) where differences are
found, limit conclusions to the types of studies that you believe produce the
most valid results; and (3) where differences are not found, have the advantage of including all relevant studies (versus a priori excluding a potentially
large number of studies).
2.3.6 Are These Problems Relevant Only
to Quantitative Reviews?
Although these critiques were first raised against the early meta-analyses
and have since been leveled primarily against meta-analytic (i.e.,
quantitative) reviews, most apply to all types of research syntheses. Aside



from the first two I have reviewed (that meta-analyses require extensive statistical expertise and lack qualitative finesse), which I have argued are generally
misconceptions, the remainder can be considered threats to all types of
research syntheses (including narrative research reviews) and often all types
of literature reviews (see Figure 1.1). However, because these critiques have
most often been applied toward meta-­analysis, we have arguably considered
these threats more carefully than have scholars performing other types of literature reviews. It is useful to consider how each of the critiques I described
above threatens both quantitative and other literature reviews (considering
primarily the narrative research review), and how each type of review typically
manages the problem.
The “apples and oranges” problem (i.e., inclusion of diverse types of
studies within a review) is potentially threatening to both narrative and
meta-analytic reviews. However, my impression is that meta-analyses more
commonly attempt to draw generalized conclusions across diverse types of
primary studies, whereas narrative reviews more often draw fragmented conclusions of the form “These types of studies find this. These other types of
studies find this.” If practices stopped there, then the apples and oranges
problem could more fairly be applied to meta-­analyses than other reviews.
However, meta-­analysts usually perform moderator analyses to compare
the diverse types of studies, and narrative reviews often try to draw synthesized conclusions about the diverse types of studies. Given that both types
of reviews typically attempt to draw conclusions at multiple levels (i.e., about
fruits in general and about apples and oranges in particular), the critique of
focusing on the “wrong” level of generalization—if there is such a thing, versus just focusing on a different level of generalization than another scholar
might choose—is equally applicable to both. However, both the process of
drawing generalizations across diverse studies and the process of comparing
diverse types of studies are more objective and lead to more accurate conclusions (Cooper & Rosenthal, 1980) when performed using meta-­analytic
versus narrative review techniques.
The “file drawer” problem—the threat of unpublished studies not being
included in a review, and the resultant available studies being a biased representation of the literature—is a threat to all attempts to draw conclusions
from this literature. In other words, if the available literature is biased, then
this bias affects any attempt to draw conclusions from the literature, narrative or meta-­analytic. However, narrative reviews almost never consider this
threat, whereas meta-­analytic reviews routinely consider it and often take
steps to avoid it and/or evaluate it (indeed, there exists an entire book on


this topic; see Rothstein, Sutton, & Borenstein, 2005b). Meta-­analysts typically make greater efforts to systematically search for unpublished literature
(and studies published in more obscure sources) than do those preparing
narrative reviews (Chapter 3). Meta-­analysts also have the ability to detect
publication bias through comparison of published and available unpublished
studies, funnel plots, or regression analyses, as well as the means to evaluate
the plausibility of the file drawer threat through failsafe numbers (see Chapter 11). All of these capabilities are absent in the narrative review.
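One of these techniques, Rosenthal’s fail-safe N, can be sketched in a few lines: it estimates how many unpublished null-result studies sitting in file drawers would be needed to drag a combined result down to non-significance. The z values below are invented for illustration; Chapter 11 treats this and related methods in detail.

```python
# A minimal sketch of Rosenthal's (1979) fail-safe N.
# The per-study z values below are invented for illustration.
import math

z_values = [2.1, 1.8, 2.5, 1.6, 2.3, 1.9]  # one z statistic per included study
k = len(z_values)
z_crit = 1.645                              # one-tailed critical z at alpha = .05

# N_fs = (sum of z)^2 / z_crit^2 - k: the number of averaged-null
# studies needed to reduce the combined z below significance.
n_failsafe = (sum(z_values) ** 2) / z_crit ** 2 - k
print(f"fail-safe N = {n_failsafe:.0f}")
```

Here roughly 49 hidden null studies would be required to overturn the combined result of only 6 included studies, which is the kind of plausibility judgment the fail-safe number supports.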
Finally, the problem of “garbage in, garbage out”—that the inclusion
of poor quality studies in a review leads to poor quality results from the
review—is a threat to both narrative and meta-­analytic reviews. However, I
have described ways that you can overcome some problems of the primary
studies in meta-­analysis (low power, presence of methodological artifacts),
as well as systematically evaluate the presumed impact of study quality on
results, that are not options in a narrative review.
In sum, the problems that might threaten the results of a meta-­analytic
review are also threats to other types (e.g., narrative) of reviews, even though
they are less commonly considered in other contexts. Moreover, meta-analytic techniques have been developed that partially or fully address these
problems; parallel techniques for narrative reviews either do not exist or
are rarely considered. For these reasons, although you should be mindful of
these potential threats when performing a meta-analytic review, the threats
are not unique to meta-analytic reviews, and they are often smaller there than
in other types of research reviews.

2.4 Practical Matters: The Reciprocal Relation
between Planning and Conducting
a Meta‑Analysis
My placement of this chapter on identifying research questions for meta-analysis before chapters on actually performing a meta-analysis is meant to
correspond to the order you would follow in approaching this endeavor. As
with primary research, you want to know your goals and research questions,
as well as potential limitations and critiques, of your meta-­analysis before
you begin.
However, such an ordering is somewhat artificial in that it misses the
often reciprocal relation between planning and conducting a meta-­analytic
review. At a minimum, someone planning a meta-­analysis almost certainly



has read empirical studies in the area that would likely be included in the
review, and conclusions that the reader takes from these studies will undoubtedly influence the type of questions asked when planning the meta-­analysis.
Beyond this obvious example, I think that much of the process of conducting a meta-­analysis is less linear than is typically presented, but more
of an iterative, back-and-forth process among the various steps of planning,
searching the literature, coding studies, analyzing the data, and writing the
results. I do not view this reality as problematic; although we should avoid
the practice of “HARKing” (hypothesizing after the results are known; Kerr,
1998), we do learn a lot during the process of conducting the meta-­analysis
that can refine our initial questions. Next, I briefly describe how each of the
major steps of searching the literature, coding studies, analyzing the data,
and writing the results can provide reasons to revise our initial plans of the
As I discuss in detail in Chapter 3, an important step in meta-­analysis
is specifying inclusion/exclusion criteria (i.e., what type of studies will be
included in the literature) and searching for relevant literature. This process should be guided by the research questions you wish to answer, but
the process might also change your research questions. For example, finding
that there is little relevant literature to inform your meta-­analysis research
questions—either too few studies to obtain a good estimate of the overall
effect size or too little variation over levels of moderators of interest—might
force you to broaden your questions to include more studies. Conversely,
finding that so many studies are relevant to your research question that it is
not practical to include all of them might cause you to narrow your research
question (e.g., to a more limited sample, type of measure, and/or type of
design).

Research questions can also be modified after you begin coding studies
(see Chapters 4–7). Not only might your careful reading of the studies lead
you to new or modified research questions, but also the more formal process
of coding might necessitate changes in your research questions. If studies
do not provide sufficient information to compute effect sizes consistently,
and it is not possible to obtain this information from study authors, then it
may be necessary to abandon or modify your original research questions. If
your research questions involve comparing studies (i.e., moderator analyses),
you may have to alter this research question if the studies do not provide
adequate variability or coverage of certain characteristics. For example, if
you were interested in evaluating whether an effect size differs across ethnic
groups, but during the coding of studies found that most studies sampled
only a particular ethnic group, then you would not have adequate variability