INSTRUMENT 3.A: DATA EXTRACTION FORM.

Adolescents with Co-Occurring Disorders
Data Extraction Form
Facility: ○ A ○ B

Study ID No: (001-999):_____

Chart Reviewer I.D. No:____

DEMOGRAPHICS
1. Age at Admission (in years): _____   2. First Language: ○ English ○ Other _____
3. Sex: ○ Male ○ Female
4. Primary Referral Source:
○ JDR/Circuit Court
○ Private Physician
○ CSB
○ School
○ Social Services
○ Hospital
5. Please circle number of previous acute-care psychiatric admissions:
0   1   2   3   4   5   6   7   8   9 or more

FAMILY STRUCTURE
6. Biological Parents Marital Status:
○ Married ○ Divorced ○ Separated ○ Widowed
○ Never Married/not living together
○ Never Married/living together
○ Deceased
○ Not Documented
SUBSTANCE USE
7. Tobacco: ○ Yes ○ No ○ Not Documented (if yes, complete table below)

Level of Reported Use:
○ Never ○ Once or twice ○ Less than once a month ○ At least once a month
○ Once a week ○ Daily

Question                                                  Response   Not Documented
Youngest age client reportedly used tobacco               ________        ○
Highest number of days used in any 30-day period          ________        ○
Highest quantity used during any single episode of use    ________        ○


8. Has client received substance abuse treatment at any time prior to this
admission?
○ Yes ○ No ○ Not Documented (If yes, check all that apply in question 8A;
if no, skip 8A)
8A. ○ Outpatient
○ Residential <3 months
○ Residential >3 months
○ Juvenile drug court
○ Other (specify)
○ Not documented

DSM-IV-TR DIAGNOSIS
9. Axis I substance-related disorders identified at admission (include rule-out,
provisional, and by-history diagnoses):

Substance Use Related Disorder                 DSM-IV-TR Code
______________________________                 ______________
______________________________                 ______________

10. Axis IV: Psychosocial and Environmental Stressors

Stressor (Problem With):             Mild   Moderate   Severe   Severity Not Specified
Not specified, only severity noted    ○        ○         ○              ○
Primary support group                 ○        ○         ○              ○
Social environment                    ○        ○         ○              ○
Educational                           ○        ○         ○              ○
Housing                               ○        ○         ○              ○
Access to health care                 ○        ○         ○              ○
Legal system/crime                    ○        ○         ○              ○
Other                                 ○        ○         ○              ○


Endnotes
1. Another aspect of measurement is that indicators may be culturally bound. For example,
we use inches and feet in the United States whereas much of the rest of the world uses
centimeters and meters.
2. Sarle (1995), however, observes that binary variables “are at least at the interval level. If
the variable connotes presence or absence or if there is some distinguishing feature of one
category, a binary variable may be at the ratio or absolute level” (p. 5). This is a reminder
that social scientists do not always agree on even the most fundamental concepts.
3. Values, or variables, may also be classified as discrete or continuous. Discrete variables can be
placed only into categories (for example, eye color, race, or political affiliation). Continuous
variables can assume any value along a continuum (for example, grade in school, height,
or income level).

Key Concepts and Terms

interval level
level of measurement
measurement
nominal level
ordinal level
ratio level

CHAPTER FOUR

INSTRUMENT CONSTRUCTION, VALIDITY,
AND RELIABILITY

In this chapter we will
• Describe the concepts of validity and reliability.
• Identify a number of ways to demonstrate that an instrument is producing
valid and reliable information.
• Explore the implications of misusing or misrepresenting data derived from
an instrument.
Artist David Hockney has an interesting theory. He believes that Renaissance painters were able to create photo-realistic paintings by using mirrors and
optical devices to project an image onto a canvas. He suggests that the strong
lighting and contrast in some Renaissance paintings reflects the use of devices
that required a lot of illumination. This is a controversial idea because scholars
are not sure the appropriate technology existed at that time. However, if correct,
it could explain why, as early as the fifteenth century, artists were able to create
such accurate and vivid portraits (Hockney, 2001).
Realistic painting attempts to capture an image as accurately and faithfully
as possible. As an instrument designer you face a similar challenge—because
instruments are used to measure both objective and subjective phenomena, it
is important that they provide information that is trustworthy and credible. If
an instrument fails to do this, then everything that comes afterward, including

the data analysis and the presentation of findings, will be suspect. As the computer
industry adage says, “Garbage in, garbage out.”
In this chapter we will examine two concepts applied to ensure an instrument provides credible and accurate information. Validity refers to the ability of
an instrument to measure what you intend it to measure, and reliability speaks to
the consistency of your measurement. These concepts are closely related. Suppose that you use this item stem on a job satisfaction survey, “I get a feeling of
personal satisfaction from my work,” and the respondent is expected to rate the
item on a scale of strongly disagree to strongly agree. If the item is valid, it will be a
good, or strong, measure of job satisfaction. If the item is reliable, a respondent
will provide a consistent response across time and settings—for example, rating
the item the same way on two different occasions.
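The distinction can be illustrated numerically. Below is a minimal sketch in Python, using invented ratings, of how test-retest consistency for that single item might be summarized with a correlation coefficient:

```python
# Minimal sketch: summarizing test-retest consistency of one survey item.
# All ratings are hypothetical; 1 = strongly disagree, 5 = strongly agree.
from statistics import correlation  # requires Python 3.10+

# The same five respondents rate "I get a feeling of personal
# satisfaction from my work" on two occasions.
time_1 = [4, 5, 2, 3, 4]
time_2 = [4, 5, 3, 3, 4]

# A coefficient near 1.0 suggests respondents answer the item
# consistently across time.
r = correlation(time_1, time_2)
print(f"test-retest correlation: {r:.2f}")  # 0.94 for these data
```

A high coefficient supports reliability; it says nothing, however, about whether the item is a valid measure of job satisfaction.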

Validity
Suppose you have been asked to complete a questionnaire that attempts to measure a personal attribute, such as your leadership style. Now imagine an item
that asks about your personal health. You would probably scratch your head and
wonder what that has to do with leadership style. You might be offended or wonder about the instrument design. Similar though perhaps less obvious problems
arise anytime we craft items intended to obtain information about one thing and
instead measure something else. For example, a challenge for the developers of
psychometric instruments is differentiating and measuring affective states such as
depression and anxiety, which may manifest with similar behaviors.
Validity1 describes the extent to which we measure what we purport to measure. An
instrument is not intrinsically valid or invalid; validity is a characteristic of
the responses it produces. Consequently, it is important to pretest the instrument to obtain
preliminary data that can be used to assess validity.
Validity exists along a continuum. The greater the evidence that an instrument is producing valid results, the greater the likelihood that we will obtain the
information we need: “Hence, validity is a matter of degree. It is not a simple
either-or, all-or-none question of valid or invalid. It is an attribute that exists along
a continuum, from high to low, in varying degrees. It is inferred, or judged from
existing evidence, not measured or calculated directly” (Worthen, Borg, & White,
1993, p. 180).
There are several ways to conceptualize and categorize validity. As we will
see, assessing validity is often a matter of judgment. It is important to keep in
mind that although we can distinguish types of validity, they are all means of
answering the same question: Are we measuring what we purport to measure?


Face Validity
Face validity is the degree to which an instrument appears to be an appropriate
measure for obtaining the desired information, particularly from the perspective of
a potential respondent. Suppose you have designed an instrument to measure health
behavior, and one of the items asks respondents to indicate if they are smokers
and, if so, about how many cigarettes a day they smoke. On its “face,” this item
appears to relate directly to health behavior and therefore would produce a valid
response. Although face validity is often criticized as less rigorous than other
approaches to assessing validity, it can provide useful information about the entire
instrument and the degree to which it is meeting its intended purpose.

Construct Validity
Many concepts we are interested in, such as safety, intelligence, creativity, or patriotism, are not directly observable or measurable. Social scientists refer to these abstractions as constructs, and a major concern is the degree to which an instrument actually
measures them. One of the challenges associated with construct validity is ensuring
that instrument designers and respondents have a shared definition of the construct. Leadership, for example, may have more than one definition, such as an
ability to motivate people, to direct people, or to facilitate change in people. If you
want to develop an instrument to measure leadership, you will have to be specific
in defining the attributes of leadership you are examining. At times our entire
understanding of a construct can change. In the 1960s and 1970s, psychiatrist
Thomas Szasz argued that mental illness was not so much a physical disease as a
sociological process (see, for example, Szasz, 1973). For those agreeing with Szasz,
it meant an entirely different way was needed to measure and understand mental
illness. (Contemporary models suggest that mental illness is influenced by both
physical and environmental processes.) When the meaning and our understanding of a construct can change over time, an instrument designed to measure the
construct at one point in time may not provide valid measures at another time.
Because we cannot observe a construct directly, we have to find tangible ways to
measure it, a process referred to as operationalization. We might operationalize appendicitis, for example, through observed or reported symptoms, such as dull pain in
the lower-right abdomen, loss of appetite, nausea, and fever. Because other disorders, such as irritable bowel syndrome, share some of these symptoms, the more factors, or variables, we can associate with the concept of appendicitis, the more we will
differentiate it and the more valid our measurement (and diagnosis) will be.
Of course appendicitis is not only a concept but also an actual disorder, and
once the appendix has been removed, there is physical evidence of the problem.


Operationalizing constructs in the social sciences may not be as straightforward. For
example, some of the characteristics associated with depression are loss of energy,
feelings of worthlessness, difficulty concentrating, and weight loss. Whereas
changes in weight can be directly measured, worthlessness is itself a construct,
and so we are faced with the challenge of demonstrating evidence for the validity
of one construct by using another construct.2

The “T” Test
Ask a group of participants to write the letter “T” on a sheet of paper as many
times as they can in one minute. Next, ask them to count the number of T’s and
plot the distribution on a chart. Then ask them what the T-Test measures.
Most groups can generate at least ten separate concepts that this activity may
be attempting to measure, such as eye-hand coordination, dexterity, creativity,
competitiveness, anxiety, ability to follow directions, quickness, compulsiveness,
achievement need, and T-making behavior.
This activity demonstrates the difficulty of associating an overt behavior with
an underlying construct. As an instrument designer, how would you demonstrate
the construct validity of this test?
Source: Adapted from Reilly, 1973.

A threat to validity and hence to the data we obtain from an instrument occurs
when respondents or raters interpret an item in their own way and respond to
that interpretation. To study the impact of misunderstanding or misinterpreting
items, Philip Gendall (1994) used an existing questionnaire but added questions
about the respondent’s understanding of certain items. Respondents were asked
if they agreed or disagreed with a statement about people with AIDS and a statement about compulsory military training for young, unemployed people. After
each one they were asked, “In your own words, exactly what did you think that
statement meant and how did you arrive at your answer?” (p. 2). In regard to the
AIDS question, Gendall found that respondents not only had different interpretations of the meaning of the statement but had also created personal definitions of
who was meant by people who have AIDS. In other words, both the respondent’s
interpretation of the question as a whole and his or her understanding of the
constituent parts influenced that person’s ability to provide a valid response:
For the majority of the 160 respondents who did not (or could not) explain what
the AIDS statement meant, the reason was that they were more concerned
with justifying their answer than explaining it. Many of these respondents had
disagreed with the statement earlier in the interview and in doing so had taken


the opportunity to express their strong disapproval of homosexuals, drug users
and sexually promiscuous people. In other words, these respondents had deliberately reinterpreted the question to allow them to express their own opinion,
and for them the “correct” meaning of the question was irrelevant. This conclusion was confirmed when these respondents were asked who they thought
was meant by people who had AIDS. Many of them admitted that they had
confined their judgment to those they disapproved of [Gendall, 1994, p. 5].

Construct validity also involves convergent validity and discriminant validity.
Convergent validity refers to the relationship between measures of constructs
that should be strongly related, such as depression and feelings of worthlessness.
Discriminant validity refers to the relationship between the measures of constructs
that should not be strongly related to each other, such as depression and feelings
of happiness. To demonstrate the construct validity of your instrument you will
want to demonstrate both convergent and discriminant validity. Demonstrating
only one or the other tells just half the story.
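The following sketch illustrates the pattern of evidence being described, using invented scale scores for ten respondents: a strong correlation with a related construct and a weak correlation with a distinct one.

```python
# Hypothetical scale scores for ten respondents (invented for illustration).
from statistics import correlation  # requires Python 3.10+

depression    = [12, 18, 7, 22, 15, 9, 20, 5, 14, 17]
worthlessness = [10, 16, 6, 20, 13, 8, 19, 4, 12, 15]    # related construct
happiness     = [17, 16, 14, 14, 12, 17, 16, 13, 18, 13]  # distinct construct

# Convergent evidence: measures of related constructs should correlate strongly.
print(f"depression vs. worthlessness: {correlation(depression, worthlessness):+.2f}")  # about +1.00

# Discriminant evidence: measures of distinct constructs should not.
print(f"depression vs. happiness:     {correlation(depression, happiness):+.2f}")  # about +0.04
```
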
Content Validity
Content validity is the degree to which an instrument is representative of the topic and
process being investigated. Suppose you are investigating at-risk behaviors among
teenagers and plan to administer a survey questionnaire. To demonstrate content
validity the instrument should address the full range of at-risk behaviors, those
typically identified by experts and discussed in research literature. If it asks about
alcohol and drug use and sexual behavior but not about illegal behavior such as
shoplifting, then it may not be addressing the full domain of at-risk behaviors.
Note that in assessing content validity you are attempting to identify as many
factors as possible that operationalize the construct. In some cases this set of
factors is small enough to cover completely. In others it may be difficult to identify all the factors
related to the construct, or you may have so many factors that it is not possible to
include all of them in the instrument. In the latter case, you may want content
experts to rate the importance of these factors to help you determine which are
most relevant to the focus of your study.
Criterion Validity
Criterion validity involves making a comparison between a measure and an external
standard. Suppose you want to measure and predict how well someone recovering
from a stroke can function independently or the level of assistance required. The
first step would be to operationalize the concept of independent functioning by


identifying activities of daily living, such as tying one’s shoes, getting dressed,
brushing one’s teeth, and bed making, which would serve as the assessment criteria. Items would then be constructed to attempt to measure the individual’s
ability to meet these criteria. One way of assessing criterion validity would be to
compare individuals’ scores on the instrument to their actual performances. Criterion validity could also be determined by comparing the results obtained from
your instrument to results from another instrument that attempts to measure the
same construct using the same criteria. If there is a strong relationship, then you
can say that your instrument displays criterion validity.
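As a minimal sketch, the comparison might look like the following; the scores, the cutoff, and the observer ratings are all invented, and an actual study would use an established criterion measure and a much larger sample.

```python
# Hypothetical sketch: checking instrument classifications against an
# external criterion (observed performance on activities of daily living).

# Instrument total scores (higher = more independent) for eight clients.
scores = [34, 12, 28, 40, 9, 22, 31, 15]
CUTOFF = 25  # invented score above which the instrument predicts independence

predicted_independent = [s >= CUTOFF for s in scores]

# Criterion: an observer's rating of whether each client actually
# functioned independently.
observed_independent = [True, False, True, True, False, True, True, False]

# Agreement between the instrument and the external criterion.
matches = sum(p == o for p, o in zip(predicted_independent, observed_independent))
print(f"agreement with criterion: {matches}/{len(scores)} = {matches / len(scores):.0%}")
```
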
Evidence of criterion validity should be obtained for any instruments that
measure performance, such as behavior rating scales and psychometric instruments. This is particularly true if you want an instrument to predict future behavior, such as how disabled individuals will perform as a result of receiving physical
therapy.
Predictive Validity
Predictive validity exists when you can use an instrument or measure to predict the
results of one variable from another variable. The classic example is the correlation
between SAT scores and grade point average (GPA). Because there is a correlation, we can predict that students with a high GPA will also score highly on
the SAT. If there is a strong relationship or correlation between the scores on your
instrument and another instrument intended to measure the same criterion,
your instrument would be said to evidence concurrent validity.
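A brief sketch with invented GPA and SAT figures shows how a strong correlation supports predicting one variable from the other:

```python
# Hypothetical sketch of predictive validity: using the GPA-SAT
# correlation to predict one score from the other. All data are invented.
from statistics import correlation, linear_regression  # requires Python 3.10+

gpa = [2.4, 3.1, 3.8, 2.9, 3.5, 2.2, 3.9, 3.3]
sat = [980, 1150, 1360, 1080, 1250, 930, 1400, 1190]

print(f"correlation: {correlation(gpa, sat):.2f}")

# Fit a least-squares line and predict the SAT score of a
# hypothetical student with a 3.6 GPA.
slope, intercept = linear_regression(gpa, sat)
print(f"predicted SAT for GPA 3.6: {slope * 3.6 + intercept:.0f}")
```
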
Multicultural Validity
Some social scientists also contend that instrument developers should consider
multicultural validity (for example, Kirkhart, 1995), meaning that an instrument
measures what it purports to measure as understood by an audience of a particular
culture. For example, a multiculturally valid instrument will use language appropriate to its intended audience. This might require translation into a foreign language
or checking that phrases and connotations will not be misunderstood by respondents. Whether we refer to such considerations as multicultural validity or as being
sensitive to the needs of the instrument’s audience, they reflect good practice.
Demonstrating instrument validity is important to you as the developer
because this information can be used to refine and improve the instrument. If
feedback from pretesting suggests the instrument is not producing valid results,
then items should be reworded or deleted. It is also important to describe your
pretesting in supporting documentation, so that potential users can answer their
questions about the instrument’s validity. For example, Grisso and Underwood


(2004, p. 12) suggest that users consider the following criteria when selecting an
instrument:
• An instrument should not be selected if no research exists on the degree of its
reliability or validity when administered to its intended target audience.
• Instruments that provide evidence of reliability and validity with the intended
target audience are preferable to those that do not.
• The greater the consequences and importance of the decisions to be made, the
higher the standard that should be applied in judging whether an instrument
has an acceptable degree of reliability and validity.

Additional Help for Selecting an Instrument
For over half a century the Buros Institute of Mental Measurements has promoted
“meaningful and appropriate test [that is, instrument] selection, utilization, and
practice. The Buros Institute encourages improved test development and measurement research through thoughtful, critical analysis of measurement instruments
and the promotion of an open dialogue regarding contemporary measurement
issues” (Buros Institute of Mental Measurements, 2007). One of Buros’s main
functions is to provide potential users with information for deciding whether an
instrument is appropriate for their population and setting or can be used for comparison, as in assessing criterion validity.
The Buros Institute publishes the Mental Measurements Yearbook and Tests in
Print, which provide systematic instrument review and evaluation and are available
in print, through the Buros Test Reviews Online database, and from many college and
university libraries. Reviews include instrument descriptions (including purpose,
audience, and content details), publication information, validity and reliability evidence, and assessments of strengths and weaknesses. Buros reviews provide examples
of what might be considered when pretesting an instrument and a format for
presenting evidence of validity and reliability.
You may view a sample review, including a discussion of the approaches used
to demonstrate validity and reliability, at http://www.unl.edu/buros/bimm/html/reviewsample.html.

Demonstrating Evidence for Validity
Artists frequently sketch their subjects from many different angles prior to committing to a pose. Pretesting is a similar process that allows you to check the
validity, reliability, and utility of an instrument prior to administering it to your


target audience. Chapter Six examines the actual process of pretesting in detail;
this section focuses on ways that pretesting can provide information for revising
and improving your instrument and hence its validity.
There are essentially two types of approaches for assessing instrument validity:
qualitative and quantitative. Qualitative approaches are evaluative. One of the most
effective is to review research literature about the topic of interest. This process will
help you define the topic (its themes and content) and can provide evidence that
the instrument is measuring these constructs and not something else.
Another approach is to have topic experts review the instrument, using their
judgment to identify ways to define and operationalize the construct and indicating whether they believe each item appears to measure what it is intended to
measure. For example, researchers were interested in measuring attitudes among
psychiatric hospital staff who were subject to the aggressive behaviors of patients.
An early version of the instrument did not clearly define the term aggression, so
respondents were not sure how to interpret items: did aggression refer to verbal
threats, nonverbal gestures, a slight push or shove, or assaultive behaviors resulting in injury? In response to feedback from potential users and content experts,
designers added the study’s definition of aggression to the survey introduction.
During a review process, reviewers may also find poorly worded items likely to
compromise instrument reliability or prevent valid results.
A third qualitative approach is to develop a table of specifications, a means
to identify the topic variables, or factors. This can be accomplished deductively
or inductively. A deductive approach works from the general to the specific. You
begin by stating the construct to be examined and then identify the ways it can be
operationalized. This in turn suggests specific items. Recording this information in
a table, or matrix, gives you a graphic view of the links between topic and items.
For example, if your instrument is assessing depression, your table of specifications
would include behaviors associated with this affective state, such as withdrawal,
feelings of worthlessness, fatigue, and weight loss. This in turn would suggest such
specific items as, Have you experienced a change of weight in the past ninety
days? (For an example of a table of specifications see Table 5.3 in Chapter Five.)
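In code form, a table of specifications is essentially a nested mapping from construct to variables to draft items. Here is a minimal sketch; the construct, variables, and item wordings are illustrative only.

```python
# A table of specifications as a nested mapping:
# construct -> operationalized variable -> draft items.
table_of_specifications = {
    "depression": {
        "weight loss": [
            "Have you experienced a change of weight in the past ninety days?",
        ],
        "fatigue": [
            "How often do you feel too tired to complete daily tasks?",
        ],
        "withdrawal": [],  # no items yet: a visible gap in coverage
    },
}

# Walking the table makes gaps visible: a variable with no items
# indicates part of the construct that is not yet being measured.
for construct, variables in table_of_specifications.items():
    for variable, items in variables.items():
        print(f"{construct} / {variable}: {len(items)} item(s)")
```
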
An inductive approach works from the specific, such as finite items, to broader
generalizations, such as the underlying construct. You might create a list of the
variables associated with the construct and then ask a content expert to match
the items you have created to the appropriate variable. The stronger the match, the
more likely the item is to be a valid measure.
Quantitative approaches are typically based on measuring the strength of the
association between your instrument and another measure of the same construct.
Convergent and discriminant validity are ascertained by comparing the results
from your instrument to results from existing instruments. For example, when
