Instrument 6.A: Failure to Pretest

Designing and Constructing Instruments for Social Research and Evaluation

of a double-barreled question, as it really asks two questions and respondents may
be uncertain which one they should evaluate. Because instructions were not developed for the instrument and staff were not provided training prior to conducting
the chart audits, it was subsequently determined that individual employees interpreted the items differently, resulting in a high degree of variation in their ratings,
that is, a lack of interrater reliability.
A number of problems were also identified regarding the implementation process. First, hospital management was notified of problems with the auditing process,
including the difficulty in interpreting items and the lack of consistency between
raters. Despite this feedback, the instrument was not revised, and it continued to
be used for nearly two years. Second, hospital management used aggregate data
produced by these checklists to compare the performance of treatment units in
the use of these interventions. This tended to create tension between management and treatment staff, particularly as there were questions about the accuracy
of the data produced by the instrument.
A number of steps could have been taken early in the process to improve the
quality and utility of this instrument and the data collection process. In retrospect
it is unclear if the instrument really suited management’s intended purpose, which
was to reduce the use of restrictive interventions. Writing a statement of purpose
(Chapter Five) might have clarified for management the appropriate methods
to attain that objective. Second, the instrument was not pretested, resulting in
difficulty with administering it and with obtaining consistent results across raters.
As we have been emphasizing, an important step is to have the instrument
reviewed and to be receptive to feedback. However, even though provided with this
feedback, the instrument designer did not revise the checklist. Fortunately, the
object of measurement was medical records and not individuals, so that decisions
made based on information produced by this instrument did not have a negative
impact on clients. Ultimately, management recognized the instrument's shortcomings,
and alternative methods for addressing this topic were developed.

INSTRUMENT 6.A: CHECKLIST FOR A MEDICAL RECORD AUDIT.

Quality Assurance and Improvement Review Checklist

Unit _________________________ Date of Review _______ Clinical Record # ________
Reviewer ____________________________ Date of Event __________________

1. Are statements about the intervention clear and descriptive?  YES  NO
2. Is there an explanation of preventive alternatives attempted?  YES  NO
3. Is there documentation of pertinent antecedent events?  YES  NO
4. Does documentation support that the procedure was the least restrictive?  YES  NO
5. Does the documentation meet criteria in hospital policy #101 or #102?  YES  NO
6. Does documentation include descriptive statements regarding follow-up interventions used to instruct or counsel the patient?  YES  NO
7. Is there a treatment plan incorporating the use of this procedure?  YES  NO
8. Is the physician's order specific to the event and is it time related?  YES  NO
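The lack of interrater reliability in this case could have been detected and quantified during pretesting. As a hypothetical sketch (the function names and the sample data are our own, not from the case), the following computes percent agreement and Cohen's kappa for two raters' YES/NO judgments on the same set of charts:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of charts on which the two raters gave the same answer."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical audit results for eight charts (item 1 of the checklist).
rater_a = ["YES", "YES", "NO", "YES", "NO", "YES", "YES", "NO"]
rater_b = ["YES", "NO", "NO", "YES", "YES", "YES", "YES", "NO"]

print(percent_agreement(rater_a, rater_b))          # 0.75
print(round(cohens_kappa(rater_a, rater_b), 2))     # 0.47
```

A kappa well below the conventional 0.6–0.8 range would have flagged the interpretation problem before two years of data had accumulated.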

Key Concepts and Terms

construct validity
criterion validity
face validity
field testing
focus group
interview
literature review
pilot testing
pretesting
reliability
social desirability

CHAPTER SEVEN

THE STRUCTURE AND FORMAT OF SELECTION ITEMS

In this chapter we will
• Explain how a selection item works.
• Describe the properties of common formats for constructing selection items,
including numerical, graphic, and Likert response scales.
Artists often make use of the rule of thirds to compose a painting. The artist draws
two vertical lines, dividing the canvas into three equally sized columns. Then he
or she draws two horizontal lines to divide the canvas into three equally sized
rows. The canvas is now divided into nine equally sized rectangles or squares.
The simplest way of applying the rule of thirds is to position the subject where
two of the lines intersect or partway between one of the intersections and the
center. This produces a more pleasing composition than having the subject
exactly in the center.
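The rule of thirds can be stated computationally. As a quick illustration (the function name and canvas dimensions are ours, not the authors'), this computes the four intersection points for a given canvas:

```python
def rule_of_thirds_points(width, height):
    """Return the four intersections of the thirds lines on a canvas."""
    xs = (width / 3, 2 * width / 3)    # the two vertical lines
    ys = (height / 3, 2 * height / 3)  # the two horizontal lines
    return [(x, y) for x in xs for y in ys]

print(rule_of_thirds_points(300, 150))
# [(100.0, 50.0), (100.0, 100.0), (200.0, 50.0), (200.0, 100.0)]
```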
Much of what we do in instrument construction is based on composition:
how we compose individual items and how we arrange multiple items to create the instrument. The term format refers to how an item or part of an item is
organized and presented. Formatting may influence how a respondent or rater
completes an item. Formatting is also an important component in the construction of some multi-item scales where the order of the response set influences the
choices the respondent makes and how the item is scored.

When we create a selection item we determine in advance how the item will
be organized and what alternatives the respondent or rater will have to pick from.
For example, if you ask someone for directions there are a number of ways he or
she can structure the response. He or she may use landmarks (“Turn right after
the first stoplight and then left at the 7-Eleven”) or the names of roads or highways (“Take a left onto Elm and then two blocks to the I-95 turnoff ”). The same
holds true with rating items. Although you may have a basic question you want
to ask, the manner in which you format it can affect how it is completed. The
format you choose will be based on a number of factors, including prior experience
(yours and the respondent’s), how well the format fits your information needs, and
feedback from pretesting. You might try more than one format as you pretest the
instrument and find that respondents have more difficulty with one than another.
In this chapter we will describe a number of rating item formats. We will begin
with a discussion of response sets, or scales, and then move into the different ways
that you can organize and present items with scales.

Response Alternatives, or Scales
An instrument composed of items that ask users to rate their response is, in our
observation, one of the most common approaches to constructing a
questionnaire such as a marketing survey or political poll. Rating scales are also
the predominant format in use in behavior rating instruments, inventories, and
checklists. A scale is composed of a series of values placed along a continuum,
such as the tonal values that create a musical scale. For social science instruments
the values are typically response alternatives, and we use the term rating scale
when we ask users to select an alternative along the continuum, with an instruction such as, “Rate the following on a scale of 1 to 5.” A typical rating item is
made up of two parts, a stem and a response set, or scale (you may also use the
terms choices or alternatives for the response set). The stem serves as the stimulus
for, or elicits, the response and may be written as a word, phrase, sentence, or
paragraph. The response set is a series of categories (which may or may not
be numerical), from which the respondent selects one. As we noted in Chapter
Three, the response scales for rating items that measure attitudes, opinions, and
beliefs provide ordinal level data.
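The two-part anatomy of a rating item, a stem plus an ordered response set yielding ordinal codes, can be modeled directly. A minimal sketch, with class and method names of our own invention:

```python
from dataclasses import dataclass

@dataclass
class RatingItem:
    """A selection item: a stem plus an ordered response set (ordinal data)."""
    stem: str
    response_set: tuple  # alternatives ordered from low to high

    def code(self, choice):
        """Return the 1-based ordinal code for a selected alternative."""
        return self.response_set.index(choice) + 1

item = RatingItem(
    stem="The local school system is doing a good job of educating children.",
    response_set=("Strongly disagree", "Disagree", "Not sure",
                  "Agree", "Strongly agree"),
)
print(item.code("Agree"))  # 4
```

Because the codes are ordinal, the numbers record order only; the distance between a 2 and a 3 is not assumed to equal the distance between a 3 and a 4.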
When you use a rating scale, you are making these assumptions:
• The attribute can be scaled along some continuum. For example, an item asking respondents to rate the effectiveness of a training workshop assumes that
effectiveness can be scaled (for example, from ineffective to highly effective).


• The continuum is isomorphic to an internal continuum of the respondent’s. In
other words, the continuum is intrinsic to that individual. If the trait to be rated is
greenness and if the rating is to be made on a scale between blue and yellow, then
a person who is color-blind will lack a continuum on which to rate the trait.
• The respondent can consistently discriminate between continuum levels. You
need sufficient gradation to capture differences along the scale’s continuum. At
the same time the response set values should be sufficiently exclusive that the
user can easily distinguish between them. For example, even though the intervals may not be mathematically proportional, the respondent can differentiate
between a 2 and a 3 on a scale of 1 to 5.
Rating scales are formatted in a number of ways. For example, the response
set may be composed of word indicators (such as never, seldom, occasionally, and
always), it may be represented solely by numbers or by graphics (pictures, symbols, or a line that indicates change in intensity), or it may be presented along
a continuum that is anchored by a word or phrase at each end; we will present
examples of each of these approaches in this chapter.
In some cases it may be necessary to include a definition or descriptive statement
to clarify the meaning of a response alternative. Consider an item written to measure a behavior, such as a child's temper tantrums, where the response scale is 0 = not
present, 1 = mild, 2 = moderate, 3 = severe. The severity of the temper tantrum could be a
measure of intensity (loud, out of control, and not responding to verbal interventions), frequency
(once a day, twice a day, three times a day, or four or more times a day), or duration (less than
an hour, an hour, more than an hour). Consequently, a descriptive statement may need to
accompany each response alternative to instruct the rater. In the following example
the descriptors indicate that severity is being measured by behavior intensity.
EXAMPLE

Frustration

0 = Not present. Behavior was not observed.

1 = Mild. Gives up easily on tasks when the task appears to be demanding. May express frustration by cursing, stomping feet, or leaving activity.

2 = Moderate. May initially refuse and therefore require encouragement to engage in a demanding activity. May externalize frustration through attempts to escape the situation or through acting out or aggression toward objects.

3 = Severe. Is very avoidant of demanding activities. May externalize frustration through attempts to escape the situation or through acting out or aggression toward objects or others.
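A behaviorally anchored scale like this amounts to a mapping from each numeric code to its descriptor, so that every rater (or scoring program) applies the same definitions. A sketch using abbreviated versions of the anchors above (the dictionary name and shortened wording are ours):

```python
# Abbreviated anchors for the frustration item; in practice the full
# descriptive statements from the instrument would be used.
FRUSTRATION_ANCHORS = {
    0: "Not present. Behavior was not observed.",
    1: "Mild. Gives up easily on demanding tasks.",
    2: "Moderate. Needs encouragement; may act out toward objects.",
    3: "Severe. Avoids demanding activities; may act out toward objects or others.",
}

def describe(rating):
    """Look up the behavioral anchor for a numeric severity rating."""
    if rating not in FRUSTRATION_ANCHORS:
        raise ValueError(f"Rating must be one of {sorted(FRUSTRATION_ANCHORS)}")
    return FRUSTRATION_ANCHORS[rating]

print(describe(2))  # Moderate. Needs encouragement; may act out toward objects.
```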


A scale must be appropriate to the item’s purpose. For example, using a
scale running from strongly disagree to strongly agree would obviously be
inappropriate for the item in the previous example. When the scale associated
with an item is not appropriate, then it will likely produce results that are not
valid.
You can present the response scale to the user of your instrument in a
number of ways. You can orient it horizontally or vertically (as a list). You
can associate numbers with the alternatives and have the respondent circle
the chosen number. You can use a box ❑ and have the respondent place a
check in the box. Instruments mounted on Web sites also make use of radio
buttons; when a choice is made, a dot appears in the center of the selected
button.

Factors that influence presentation include the amount of space available to
work in, such as the size of the paper or the amount of computer screen that is
visible, the size and style of the fonts and symbols you can use, and feedback during pretesting. For example, respondents might say that an item is easier to follow
and mark when the response alternatives are listed across the page rather than
down the page.
Fink (1995) identifies five response set categories: endorsement (also called agreement), frequency, intensity, influence, and comparison. A response scale must be associated
with its item so that the alternatives are appropriate for that item. If the stem asks
the respondent to form an opinion—if it says, for example, “The local school
system is doing a good job of educating children”—an appropriate and associated
response set might be an endorsement scale of strongly agree to strongly disagree.
A response set based on intensity, such as mild to severe, would not be appropriate to
that item. The correct match between stem and response scale is typically evident;
however, you may not be able to determine the best fit until you have pretested
the item. For example, if you were to use a frequency scale (such as never, sometimes, frequently, or always) with that school system opinion item, it might appear to
match initially, but individuals pretesting the instrument might tell you it is awkward to use and difficult to apply. So that you do not have to reinvent the wheel,
Exhibit 7.1 contains lists of commonly used response scales culled from a variety of
instruments.


EXHIBIT 7.1: RESPONSE SET ALTERNATIVES
FOR RATING SCALES.
Fink (1995) suggests that there are five types of response sets: endorsement (also
called agreement), frequency, intensity, influence, and comparison. The following lists
contain frequently used response sets, organized by these categories.

ENDORSEMENT

Definitely true
True
Unsure
False
Definitely false

Strongly disagree
Disagree
Not sure
Mostly agree
Strongly agree

On a scale of 1 to 9 where:
1 = Don't agree [to]
9 = Totally agree

Always no
Mostly no
Mostly yes
Always yes

Very dissatisfied
Somewhat dissatisfied
Both satisfied & dissatisfied
Somewhat satisfied
Very satisfied

Harmful
No help
Moderate help
Very helpful
Uncertain


Very unimportant
Somewhat unimportant
Somewhat important
Very important

More than adequate
Generally adequate
Of marginal adequacy
Inadequate
Very inadequate

Extremely unwelcome
Somewhat unwelcome
Somewhat welcome
Extremely welcome

Minimal commitment
Modest commitment
Significant commitment
Heavy commitment

Definitely yes
Probably yes
Uncertain
Probably no
Definitely no

Trust them a lot
Trust them some
Trust them only a little
Trust them not at all


Very difficult
Difficult
Unsure
Easy
Very easy

One of the worst
Less than average
Average
More than average
One of the best

To little or no extent
To some extent
To a moderate extent
To a great extent
To a very great extent

Agree
Disagree

Yes
No

True
False

Good
Not good

FREQUENCY

Never
Rarely
Sometimes
Frequently
Always

Very little
Some
Quite a bit
Very much

Most of the time
Some of the time
Hardly ever
Never

Typical
Rare
Absent

Every year
Every few years
Almost never
Never

Never
Once a year
Every few months
Every few weeks
Once a week
Several times a week
Daily

On a scale of 1 to 9 where:
1 = Almost never [to]
9 = Almost always

On a scale of 1 to 5 where:
1 = Never happens [to]
5 = Happens often (or happens a great deal)

On a scale of 1 to 9 where:
1 = Not at all [to]
9 = A great deal

On a scale of 1 to 5 where:
1 = Never utilized [to]
5 = Utilized very often


Not present
Slight or occasional
Marked or repeated
Uncertain

A great deal
Somewhat
Little
Not at all

Far too much
Too much
About right
Too little
Far too little

Always
Never

Highest possible
Lowest possible

INTENSITY (SEVERITY)

On a scale of 1 to 4 where:
1 = Functioning well
2 = Mild impairment
3 = Moderate impairment
4 = Severe impairment

None
Very mild
Mild
Moderate
Severe

Very poor
Poor
Adequate
Good
Optimal

Very relaxed
Relaxed
Neither relaxed nor tense
Anxious
Very anxious

Maximum risk
Moderate risk
Minimum risk
No risk

None
Very low
Low
Moderate
High
Very high

On a scale of 1 to 5 where:
1 = No difficulty [to]
5 = Extreme difficulty

Very active
Active
Somewhat active
Not active

Poor
Fair
Good
Excellent

High
Low

Painful
Painless


INFLUENCE
Not a problem
Small problem
Moderate problem
Big/large problem

Fair
Unfair

COMPARISON
Much more than others
Somewhat more than others
About the same as others
Somewhat less than others
Much less than others

Numerical Scales
A numerical scale presents the respondent with a stem, and he or she responds
by selecting an answer from alternatives ordered along a continuum. These
response alternatives may be written descriptions or may be indicated by a
letter or number. They are referred to as numerical scales because the respondent
chooses from a “number of categories” (Judd, Smith, & Kidder, 1991). Numerals
may or may not be associated with the response alternatives, and when they are
used, they are placeholders; the response scale produces data of either a nominal
or an ordinal nature. The choice of one numerical format over another depends
largely on the preferences of the instrument designer, who has knowledge about
the potential respondents and about the total organization of the instrument.
Following are examples of an item stem formatted with a variety of numerical
scales:
EXAMPLES
Assume that you have been instructed to develop an instrument to determine teachers’ perceptions of in-service training. One of the variables of interest is the teachers’
opinion of text readability.
A. Rate the readability of the text compared with other textbooks you have read.
(Check one)

❑ Very difficult to read  ❑ Difficult to read  ❑ About average to read  ❑ Easy to read  ❑ Very easy to read

B. Rate the readability of the text compared with other textbooks you have read.
(Check one)

❑ 1  ❑ 2  ❑ 3  ❑ 4  ❑ 5

C. Rate the readability of the text compared with other textbooks you have read.
(Check one)

❑ Very difficult to read  ❑  ❑ About average to read  ❑  ❑ Very easy to read

D. Rate the readability of the text compared with other textbooks you have read.
(Check one)

——— a. Very difficult to read
——— b. Difficult to read
——— c. About average to read
——— d. Easy to read
——— e. Very easy to read

E. Rate the readability of the text compared with other textbooks you have read.
(Check one)

[ ] Very difficult to read  [ ]  [ ]  [ ]  [ ] Very easy to read

F. Rate the readability of the text compared with other textbooks you have read.
(Check one)

——— Very difficult to read  ———  ———  ———  ——— Very easy to read

Note that each of these response sets is essentially the same: a continuum
with five reference points. The points at the ends of the continuum are referred
to as anchors. These response sets read from left to right, with very difficult to read at
the beginning and very easy to read at the end. In example A a descriptor is provided
for each response, whereas in example B only numbers are used. By convention,
response scales are ordered from low to high. However, in example B it is possible that a respondent could associate the larger number with harder and the
lower number with easier. In this case descriptors or anchors should be used
to ensure that the respondent understands in which direction the items should
be rated.
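When an instrument ends up with items whose scales run in opposite directions, or when pretesting shows respondents read a numeric scale backward, analysts commonly reverse-code the affected responses before aggregating. A brief sketch of the standard transformation (the function name and sample data are our own illustration):

```python
def reverse_code(value, low=1, high=5):
    """Flip a rating so that, e.g., 1<->5 and 2<->4 on a 1-to-5 scale."""
    return low + high - value

# A respondent who read example B as "5 = very difficult" can be re-aligned
# so that higher numbers consistently mean "easier to read."
responses = [5, 4, 2]
print([reverse_code(v) for v in responses])  # [1, 2, 4]
```

Note that the midpoint is unchanged (reverse_code(3) is still 3), so reverse coding preserves the ordinal structure of the data.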
