6…Questionnaire Pre-testing, Revision and Final Draft
these basic factors. The next step is to determine the questionnaire content, so that
it deals with identifying the need for data, the question’s ability to yield data, the
participant’s ability to answer without generalizations and estimates and willingness to answer sensitive questions.
Knowing how each question should be phrased requires familiarity with the
different types of questions. This leads to the next step of the questionnaire
designing, that is questionnaire response format. This deals with issues of using
open-ended or close-ended questions. Open-ended questions require the respondent to do most of the talking while close-ended questions restrict the respondent’s
responses to the available options. Each has its own advantages and disadvantages
and is suited to different interviewing techniques.
Experience from previous research has helped establish general guidelines
regarding questionnaire wording and sequence. It should be ensured that questions
resort to shared vocabulary and adequate alternatives for better understanding and
response rates. The questions should be free of implicit assumptions and of biased and
loaded words. The questionnaire should also be free of questions that are double-barrelled or that
would provoke the respondent to provide generalizations and estimated answers.
Questionnaire sequencing is very important to elicit required information from the
participant. The opening questions should arouse the respondent’s interest in the
survey. The specific and general questions should be followed in order. This means
that the questions are sequenced in the following manner: lead-in, qualifying,
warm-up, specific and demographic.
Last, the questionnaire should be pre-tested before administration for detecting
flaws and revised with necessary corrections and deletions. This would lead to the
final draft to be used in the actual survey.
Labaw, Philip Gendall (1980) A framework for questionnaire design: Labaw Revisited, Mark
Bull 1998, 9, 29–39
Data Preparation and Preliminary
After data collection is over and all completed questionnaires are in hand, a
researcher has to analyse the data collected through the research. Data analysis
plays an important role in transforming a lot of data into verifiable sets of conclusions and reports. Proper analysis helps the researcher to gain insights from the
data and to arrive at informed judgments and conclusions. However, if the purpose
of research is not defined properly or if research questions are irrelevant, even the
best analytical techniques cannot produce good results. Conversely, data analysis may give
faulty results even when the research is done properly, if inappropriate methods are applied to analyse the data.
In this chapter, we will start with preliminary data preparation techniques like
validation, editing and coding, followed by data entry and data cleaning. The
penultimate section discusses various types of survey tabulations. This is followed
by the final section of the chapter, which provides insight into data mining and its
applications. The various steps in data preparation and preliminary data analysis
are shown in Fig. 6.1.
6.1 Validating and Editing
Validation is the preliminary step in data preparation. It refers to the process of
ascertaining whether the interviews conducted complied with specified norms. The
essence of this process lies in detecting any fraud or failure by the interviewer to
follow specified instructions (Beegle 1981). In many questionnaires, we find there
is a separate place to record the respondent’s name, address and telephone number
and other demographic details. Though no apparent analysis can be done on such
data, it is the basis for what is called ‘validation’. Validation helps to confirm if the
interview was really conducted.
Editing is the process of checking for mistakes by the interviewer or respondent
in filling the questionnaire. Editing is usually done twice before the data are
submitted for data entry.
S. Sreejesh et al., Business Research Methods,
© Springer International Publishing Switzerland 2014
Fig. 6.1 Stages of data
6 Data Preparation and Preliminary Analysis
The first editing is done by the service firm, which conducted the interviews.
The second editing is done by the market research firm that outsourced the
interviews. Editing is a manual process that checks for problems cited below:
Finding out whether the interviewer followed the ‘skip pattern’. The questionnaire is designed so that, depending on the respondent’s response, the interviewer skips to the next relevant question. This is called the ‘skip pattern’.
Sometimes interviewers skip questions when they should not
and vice versa. In the sample questionnaire shown in Fig. 6.2, the interviewer
should skip to question 7 if the response to the first question is either (A) or (E).
Responses to open-ended questions are vital for business researchers and their
clients. Eliciting the correct responses to open-ended questions shows the interviewer’s competence. Hence, interviewers are instructed to probe initial responses
and are asked not to distort the actual wordings or interpret the response of an
For instance, different possible responses for the second question are shown below.
Question: Why do you eat chocolates?
Respondent 1: Because I like chocolates
Respondent 2: I like chocolates. I like their taste and softness. (Indicates interviewer probed further into the response.)
Respondent 3: I like chocolates because they give me energy
In the first response, the interviewer failed to extract the correct response. The
objective of probing is to extract the reason behind eating chocolates. In the second
Date: -------
Respondent’s telephone number -----------
Respondent’s address --------------
Respondent’s age:
1. How many chocolates do you eat in a typical week?
A. Less than 5
B. Between 5 and 10
C. Between 11 and 20
D. More than 20
E. Don’t know
(INTERVIEWER- IF RESPONSE IS “A”, “E” OR “F”, GO TO
2. Why do you eat chocolates?
Respondent’s answer ------------------------------------------------------
3. Which brand of chocolates do you prefer most?
e) Others (specify) -----------------------
4. When do you like to eat chocolates?
Response ------------------------------
5. Do you prefer chocolates to sweets? (Y/N)
6. Do you have any negative associations with chocolates?
-------------------------------------------
7. What is your age group?
a) Under 10
b) Between 10 and 20
c) Between 21 and 30
d) Above 30
e) Refused to answer, no answer or don’t know
May I know your name? My office calls about 10% of the people I
visit to verify if I have conducted the interviews.
Gave name ------------ Refused to give name: --------
Thank you for your time. Have a good day.
Fig. 6.2 Sample questionnaire consumer survey on chocolate consumption pattern
response, the interviewer might have asked further questions like, ‘Do you like
ABC brand of chocolates?’ On a positive reply, the interviewer might have probed
more by asking, ‘What do you like about it?’ This is the correct way to elicit
responses for open-ended questions. The interviewer can even go further and probe
how a specific product characteristic is attached to the individual’s sub-conscious.
Though editing is time-consuming, it has to be done with care and patience
because it is important to data processing.
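The skip-pattern check described above is manual in the text, but once responses are keyed in it could also be screened automatically. The following is a minimal sketch under stated assumptions: the rule is the one given in the text (skip to question 7 when the answer to question 1 is A or E), and the record layout, the key names `q1` to `q7` and the function name are hypothetical.

```python
# Hypothetical record layout: one dict per questionnaire, keys "q1".."q7",
# with None meaning the question was left blank.
SKIP_TRIGGERS = {"A", "E"}  # Q1 answers that send the interviewer to Q7 (per the text)
SKIPPED_QUESTIONS = ["q2", "q3", "q4", "q5", "q6"]


def skip_pattern_errors(record):
    """Return a list of skip-pattern violations found in one record."""
    errors = []
    should_skip = record.get("q1") in SKIP_TRIGGERS
    for q in SKIPPED_QUESTIONS:
        answered = record.get(q) is not None
        if should_skip and answered:
            errors.append(f"{q} answered but should have been skipped")
        elif not should_skip and not answered:
            errors.append(f"{q} skipped but should have been asked")
    return errors


# Illustrative records only; these are not survey data from the chapter.
records = [
    {"q1": "A", "q2": None, "q3": None, "q4": None, "q5": None, "q6": None, "q7": "c"},
    {"q1": "B", "q2": "I like the taste", "q3": None, "q4": "evening",
     "q5": "Y", "q6": "none", "q7": "b"},
    {"q1": "E", "q2": "energy", "q3": None, "q4": None, "q5": None, "q6": None, "q7": "d"},
]
report = {i: skip_pattern_errors(r) for i, r in enumerate(records)}
```

A flagged record is only a candidate for review: a blank answer may also be a refusal, so the editor still has to inspect the questionnaire before deciding how to treat it.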
6.1.1 Treatment of Unsatisfactory Responses
During editing, the researcher may find some illegible, incomplete, inconsistent or
ambiguous responses. These are called unsatisfactory responses. The responses are
commonly handled by assigning missing values, returning to the field or discarding unsatisfactory responses.
Assigning missing values—Though revisiting the respondent is logical, it is not
always possible to re-visit the field every time the researcher gets an unsatisfactory
response from participants. In this situation, a researcher may resort to assigning
missing values to unsatisfactory responses. This method can be used when the
number of unsatisfactory responses is proportionately small or variables with
unsatisfactory responses are not the key variables.
Returning to the field—Sometimes the interviewer has to re-contact respondents
if the responses provided by them are unsatisfactory. This is feasible especially for
industrial or business surveys, where the sample size is small and respondents are
easily traceable. The responses, however, may be different from those originally given.
Discarding unsatisfactory responses—In this approach, unsatisfactory responses from participants are totally discarded. This method is well suited when:
• Proportion of unsatisfactory respondents is very small compared with the
• Respondents with unsatisfactory responses do not differ from other respondents
with respect to important, demographic and user characteristics
• Unsatisfactory responses for each respondent are proportionately more in each
• Responses on key variables are missing
To reiterate, editing has to be done with patience and care because it is an
important step in the processing of questionnaires.
6.2 Coding
The process of assigning numbers or other symbols to answers in order to group
the responses into limited categories is known as coding. For example, instead of
using the word ‘landlord’ or ‘tenant’ in response to a question that asks for
identification of one’s residential status, one can use the codes ‘LLD’ or ‘TNT’.
This variable can also be coded as 1 for landlord and 2 for tenant, which is then
known as numeric coding. This type of categorization and coding sacrifices some
detail but is necessary for efficient data analysis. It helps researchers to pack
several replies into a few categories that contain critical information required for analysis.
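The landlord/tenant example can be sketched in a few lines. The code mappings come from the text; the `encode` helper, the sample answers and the use of 9 as a numeric missing-value code (borrowed from the ‘9 = missing’ convention in the code book) are assumptions made for illustration.

```python
# Alphabetic and numeric coding schemes for residential status, as in the text.
ALPHA_CODES = {"landlord": "LLD", "tenant": "TNT"}
NUMERIC_CODES = {"landlord": 1, "tenant": 2}


def encode(answer, codes, missing):
    """Map a raw answer to its code; blank or unknown answers get `missing`."""
    key = answer.strip().lower() if answer else ""
    return codes.get(key, missing)


# Illustrative raw answers, including a blank one.
answers = ["Landlord", "tenant ", "", "Tenant"]
alpha = [encode(a, ALPHA_CODES, missing=None) for a in answers]
numeric = [encode(a, NUMERIC_CODES, missing=9) for a in answers]  # 9 = missing (assumed)
```

Normalizing case and whitespace before the lookup keeps ‘Tenant’ and ‘tenant ’ in the same category, which is the point of grouping replies into limited categories.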
6.2.1 Categorization Rules
A researcher should follow four rules while categorizing replies obtained from a
questionnaire. The categories should be appropriate, exhaustive, mutually exclusive
and derived from one classification principle.
Appropriate. Categorization should help to validate the hypotheses of the
research study. If a hypothesis aims to establish a relationship between key variables, then appropriate categories should be designed to facilitate comparison
between those variables. Categorization provides for better screening of data for
testing and establishing links among key variables. For example, if specific income
is critical for testing a relationship, then wider income classifications may not yield
the best results upon analysis.
Exhaustive. When multiple-choice questions are used, an adequate list of
alternatives should be provided to tap the full range of information from respondents. The absence of any response from the set of response options given will
prove detrimental as that specific response will be underrepresented in the analysis. For example, a questionnaire designed to capture the age group of respondents should list all possible alternatives that a respondent may fall into.
Mutually exclusive. Complying with this rule requires that a specific alternative
is placed in one and only one cell of a category set. For example, in a survey, the
classification may be (1) Professional (2) Self-employed (3) Government service
(4) Agriculture (5) Unemployed. Some self-employed respondents may consider
themselves as professionals, and these respondents will fit into more than one
category. A researcher should avoid having categories that are not mutually exclusive.
Single Dimension. This means every class in the category set is defined in terms
of one concept. If more than one dimension is used, it may not be mutually
exclusive unless there is a combination of dimensions like (1) Engineering student
(2) Managerial student and the like in the response options.
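For numeric categories, the exhaustiveness and mutual-exclusivity rules can be checked mechanically. The sketch below assumes categories expressed as inclusive (low, high) ranges; the function names and both category sets are illustrative, with the first set modelled on the options of question 1 in the sample questionnaire.

```python
def categories_matching(value, categories):
    """Return the labels of all categories whose inclusive range contains value."""
    return [label for label, (low, high) in categories.items() if low <= value <= high]


def check_category_set(categories, test_values):
    """Return (gaps, overlaps): values matching no category, and values matching several."""
    gaps = [v for v in test_values if len(categories_matching(v, categories)) == 0]
    overlaps = [v for v in test_values if len(categories_matching(v, categories)) > 1]
    return gaps, overlaps


# Exhaustive and mutually exclusive over the non-negative integers.
good = {
    "Less than 5": (0, 4),
    "Between 5 and 10": (5, 10),
    "Between 11 and 20": (11, 20),
    "More than 20": (21, float("inf")),
}
# Deliberately flawed set: 10 falls in two classes, and many values fall in none.
bad = {
    "Between 5 and 10": (5, 10),
    "Between 10 and 20": (10, 20),
}

values = range(0, 31)
good_gaps, good_overlaps = check_category_set(good, values)
bad_gaps, bad_overlaps = check_category_set(bad, values)
```

An empty `gaps` list indicates the set is exhaustive over the tested values, and an empty `overlaps` list indicates it is mutually exclusive. Non-numeric sets such as the occupation example still need a judgment call by the researcher.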
6.2.2 Code Book
To make data entry less erroneous and more efficient, a rule book called a ‘code
book’ or ‘coding scheme’ is used, which guides research staff. A code book gives
coding rules for each variable that appears in a survey. It is also the basic source
for locating the positions of variables in the data file during the analysis process.
Most code books generally contain the question number, variable number, location
of the variable’s code on the input medium, descriptors for the response options
and variable name. A code book for the questionnaire in Fig. 6.2 above is shown in
Fig. 6.3 below.
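A code book can also be kept in machine-readable form so that the same rules guide data entry and later decoding. This is a minimal sketch, not the layout used in Fig. 6.3: the variable names, the column positions and the helper functions are hypothetical, while the code values follow the sample questionnaire and the ‘9 = missing’ convention.

```python
# Hypothetical machine-readable code book: one entry per variable.
CODE_BOOK = {
    "n_chocolates": {
        "question": 1,
        "column": 1,  # assumed position in the data file
        "codes": {1: "Less than 5", 2: "Between 5 and 10", 3: "Between 11 and 20",
                  4: "More than 20", 5: "Don't know", 9: "Missing"},
    },
    "brand": {
        "question": 3,
        "column": 2,
        "codes": {1: "Cadbury's", 2: "Nutrine", 3: "Nestle", 4: "Amul",
                  5: "Others", 9: "Missing"},
    },
    "age_group": {
        "question": 7,
        "column": 3,
        "codes": {1: "Below 10", 2: "Between 10 and 20", 3: "Between 21 and 30",
                  4: "Above 30", 9: "Missing"},
    },
}


def decode(variable, code):
    """Translate a stored code back into its response descriptor."""
    return CODE_BOOK[variable]["codes"][code]


def invalid_entries(row):
    """Return the variables in a data row whose code is not defined in the code book."""
    return {var: code for var, code in row.items()
            if code not in CODE_BOOK[var]["codes"]}


row = {"n_chocolates": 3, "brand": 7, "age_group": 2}  # brand code 7 is not defined
problems = invalid_entries(row)
```

Checking each keyed-in row against the code book catches out-of-range codes at entry time, before they reach tabulation.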
6.2.3 Coding Close-Ended Questions
It is easy to assign codes for responses that would be generated by close-ended
questions. This is because the number of answers is fixed. Assigning appropriate
codes in the initial stages of research makes it possible to pre-code a questionnaire.
This avoids the tiresome, intermediate step of framing the coding sheet prior to
data entry. Coding makes it easier for data to be accessed directly from the
questionnaire. The interviewer assigns appropriate numerical responses to each
item (question) in the questionnaire. This code is later transferred to an input
medium for analysis.
6.2.4 Coding Open-Ended Questions
Questionnaire data obtained from close-ended questions are relatively easy to code
as there are a definite number of predetermined responses. But a researcher cannot
always use close-ended questions as it is impossible to prepare an exhaustive list of
responses for a question aimed at probing a person’s perception or attitude to a
particular product or issue. Thus, use of open-ended questions becomes inevitable
in business research.
However, coding the data collected from open-ended questions is much more
difficult as the responses are unlimited and varied.
In the questionnaire shown in Fig. 6.2, questions 2, 4 and 6 are open-ended
questions. After preliminary evaluation and coding, response categories for the
second question are shown in Fig. 6.3. The response categories also include the
‘other’ category to satisfy the coding rule of exhaustiveness.
Content analysis for open-ended questions. A qualitative method known as
content analysis can be used to analyse the text provided in the response category
of open-ended questions. The purpose is twofold. First, content analysis systematically and objectively derives categories of responses that represent homogenous
Fig. 6.3 Code book
(Adapted from Donald and
Number of chocolates
1 = Less than 5
2 = Between 5 and 10
3 = Between 11 and 20
4 = Above 20
5 = Don’t know
9 = Missing
0 = Not mentioned
1 = Mentioned
1 = Cadbury’s
2 = Nutrine
3 = Nestle
4 = Amul
5 = others
9 = missing
0 = not mentioned
1 = mentioned
Bought new goods
1 = yes
0 = no
0 = not mentioned
1 = mentioned
1 = below 10
2 = between 10 and 20
3 = between 21 and 30
4 = above 30
9 = missing
thoughts or opinions. This facilitates interpretation of large volumes of lengthy and
Second, content analysis identifies responses particularly relevant to the survey.
This form of content analysis is known as open coding or context-sensitive scheme
coding. It requires the researcher to name categories through a detailed examination of data. Thus, rather than a pre-determined framework of possible
responses, the researcher works using actual responses provided by respondents to
generate the categories used to summarize data. This involves an iterative interpretation process of first reading the responses, then rereading them to
establish meaningful categories, and finally, re-reading select responses to refine
the number and meaning of categories in a manner that is most representative of
the respondents’ text. Each response is then coded into as many categories as
necessary to capture the ‘full picture’ of the respondent’s thoughts or opinions. To
reduce potential coding errors, responses out of context of the question are not
Let us look at the example questionnaire and do content analysis for the second
question in Fig. 6.2, which is open-ended.
Q ‘Why do you eat chocolates?’ (Sample responses are as follows)
I can afford to buy them.
Its shape and size are nice.
No other confectionery can match the taste of chocolates.
It’s very sweet.
I enjoy the taste.
I love the taste and softness.
The first step in analysis requires that the categories selected should reflect the
objectives for which the data have been collected. The research question is concerned with the reason behind the respondents’ interest in eating chocolates. The
categories selected are keywords. The first pass through the data produces a few
general categories as shown in Fig. 6.4. These categories should contain one
dimension of reason and be mutually exclusive. The use of ‘other’ makes the
category set exhaustive so that any dimension that cannot be captured in the
listed categories can be assigned to the ‘other’ category.
Fig. 6.4 Example of coding for an open-ended question
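A first pass of this kind can be approximated with keyword matching, recording each category as mentioned (1) or not mentioned (0) so that one response may fall into several categories. The sketch below is only an illustration: the category names and keyword lists are assumptions, not the categories of Fig. 6.4, and a real content analysis relies on the researcher reading and rereading the responses.

```python
# Hypothetical keyword lists; a real study would derive these from the responses.
CATEGORY_KEYWORDS = {
    "taste": ["taste", "sweet"],
    "texture": ["soft"],
    "appearance": ["shape", "size"],
    "affordability": ["afford"],
}


def code_response(text):
    """Code one response as 0/1 per category; 'other' is 1 only if nothing else matched."""
    lowered = text.lower()
    coded = {cat: int(any(kw in lowered for kw in kws))
             for cat, kws in CATEGORY_KEYWORDS.items()}
    coded["other"] = int(not any(coded.values()))
    return coded


responses = [
    "I can afford to buy them.",
    "Its shape and size are nice.",
    "No other confectionery can match the taste of chocolates.",
    "It's very sweet.",
    "I enjoy the taste.",
    "I love the taste and softness.",
    "Because they give me energy.",  # matches no keyword, so it falls into 'other'
]
coded = [code_response(r) for r in responses]
totals = {cat: sum(c[cat] for c in coded) for cat in coded[0]}
```

The sixth response is counted under both taste and texture, which mirrors the rule that each response is coded into as many categories as necessary. A large ‘other’ total after the first pass would signal that a second evaluation is needed to find sub-categories.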
In general, a second evaluation of responses and categories is made so that
some sub-categories can be found which remain undiscovered in the first
Q ‘Why do you like chocolates?’ (Tick as many of the following as applicable)
(Presume answers will be given)
6.2.5 Coding ‘Do not Knows’
Although researchers include the option of ‘do not know’ in the possible answers
to a question to ensure exhaustiveness, at times it poses problems while analysing.
This is particularly so if a considerable number of respondents choose the ‘do not
know’ option. Respondents may choose this response either because they really do
not have an answer or because they do not want to answer the question because of
Though the ‘do not know’ (DK) option is inserted in the questionnaire to assess
the actual number of respondents who do not know the answer, the number of
evasive respondents choosing that option often negates this purpose.
There are two kinds of DK responses, that is, the ‘legitimate DK’ and the
‘disguised DK’. Responses of the first kind are acceptable. Respondents give such
answers when they are unaware of the answer, perhaps because of recall problems or
memory decay. The second type of response is mainly because of poor preparation
of the questionnaire or the questioning process. At times, the respondent may be
reluctant to answer the question or may feel that the question is inconsequential.
Researchers and the interviewers in the field play a major role in decreasing the
proportion of ‘disguised DK’. A carefully designed questionnaire can decrease the
number of ‘disguised DK’ responses. The rest can be handled by interviewers in
the field. An interviewer must identify in advance possible questions that entail
key variables, for which DK responses would make things difficult. The researcher
can use various probing techniques to get definite answers or find out why the
respondent has selected a DK response.
There is always a possibility that a considerable number of DK responses might
be generated for some questions despite efforts to check the occurrence of such
responses. In such cases, the researcher can either ignore that response or allocate
the frequency to all other responses in the ratio that they occur. For example, in
Table 6.1, 21 % of the respondents below 10 years select the DK response. Here,
the researcher can either ignore the last column or allocate the DK responses to
the other two responses (<5 and >20) proportionally.
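Proportional allocation means each remaining response gains a part of the DK share equal to its own weight among the non-DK responses. The sketch below shows the arithmetic; only the 21 % DK figure comes from the text, while the other two percentages and the function name are hypothetical because Table 6.1 is not reproduced here.

```python
def reallocate_dk(shares, dk_key="DK"):
    """Spread the DK share over the other responses in the ratio in which they occur."""
    dk = shares[dk_key]
    rest = {k: v for k, v in shares.items() if k != dk_key}
    total = sum(rest.values())
    if total == 0:
        raise ValueError("no non-DK responses to allocate to")
    return {k: v + dk * v / total for k, v in rest.items()}


# 21 % DK is from the text; 49 % and 30 % are made-up figures for illustration.
shares = {"<5": 49.0, ">20": 30.0, "DK": 21.0}
adjusted = reallocate_dk(shares)
```

After reallocation the two shares still stand in the ratio 49:30 and together account for all respondents, so the relative pattern among substantive answers is unchanged.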