8.7 Examination of Major Assumptions
percentages of residuals falling within ± 2 SE or ± 2.5 SE. A more formal assessment can be made by running the Shapiro–Wilk, Kolmogorov–Smirnov, Cramér–von Mises and Anderson–Darling tests.1
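The checks described above can be sketched in Python. This is only an illustration, assuming NumPy and SciPy are available; the function name `normality_summary` and the simulated residuals are hypothetical and not part of the chapter, which does not prescribe any particular software for this step.

```python
import numpy as np
from scipy import stats

def normality_summary(residuals):
    """Summarise how closely regression residuals follow a normal distribution."""
    r = np.asarray(residuals, dtype=float)
    # Standardise so the +/- 2 and +/- 2.5 bands are in standard-error units.
    z = (r - r.mean()) / r.std(ddof=1)
    return {
        # Roughly 95% and 99% of residuals are expected in these bands under normality.
        "within_2_se": float(np.mean(np.abs(z) <= 2.0)),
        "within_2_5_se": float(np.mean(np.abs(z) <= 2.5)),
        # Each p-value tests H0: the observations are normally distributed.
        "shapiro_p": float(stats.shapiro(r).pvalue),
        # Mean and SD are estimated from the data, so these two p-values are approximate.
        "ks_p": float(stats.kstest(z, "norm").pvalue),
        "cvm_p": float(stats.cramervonmises(z, "norm").pvalue),
    }

# Simulated (hypothetical) residuals, used only to make the sketch runnable.
rng = np.random.default_rng(0)
summary = normality_summary(rng.normal(0.0, 13.9, size=500))
print(summary)
```

A small p-value is evidence against normality. The Anderson–Darling test is available separately as `scipy.stats.anderson`, which reports critical values instead of a single p-value.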
8.7.4 Test of Homogeneity of Variance (Homoscedasticity)
The assumption of constant variance of the error term can be examined by plotting the residuals against the predicted values of the dependent variable, Ŷ_i. If the pattern is not random, the variance of the error term is not constant. See the video How to check Normality Assumption.
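As a rough numeric companion to the residual plot, the sketch below fits an ordinary least-squares line with NumPy and measures whether the absolute residuals grow with the predicted values. This is a crude proxy for inspecting the plot, not a formal test, and the function names and simulated data are hypothetical.

```python
import numpy as np

def fitted_and_residuals(X, y):
    """Fit OLS with an intercept and return (predicted values, residuals)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    return fitted, y - fitted

def spread_trend(fitted, residuals):
    """Correlation between |residual| and predicted value; near 0 suggests no funnel shape."""
    return float(np.corrcoef(fitted, np.abs(residuals))[0, 1])

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=400)
y_const = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=400)        # constant error variance
y_funnel = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=400) * x   # error spread grows with x

trend_const = spread_trend(*fitted_and_residuals(x, y_const))
trend_funnel = spread_trend(*fitted_and_residuals(x, y_funnel))
print(round(trend_const, 2), round(trend_funnel, 2))
```

A value close to zero is what a random scatter would produce; a clearly positive value corresponds to the funnel shape that signals non-constant variance.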
8.7.5 Test of Autocorrelation
A plot of residuals against time, or against the sequence of observations, sheds some light on the assumption that the error terms are uncorrelated (no autocorrelation). A random pattern should be seen if this assumption holds. A more formal procedure for examining the correlations between the error terms is the Durbin–Watson test (applicable only to time-series data).
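The Durbin–Watson statistic itself is simple to compute: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, assuming NumPy and residuals already sorted in time order (the function name is hypothetical):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# d lies between 0 and 4: values near 2 suggest no first-order autocorrelation,
# values near 0 positive autocorrelation, values near 4 negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs -> 3.0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # identical neighbours -> 0.0
```

Deciding whether a given value is significantly different from 2 requires the Durbin–Watson critical-value tables, which this sketch does not reproduce.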
8.7.6 Test of Multicollinearity
The presence of multicollinearity, that is, a perfect or near-perfect linear relationship between independent variables, can be identified using the following methods:
1. VIF (Variance Inflation Factor): As a rule of thumb, if the VIF of a variable exceeds 10, which happens when the R² from regressing that variable on the other independent variables exceeds 0.90, that variable is said to be highly collinear (Gujarati and Sangeetha 2008).
1 Null hypothesis: the observations are normally distributed; alternative hypothesis: the observations are not normally distributed.
8 Multiple Regression
2. TOL (Tolerance): The closer the TOL is to zero, the greater the degree of collinearity of that variable with the other independent variables (Gujarati and Sangeetha 2008).
3. Condition Index (CI): If the CI exceeds 30, there is severe multicollinearity (Gujarati and Sangeetha 2008).
4. Partial Correlations: High partial correlations between independent variables also indicate the presence of multicollinearity.
How to Check No Multicollinearity Assumption.mp4
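The first three diagnostics in the list above can be sketched with NumPy. The VIF of a variable equals 1 / (1 − R²), where R² comes from regressing it on the other independent variables, and the tolerance is its reciprocal. The condition index is computed here from the eigenvalues of the predictor correlation matrix, which is an assumption of this sketch: statistical packages may scale the data differently, so their values need not match. The function name and simulated predictors are hypothetical.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (VIF, TOL, condition index) for the columns of predictor matrix X."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)      # predictor correlation matrix
    # The j-th diagonal element of the inverse equals 1 / (1 - R_j^2), i.e. VIF_j.
    vif = np.diag(np.linalg.inv(corr))
    tol = 1.0 / vif                          # tolerance = 1 - R_j^2
    # Assumption: condition index = sqrt(largest / smallest eigenvalue) of the
    # correlation matrix; other scalings of X give different values.
    eig = np.linalg.eigvalsh(corr)
    ci = float(np.sqrt(eig.max() / eig.min()))
    return vif, tol, ci

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)   # nearly a copy of x1
x3 = rng.normal(size=300)              # unrelated predictor
vif, tol, ci = collinearity_diagnostics(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1), np.round(tol, 3), round(ci, 1))
```

In this simulated example the first two predictors should show VIF values far above 10 and tolerances close to zero, while the third stays near 1.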
8.7.7 Questions
Examine the following fictitious data:

Model   R       R2      Adjusted R square   Std. error of the estimate
1       0.863   0.849   0.850               13.8767
1. Which of the following statements can we not say?
(a) The standard error is an estimate of the variance of y, for each value of x.
(b) In order to obtain a measure of explained variance, you need to square
the correlation coefficient.
(c) The correlation between x and y is 86 %.
(d) The correlation is good here as the data points cluster around the line of fit quite well. So prediction will be good.
(e) The correlation between x and y is 85 %.
2. The slope of the line:
(a) Gives us a measure of how much y changes as x changes.
(b) Is the point where the regression line cuts the vertical axis.
(c) Is a correlation coefficient indicating the variability of the points around the regression line in the scatter diagram.
(d) None of the above.
(e) Is the average value of the dependent variable.
3. Using some fictitious data, we wish to predict the musical ability for a person
who scores 8 on a test for mathematical ability. We know the relationship is
positive. We know that the slope is 1.63 and the intercept is 8.41. What is their
predicted score on musical ability?
(a) 80.32
(b) -4.63
(c) 21.45
(d) 68.91
(e) 54.55
4. We have a negative relationship between number of drinks consumed and
number of marks in a driving test. One individual scores 3 on number of drinks
consumed, another individual scores 5 on number of drinks consumed. What
will be their respective scores on the driving test if the intercept is 18 and the
slope 3?
(a) It is not possible to predict from negative relationships.
(b) Driving test scores (Y-axis) will be 51 and 87 [individual who scored 5 on
drink consumption].
(c) Driving test scores (Y-axis) will be 27 [individual who scored 3 on drink
consumption] and 33 [individual who scored 5 on drink consumption].
(d) Driving test scores (Y-axis) will be 9 [individual who scored 3 on drink
consumption] and 3 [individual who scored 5 on drink consumption].
(e) None of these.
5. You are still interested in whether problem-solving ability can predict the
ability to cope well in difficult situations; whether motivation can predict
coping and whether these two factors together predict coping even better. You
produce some more results.
Dependent variable: coping skills in difficult situations

             Unstandardized coefficients   Standardized coefficients
             B        Std. error           Beta                        t       Sig.
Constant     -0.466   0.241                                            1.036   0.302
Problem      0.200    0.048                0.140                       2.082   0.030
Motivation   0.950    0.087                0.740                       10.97   0.000

Which of the following statements is incorrect?
(a) As motivation increases by one standard deviation, coping skills increases by
almost three quarters of a standard deviation (0.74). Thus, motivation appears
to contribute more to coping skills than problem solving.
(b) As motivation increases by one unit coping skills increases by 0.95.
(c) The t-value for problem solving is 2.082 and the associated probability is
0.03. This tells us the likelihood of such a result arising by sampling error,
assuming the null hypothesis is true, is 97 in 100.
(d) Problem solving has a regression coefficient of 0.20. Therefore, as problem
solving increases by one unit coping skills increases by 0.20.
(e) None of these.
Chapter 9
Exploratory Factor and Principal Component Analysis
Chapter Overview
This chapter provides an introduction to Factor Analysis (FA): A procedure to
define the underlying structure among the variables in the analysis. The chapter
provides general requirements, statistical assumptions and conceptual assumptions
behind FA. This chapter explains the way to do FA with IBM SPSS 20.0. It shows
how to determine the number of factors to retain, interpret the rotated solution,
create factor scores and summarize the results. Fictitious data from two studies are
analysed to illustrate these procedures. The present chapter deals only with the
creation of orthogonal (uncorrelated) components.
9.1 What is Factor Analysis
According to Hair et al. (2010),1 ‘factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis’. Suppose a marketing researcher wants to identify the underlying dimensions of retail brand attractiveness. He begins by administering a retail brand attractiveness scale from the existing literature to a large sample of people (N = 2000) during their visit to a particular retail store. Assume that there are five different dimensions, which consist of 30 different items. The researcher will end up with 30 different observed variables, and this mass of numbers by itself will say very little about the underlying dimensions of retail brand attractiveness. On average, some of the scores will be high, some will be low and some intermediate, but interpretation of these scores will be extremely difficult if not impossible. This is where factor analysis (FA) comes in handy: it helps the researcher with ‘data reduction’ and ‘data summarization’, turning this large pool of items into a few representative factors or dimensions that can be used for further multivariate statistical analysis. The general purpose of FA is the orderly simplification of a large number of intercorrelated measures, that is, condensing the information contained in a number of original variables into a few representative constructs or factors with minimal loss of information. The application of FA rests on the following conditions: general requirements, statistical assumptions and conceptual assumptions (see Table 9.1).
1 See Ref. Hair et al. (2010).
S. Sreejesh et al., Business Research Methods, DOI: 10.1007/978-3-319-00539-3_9, © Springer International Publishing Switzerland 2014
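The idea of condensing many intercorrelated items into a few uncorrelated dimensions can be illustrated with a small principal-component sketch in NumPy. This is not the SPSS procedure the chapter uses; it is an eigen-decomposition of the item correlation matrix under simplifying assumptions, and the function name, the six simulated items and the two hypothetical underlying dimensions are invented for the example.

```python
import numpy as np

def principal_components(X, n_components):
    """Return component scores and the share of variance each component explains."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise each item
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    explained = eigvals[order] / eigvals.sum()
    scores = Z @ eigvecs[:, order]                     # uncorrelated component scores
    return scores, explained

rng = np.random.default_rng(3)
f1, f2 = rng.normal(size=(2, 500))                     # two hypothetical dimensions
noise = 0.3 * rng.normal(size=(500, 6))
items = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise   # six observed items
scores, explained = principal_components(items, 2)
print(scores.shape, round(float(explained.sum()), 2))
```

Here six observed items are replaced by two component scores per respondent, and `explained` indicates how much of the original information those two components retain.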
FA is used in the following circumstances:
1. To identify underlying dimensions, or factors, that explain the correlations
among a set of variables. For example, a set of personality trait statements may
be used to measure the personality dimensions of people. These statements may
then be factor analysed to identify the underlying dimensions of personality
trait or factors.
2. To identify a new, smaller, set of uncorrelated variables to replace the original
set of correlated variables in subsequent multivariate analysis (regression or
discriminant analysis). For example, the psychographic factors identified may
be used as independent variables in explaining the differences between loyal
and non-loyal consumers.
3. To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis. For example, a few of the original lifestyle
Table 9.1 Conditions for doing factor analysis
General Requirements
1. Type of scale: Observed variables should be measured on either interval or ratio scales, or at least at the ordinal level
2. Number of items: If the researcher has prior knowledge about the underlying factor structure and wants to test its dimensionality, then five or more variables should be included to represent each factor
3. Sample size: The rule of thumb for sample size is to have at least five times as many cases as variables entered into the factor analysis, plus 10
Statistical Assumptions
1. Random sampling: Each participant will contribute one response for each observed variable.
These sets of scores should represent a random sample drawn from the population of interest
2. Linearity: The relationship between all observed variables should be linear
3. Bivariate Normal Distribution: Each pair of observed variables should display a bivariate
normal distribution (e.g. they should form an elliptical scattergram when plotted)
Conceptual Assumptions
1. Variable Selection: Factor analysis rests on the basic assumption that an underlying structure exists for the selected set of variables. The presence of high correlations and the subsequent interpretation of factors do not guarantee relevance, even if the statistical assumptions are met. Therefore, it is the responsibility of the researcher to select a set of variables or items that are conceptually valid and appropriate to represent the underlying dimension
2. Sample Homogeneity: Another important conceptual assumption of factor analysis is that the selected sample should be homogeneous with respect to the underlying factor structure. It is inappropriate to run a single factor analysis on a set of items when the researcher knows a priori that males and females in the sample differ on them. Ignoring this heterogeneity and mixing the two groups (males and females) would result in a correlation matrix and factor structure that are a poor representation of the unique structure of each group