
10.2 SPSS Output Interpretation for Hierarchical Clustering

Table 10.2 Agglomeration schedule for first 20 stages of clustering

Agglomeration schedule

       Cluster combined                      Stage cluster first appears
Stage  Cluster 1  Cluster 2  Coefficients   Cluster 1  Cluster 2   Next stage
1      8          10          0.000         0          0           61
2      14         99          0.500         0          0           34
3      11         98          1.000         0          0           28
4      32         93          1.500         0          0           18
5      90         91          2.000         0          0           29
6      19         88          2.500         0          0           50
7      74         86          3.000         0          0           55
8      77         83          3.500         0          0           30
9      30         81          4.000         0          0           38
10     37         80          4.500         0          0           37
11     55         73          5.000         0          0           52
12     26         60          5.500         0          0           76
13     57         59          6.000         0          0           60
14     1          58          6.500         0          0           41
15     38         54          7.000         0          0           36
16     2          46          7.500         0          0           29
17     20         35          8.000         0          0           49
18     16         32          8.833         0          4           67
19     84         85          9.833         0          0           42
20     69         76         10.833         0          0           56

clustered in the fourth stage. The last column, Next Stage, indicates the stage at which the cases cluster again. For example, in stage one, the value 61 under Next Stage is the stage at which the cluster formed in the first stage (cases 8 and 10) merges again with another case or cluster.
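Outside SPSS, the same agglomeration schedule corresponds to the linkage matrix returned by SciPy's hierarchical clustering routines. A minimal sketch follows; the random data here is purely illustrative and is not the book's data set:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # 10 illustrative cases, 2 variables

# Each row of Z is one stage: [cluster 1, cluster 2, coefficient, cluster size].
# Indices 0..9 are original cases; index 10 + k is the cluster formed at stage k.
Z = linkage(X, method="ward")

for stage, (c1, c2, dist, size) in enumerate(Z, start=1):
    print(f"Stage {stage}: merge {int(c1)} and {int(c2)} "
          f"at coefficient {dist:.3f} (cluster size {int(size)})")
```

As in Table 10.2, n cases are merged over n - 1 stages, and the coefficient column is non-decreasing for Ward's method.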

Another way to identify the formation of clusters in hierarchical cluster analysis is through the icicle plot shown in Fig. 10.1. The plot resembles a row of icicles hanging from eaves. Figure 10.1 shows a vertical icicle plot, in which the rows represent the cases being clustered (here 99 cases) and the columns represent the stages involved in the formation of clusters. The vertical icicle plot should be read from ‘‘left to right’’. Reading from left to right, there is no white space between case numbers 8 and 10, which supports the agglomeration schedule: cases 8 and 10 cluster in the first stage. Moving further to the right, the plot shows a little white space between case numbers 14 and 99, so we can infer that these two cases cluster in the second stage. This process continues until all the cases are identified as belonging to a single cluster.

Yet another graphical way to identify the number of clusters is the dendrogram, shown in Fig. 10.2. A dendrogram is a tree diagram and is considered a critical component of hierarchical clustering output: it displays the relative similarity between the cases considered for cluster analysis.

Fig. 10.2 Dendrogram

Looking at the dendrogram in Fig. 10.2, we can interpret the clustering by reading the diagram from ‘‘left to right’’. The upper part of the diagram is labelled ‘‘Rescaled Distance Cluster Combine’’: the cluster distances are rescaled so that the output ranges from 0 to 25, in which 0 represents no distance and 25 represents the highest distance.
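A dendrogram like Fig. 10.2 can be reproduced outside SPSS with SciPy. A minimal sketch on illustrative random data (not the book's data set) follows; note that SciPy plots raw linkage distances rather than SPSS's 0–25 rescaling:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                  # render off-screen, no display needed
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))           # 20 illustrative cases, 3 variables

Z = linkage(X, method="average")
fig, ax = plt.subplots()
# orientation="right" puts cases on the left and distance on the x-axis,
# so the tree is read left to right like the SPSS dendrogram
dendrogram(Z, orientation="right", ax=ax)
ax.set_xlabel("Distance (cluster combine)")
fig.savefig("dendrogram.png")
```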

10.2.1 Step 4: Decide Number of Clusters to be Retained in the Final Cluster Solution

In hierarchical cluster analysis, the prominent task is to decide the number of clusters. There are no hard and fast rules for the final cluster solution, but there are a few guidelines that can be followed while deciding the number of clusters. Some general guidelines are given in Table 10.3.

Tables 10.4, 10.5, and 10.6 show the relative cluster size, or frequency distribution, of the two-, three-, and four-cluster solutions, respectively. From the tables, it is evident that in the four-cluster solution the distribution is more or less equal.

Table 10.3 Cluster solution determination

Theoretical base: The theoretical base or the experience of the researcher can be used to decide the number of clusters.

Agglomeration schedule: This method of cluster determination is not possible in all cases. In a good cluster solution, a sudden jump appears while reading the agglomeration schedule coefficients; the point just before the sudden jump in the coefficient column is the stopping point for merging clusters.

Dendrogram: Drawing an imaginary vertical line through the dendrogram will show the number of clusters. Looking at the number of cut points, one can assess the number of clusters; if there are four cut points, then we can say that there are four clusters.

Relative cluster size: One can restrict the clusters to a limited number (e.g., 3, 4, or 5), so as to obtain a series of cluster solutions from 2 up to that number, and then draw the relative frequency of cases in each cluster. Finally, one can select the solution in which the relative frequency distribution of cases is almost equal across clusters.
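The ‘‘sudden jump’’ guideline from Table 10.3 can be automated by examining the differences between successive agglomeration coefficients. A sketch on illustrative data with three well-separated groups (not the book's data set):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
# Three well-separated illustrative groups of 15 cases each
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(15, 2)) for c in (0, 5, 10)])

Z = linkage(X, method="ward")
coeff = Z[:, 2]                        # agglomeration coefficients, one per stage
jumps = np.diff(coeff)                 # increase from each stage to the next
stage_before_jump = int(np.argmax(jumps))   # stage just before the largest jump

# n cases merge over n - 1 stages; stopping after stage k (0-indexed)
# leaves n - 1 - k clusters
n_clusters = len(X) - 1 - stage_before_jump
print("Suggested number of clusters:", n_clusters)
```

For this data, the largest jump separates the cheap within-group merges from the expensive between-group merges, recovering the three planted groups.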

Table 10.4 Two clusters

         Frequency  Percent  Valid percent  Cumulative percent
Group 1  23         23.2     23.2           23.2
Group 2  76         76.8     76.8           100.0
Total    99         100.0    100.0


Table 10.5 Three clusters

         Frequency  Percent  Valid percent  Cumulative percent
Group 1  23         23.2     23.2           23.2
Group 2  60         60.6     60.6           83.8
Group 3  16         16.2     16.2           100.0
Total    99         100.0    100.0

Table 10.6 Four clusters

         Frequency  Percent  Valid percent  Cumulative percent
Group 1  23         23.2     23.2           23.2
Group 2  24         24.2     24.2           47.5
Group 3  16         16.2     16.2           63.6
Group 4  36         36.4     36.4           100.0
Total    99         100.0    100.0
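The relative cluster sizes of Tables 10.4–10.6 can be obtained by cutting the hierarchical tree at 2, 3, and 4 clusters. A sketch with SciPy's `fcluster` on illustrative random data (not the book's data set):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = rng.normal(size=(99, 7))          # 99 illustrative cases, 7 variables

Z = linkage(X, method="ward")
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")   # cut tree into k clusters
    sizes = np.bincount(labels)[1:]                   # labels start at 1
    print(f"{k} clusters: sizes {sizes.tolist()}, "
          f"percent {[round(100 * s / len(X), 1) for s in sizes]}")
```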

10.2.2 Step 5: Calculate Cluster Centroid and Give Meaning to Cluster Solution

After the determination of the final cluster solution (in this example, four clusters), it is very important to give meaning to the cluster solution in terms of the importance of the cluster variate. This can be achieved through the determination of cluster centroids. Here, we have generated the cluster centroids using a multiple discriminant analysis. Table 10.7 shows the results generated through the discriminant analysis. From the results, it is found that the people in group 1 (cluster 1) attach high importance to all seven variables; we can call them ‘‘severe exploratory consumers’’. The second group of consumers shows an above-average buying tendency, so we can call them ‘‘superior exploratory consumers’’. The third group shows a mediocre buying-behaviour tendency; therefore, we will call them ‘‘mediocre consumers’’. Finally, the last group is the lowest in this category, so we will call them ‘‘stumpy consumers’’.
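Outside SPSS, cluster centroids can also be computed directly as the per-cluster mean of each clustering variable. A minimal pandas sketch with illustrative data and hypothetical cluster labels (not the book's data set or its discriminant-analysis output):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(loc=4, size=(99, 7)),
                  columns=[f"V{i}" for i in range(1, 8)])
df["cluster"] = rng.integers(1, 5, size=99)   # hypothetical 4-cluster labels

# The centroid of each cluster is the mean of each variable within the cluster
centroids = df.groupby("cluster").mean()
print(centroids.round(4))
```

Comparing each centroid against the Total row (the overall means, as in Table 10.7) is what supports labels such as ‘‘above average’’ or ‘‘mediocre’’.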

10.2.3 Step 6: Assess the Cluster Validity and Model Fit

The following are some suggested procedures for confirming validity and model fit in cluster analysis.

1. Run cluster analysis on the same data with different distance measures and compare the results across distance measures. In addition, the researcher can use different methods of clustering on the same data, and the results can then be analysed and compared.


Table 10.7 Group statistics

                                                  Valid N (listwise)
Four clusters     Mean     Standard deviation     Unweighted  Weighted
Group 1   V1      4.6957   0.82212                23          23.000
          V2      5.3043   0.70290                23          23.000
          V3      5.7826   0.42174                23          23.000
          V4      5.1739   0.65033                23          23.000
          V5      5.0000   0.95346                23          23.000
          V6      5.4348   0.66237                23          23.000
          V7      5.0435   0.63806                23          23.000
Group 2   V1      3.6250   0.57578                24          24.000
          V2      4.0833   0.71728                24          24.000
          V3      4.0417   0.69025                24          24.000
          V4      3.0000   0.93250                24          24.000
          V5      3.9583   0.62409                24          24.000
          V6      4.5417   1.14129                24          24.000
          V7      4.0000   0.78019                24          24.000
Group 3   V1      2.9375   0.77190                16          16.000
          V2      3.7500   1.23828                16          16.000
          V3      3.9375   1.18145                16          16.000
          V4      3.1875   0.98107                16          16.000
          V5      3.8750   0.95743                16          16.000
          V6      1.7500   0.77460                16          16.000
          V7      2.8750   1.14746                16          16.000
Group 4   V1      4.5000   0.73679                36          36.000
          V2      4.8056   0.82183                36          36.000
          V3      5.1944   0.66845                36          36.000
          V4      3.8889   1.00791                36          36.000
          V5      4.6111   1.20185                36          36.000
          V6      3.3889   1.04957                36          36.000
          V7      4.1111   0.91894                36          36.000
Total     V1      4.0808   0.96549                99          99.000
          V2      4.5758   1.01107                99          99.000
          V3      4.8485   1.03375                99          99.000
          V4      3.8586   1.21227                99          99.000
          V5      4.4242   1.06991                99          99.000
          V6      3.8788   1.54704                99          99.000
          V7      4.1010   1.09260                99          99.000

2. Divide the data into two parts and perform cluster analysis on each half. The cluster centroids can then be compared for consistency across the split samples.

3. Add or delete variables from the original set, perform the cluster analysis again, and compare the results for each set of variables.
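Guideline 1, comparing different clustering methods on the same data, can be sketched by clustering with two linkage methods and measuring the agreement of the resulting partitions with the adjusted Rand index from scikit-learn. The two-group data here is purely illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
# Two well-separated illustrative groups, so the methods should agree
X = np.vstack([rng.normal(loc=0, size=(50, 4)),
               rng.normal(loc=6, size=(49, 4))])

labels = {}
for method in ("ward", "complete"):
    Z = linkage(X, method=method)
    labels[method] = fcluster(Z, t=2, criterion="maxclust")

# 1.0 means the two methods produce identical partitions (up to relabelling)
ari = adjusted_rand_score(labels["ward"], labels["complete"])
print(f"Agreement between ward and complete linkage: {ari:.2f}")
```

A high index across methods (or across split halves, for guideline 2) supports the stability of the cluster solution; a low index suggests the clusters are an artefact of the chosen method.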


10.3 SPSS Procedure for Hierarchical Cluster Analysis

=> Open the data
=> Analyse => Classify => Hierarchical Cluster Analysis
=> Select all the seven variables and place them in the Variables box
=> Statistics => Click on Agglomeration Schedule and Proximity matrix, then click Continue

© 2014 S. Sreejesh, Sanjay Mohapatra, M. R. Anusree (auth.), Business Research Methods: An Applied Orientation, Springer International Publishing (2014)

Part IV: Multivariate Data Analysis Using IBM SPSS 20.0
