Chapter 9. Bar Plots (Bar Charts)


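# preliminary to Fig. 9-1
install.packages("car") # if you have not yet installed car
library(car)            # loading car also makes the Salaries data available
attach(Salaries)        # lets you refer to rank, sex, and salary directly
rankcount = table(rank) # get counts & save in vector rankcount
rankcount               # print results
rank
 AsstProf AssocProf      Prof
       67        64       266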


The barplot() function shown in the code that follows uses bar
height to represent the elements in a vector; in this case, the
number of faculty members in each rank. Thus, the graph will have three bars, the
first two of nearly equal height, and the third one about four times
the height of the other two:
# Fig. 9-1a
barplot(rankcount, ylab = "Count", col = "skyblue",
main = "Faculty by Rank", sub = "a. Number in each rank")

Figure 9-1a shows this bar plot.




Figure 9-1. Bar plots of the number of professors in each category, of
the average salary in each category, and counts by rank and sex.
Note that although the bar plot looks a little like a histogram, it is
quite different. The bars in the histogram were defined by breaking
a quantitative variable, ever increasing (or ever decreasing) along
the axis, into sections. You could define the bars by different
breakpoints, if desired. The bar chart uses discrete, or even categorical,
definitions of the bars, so breakpoints are usually fixed and logically
cannot be moved. The bars of a bar plot could be male/female or
horse/cow/pig or mountain/seashore or gold/diamond/paper
money, or any other categories that are mutually exclusive and not
quantitative. Fitting a density plot makes sense over a histogram, but
not usually over a bar plot.
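The contrast can be sketched in a few lines of R. This is only an illustration: the data and the names salary_demo and animal_demo are invented here and are not part of the Salaries dataset. The same quantitative variable can be cut at different breakpoints, whereas a categorical variable gives exactly one bar per category.

```r
# Hypothetical data, made up for illustration only
set.seed(1)
salary_demo <- rnorm(200, mean = 100, sd = 15)   # a quantitative variable
animal_demo <- factor(sample(c("horse", "cow", "pig"), 200, replace = TRUE),
                      levels = c("horse", "cow", "pig"))  # a categorical variable

# Histogram: the same data can be cut at different breakpoints
h_wide   <- hist(salary_demo, breaks = seq(0, 200, by = 20), plot = FALSE)
h_narrow <- hist(salary_demo, breaks = seq(0, 200, by = 5),  plot = FALSE)
length(h_wide$counts)    # 10 bars
length(h_narrow$counts)  # 40 bars

# Bar plot: one bar per category; there is nothing to re-cut
animal_counts <- table(animal_demo)
barplot(animal_counts, ylab = "Count", col = "skyblue")
```

Changing the breaks argument changes how many bars the histogram has, but no comparable choice exists for the three animal categories.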
In Figure 9-1a, the height of each bar represents a count of the items in
that category. Although bar plots are often used for displaying counts,
the height of the bar could represent anything: for example, a
measurement, a mean, income after taxes, and so on. Figure 9-1b
illustrates such a bar plot, showing the average salary of each rank. To
make such a graph, you must first compute the average salaries by
means of the aggregate() function, which returns a data frame, and
then call barplot() to operate on the newly created data. The
expression salary ~ rank indicates that some operation will be
performed on salary for each of the three ranks. FUN = mean shows
that the operation will be finding the mean:
aver = aggregate(salary ~ rank, FUN = mean) # aver is a new data frame
aver # see what is in aver
       rank    salary
1  AsstProf  80775.99
2 AssocProf  93876.44
3      Prof 126772.11
# Fig. 9-1b- bar height shows mean salary, names are ranks
barplot(aver$salary, ylab = "Average Salary",
names.arg = aver$rank, col = "skyblue",
main = "Faculty Salaries", sub = "b. Average salary by rank")
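As an alternative sketch (not the approach used for Figure 9-1b), tapply() computes the same kind of group means but returns them with names attached, so barplot() can label the bars without names.arg. The data frame demo and its salary values are invented for illustration.

```r
# Hypothetical miniature data set (values invented for illustration)
demo <- data.frame(
  rank   = factor(c("AsstProf", "AsstProf", "AssocProf", "AssocProf", "Prof", "Prof"),
                  levels = c("AsstProf", "AssocProf", "Prof")),
  salary = c(78000, 82000, 90000, 96000, 120000, 130000)
)

# aggregate() returns a data frame with one row per rank
aver_df <- aggregate(salary ~ rank, data = demo, FUN = mean)

# tapply() returns a named one-dimensional array that behaves like a named vector
aver_vec <- tapply(demo$salary, demo$rank, mean)

# barplot() takes the bar labels from the names, so names.arg is not needed
barplot(aver_vec, ylab = "Average Salary", col = "skyblue")
```

Passing data = demo to aggregate() also avoids relying on attach(), which some readers may prefer in longer scripts.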

You can modify the bar plot to show the relationship between two
variables. One way to accomplish this is the stacked bar plot, which
you can see in Figure 9-1c. In this next example, two bars, showing
numbers of male and female professors in the study, are each broken
into smaller sections showing how many of each sex hold each rank.
The first thing to do is to create a table, rank2, breaking down all the
professors into groups of rank and sex:
rank2 = table(rank,sex)
rank2 # print the table
           sex
rank        Female Male
  AsstProf      11   56
  AssocProf     10   54
  Prof          18  248

You can display the data in rank2 by using barplot() (the result
appears in Figure 9-1c):
# Fig. 9-1c
barplot(rank2, ylab = "Count", names.arg = c("Female","Male"),
main = "Faculty by Rank and Sex",
col = c("skyblue","skyblue4","burlywood"),
sub = "c. Stacked plot")
legend("topleft", c("Prof","Assoc","Asst"),
text.col = c("burlywood","skyblue4","skyblue"))

It is sometimes difficult to interpret a stacked bar plot, so you might
want to consider another option. The various rectangles representing
each combination of rank and sex can each become a separate
bar and be grouped, in this case by putting all the bars for one sex
together. You can see such a graph in Figure 9-1d. This requires one
modification from the code used to create Figure 9-1c, which is to
add the beside = T argument:
# Fig. 9-1d
barplot(rank2, ylab = "Count", names.arg = c("Female","Male"),
main = "Faculty by Rank and Sex",
col = c("skyblue","skyblue4","burlywood"),
sub = "d. Grouped plot", beside = T)
legend("topleft", c("Prof","Assoc","Asst"),
text.col = c("burlywood","skyblue4","skyblue"))
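One way to see what beside changes is to look at the bar positions that barplot() returns. The sketch below uses a small matrix of invented counts (the name counts and its values are hypothetical) and plot = FALSE, so nothing is drawn.

```r
# Hypothetical counts (rows = rank, columns = sex), invented for illustration
counts <- matrix(c(3, 2, 5,
                   6, 4, 9),
                 nrow = 3,
                 dimnames = list(rank = c("Asst", "Assoc", "Prof"),
                                 sex  = c("Female", "Male")))

# plot = FALSE skips the drawing and just returns the bar midpoints
mid_stacked <- barplot(counts, plot = FALSE)                 # one bar per column
mid_grouped <- barplot(counts, beside = TRUE, plot = FALSE)  # one bar per cell

length(mid_stacked)  # 2: a Female bar and a Male bar
dim(mid_grouped)     # 3 2: three rank bars within each of the two sex groups
```

The stacked version has two bars, each split into three segments, while the grouped version has six separate bars arranged in two groups.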

Note that the legend() function was added to the code for Figures
9-1c and 9-1d. This is to add extra text to the graph to explain the
meaning of various colors and/or symbols. Depending on context,
the legend can be essential or unnecessary, and sometimes even
counterproductive. The legend can become clutter, so it is important
to determine if it is needed. If it is, your next decision is where to
place it for best effect. It is usually best to put the legend in a part of
the graph that is relatively far away from important figures. Note
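that in Figures 9-1c and 9-1d, the legend is in the upper-left corner.
This is done by using the "topleft" argument. The names in the
legend correspond to the values of rank, and the color vector
(text.col =) holds the same colors as the color vector (col =) in the
barplot() command, listed in reverse order so that the legend reads
from top to bottom in the same order as the segments of the stacked
bars in Figure 9-1c.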
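The legend() calls for Figures 9-1c and 9-1d color the label text by using text.col. A common variation, sketched here with invented counts and the hypothetical names counts and rank_cols, is to draw a small colored box next to each label by using the fill argument instead.

```r
# Hypothetical counts, invented for illustration
counts <- matrix(c(3, 2, 5, 6, 4, 9), nrow = 3,
                 dimnames = list(c("Asst", "Assoc", "Prof"), c("Female", "Male")))
rank_cols <- c("skyblue", "skyblue4", "burlywood")

barplot(counts, ylab = "Count", col = rank_cols)

# rev() lists the top segment first so the key reads in stacking order;
# fill = draws a colored box beside each label
legend("topleft", legend = rev(rownames(counts)), fill = rev(rank_cols),
       bty = "n")
```

Whether colored text or colored boxes read better depends on the graph; light colors such as skyblue can be hard to read as text.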

Spine Plot
We can improve the stacked bar plot that was a little difficult to read
in Figure 9-1c by using a variation known as the spine plot (also
called a spinogram or proportional stacked bar graph). The idea is
that each of the six rectangles will be proportional in area to the
number of professors in that combination, as they were in the
stacked bar plot. However, in the bar plot both bars were the same
width, and therefore the height was the sole indicator of the count
within a particular sex/rank combination. This resulted in the height
of some portions of bars being so small that they were difficult to
compare to others. The spine plot takes a different approach. Both
bars will be the same height, but they will be different widths. A
scale located on the right side covers the interval from 0 to 1,
making it easy to estimate the proportion of a rank within a given bar.
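The arithmetic behind the spine plot can be sketched with prop.table(). The counts below are invented for illustration (they are not the Salaries counts), and the names tab, bar_widths, seg_heights, and areas are hypothetical. The width of each bar is the share of all cases in that sex, the height of each segment is the share of a rank within that sex, and their product is the cell's share of the total.

```r
# Hypothetical counts (rows = sex, columns = rank), invented for illustration
tab <- as.table(matrix(c(10, 40,
                         10, 60,
                         20, 100),
                       nrow = 2,
                       dimnames = list(sex  = c("Female", "Male"),
                                       rank = c("Asst", "Assoc", "Prof"))))

bar_widths  <- prop.table(margin.table(tab, 1))  # share of all cases in each sex
seg_heights <- prop.table(tab, 1)                # share of each rank within a sex

# width * height = area; recycling the 2 widths down the 2 rows pairs each
# row with its own width
areas <- seg_heights * as.vector(bar_widths)
isTRUE(all.equal(as.vector(areas), as.vector(prop.table(tab))))  # TRUE

spineplot(tab, col = c("skyblue", "skyblue4", "burlywood"))
```

In this made-up table the Female bar takes one sixth of the total width and the Male bar five sixths, while the segment heights within each bar sum to 1.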




Compare the spine plot in Figure 9-2 to the stacked bar plot in
Figure 9-1c. Which is more comprehensible?

Figure 9-2. A spine plot (spinogram). Compare this to Figure 9-1c.
Which of the two is easier to comprehend?
Here is the code that produces Figure 9-2:
# script for Figure 9-2
rank3 = table(sex, rank)
rank3 # print the table
        rank
sex      AsstProf AssocProf Prof
  Female       11        10   18
  Male         56        54  248
spineplot(rank3, col = c("skyblue","skyblue4","burlywood"),
main = "Faculty by Sex and Rank")



Bar Spacing and Orientation
The spacing and orientation of the bars in a chart are important to
communicating the message. Consider the problem of comparing
the salaries of the six combinations of sex and rank. The following
script shows several ways to present the average salaries for each of
the combinations:
# Script for Fig. 9-3
library(car)
par(mfrow = c(2, 2))
grp.sal = aggregate(
salary ~ sex * rank, FUN = mean) # mean of each group

# labels reused several times, can type vector name in commands
rankname = c(" Asst", " ", " Assoc", " ", " Prof", "")
sexcol = c("blue", "maroon")
sexlab = c("Female", "Male")

# Fig. 9-3a
barplot(grp.sal$salary, ylab = "average salary",
names.arg = rankname, col = sexcol,
main = "Faculty Salaries",
sub = "a. Default spacing between bars")
legend("topleft", sexlab, text.col = sexcol,
text.font = 2, title = "Sex",
title.col = "black", cex = 0.8)
# Fig. 9-3b
barplot(grp.sal$salary, ylab = "average salary",
names.arg = rankname, col = sexcol,
main = "Faculty Salaries", space = 1.5,
sub = "b. Wide space between, space = 1.5")
legend("topleft", sexlab, text.col = sexcol, text.font = 2,
bty = "n")
# Fig. 9-3c
barplot(grp.sal$salary, ylab = "average salary",
names.arg = rankname, col = sexcol,
main = "Faculty Salaries", space = c(1, 0, 1, 0, 1, 0),
sub = "c. Same rank together, space = c(1,0,1,0,1,0)")
legend("topleft", sexlab, text.col = sexcol,
text.font = 2, bty = "n")
# Fig. 9-3d
barplot(grp.sal$salary, ylab = "average salary", col = sexcol,
main = "Faculty Salaries", space = c(1, 0, 1, 0, 1, 0),
horiz = T, sub = "d. Horizontal version of c. horiz=T",
names.arg = rankname,
cex.names = 0.8, las = 1)
legend("bottomright", sexlab, text.col = sexcol,
text.font = 2, bty = "n")

First, we get the mean salary for each combination, stored in the
data frame grp.sal, by using the aggregate() function. The expression
salary ~ sex * rank indicates that some operation will be performed on
salary in each of the six combinations of sex and rank. FUN = mean
shows that the operation will be finding the mean. We will make
several bar plots, showing different spacings between the bars and a
change of vertical bars to horizontal ones. Such changes can make a
difference in how we perceive the plots.
The next step is to define some character vectors, rankname, sexcol,
and sexlab. This is not necessary, but it’s a definite convenience:
rather than typing out the character strings in each of the following
calls of the barplot() function, you can substitute the relatively
short vector names.
Figure 9-3 shows the four bar plots that the script produces.




Figure 9-3. Four variants of the same graph, juxtaposing rank and sex.
Each plot command is followed by a legend() command, putting a
legend on the previously produced bar plot. You could type all of the
lines separately in the console; however, it is usually more
convenient to put a group of commands for which you want to see the
results on one screen, or one page, into a script. This way, if you
make a mistake, you can simply correct the one error and run the
entire batch of commands again without retyping all of it.
The four bar plots are quite similar in terms of the groups
examined, the colors, the labels, and the legends. The most important
difference is the spacing between the bars, and in the last plot the
orientation is different, too. In the first plot, Figure 9-3a, the space
argument does not appear, so the default value (0.2 of a bar width)
is used to make the graph. In Figure 9-3b, the bars are widely
separated because we used space = 1.5; that is, the spaces between
bars are 1.5 times the width of the bars. In the last two bar plots,
the male and female bars for each rank are adjacent, whereas the
ranks are separated. This is accomplished by using the argument
space = c(1, 0, 1, 0, 1, 0), which instructs R that there should be a
space of size 1 before the first bar, size 0 before the second, and so on.
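The effect of space can also be checked numerically, because barplot() returns the midpoints of the bars. The following sketch uses invented bar heights (the vector heights is hypothetical) and plot = FALSE, so that only the positions are computed.

```r
heights <- c(3, 4, 3, 5, 4, 6)  # hypothetical bar heights, for illustration

mid_default <- barplot(heights, plot = FALSE)
mid_wide    <- barplot(heights, space = 1.5, plot = FALSE)
mid_paired  <- barplot(heights, space = c(1, 0, 1, 0, 1, 0), plot = FALSE)

diff(mid_default)  # steps of 1.2: bar width 1 plus the default gap of 0.2
diff(mid_wide)     # steps of 2.5: bar width 1 plus a gap of 1.5
diff(mid_paired)   # 1 2 1 2 1: touching pairs, one bar width between pairs
```

The alternating steps in the last result are what place the female and male bars of each rank side by side while keeping the ranks apart.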
The legend() command for the first plot produces a very
traditional legend with a title and a box around it. In this case, neither of
those elements is really necessary, so the other plots leave out the
title argument and add bty = "n", which deletes the box.
Compare the four bar plots in Figure 9-3. Notice that the legend in
Figure 9-3a makes the graph look a little cluttered. The other graphs
draw attention directly to the bars. If you examine the first two plots
carefully, you will eventually notice that females have a lower
average salary than males in every rank, but, especially in Figure 9-3b,
this is obscured a bit by the wide separation of the bars. Conversely,
in the last two graphs, the difference is obvious immediately! This
demonstrates that just as you should carefully choose words to make
your meaning clear, so too should you choose graphic devices to
make your point as clear as possible.

Exercise 9-1
There is a little quirk in the legend in Figure 9-3d. The legend was
fine for the first three graphs, but when the graph was made
horizontal, the order in the legend should have been changed. Why? Try
to fix it.

Exercise 9-2
This one is challenging, but will help you to see how much R you
can do on your own. With the Salaries dataset used in this chapter,
try to reproduce the graph in Figure 9-4, this time by using the
pyramid() function in the epicalc package. Is this a bar plot or a histogram?



Figure 9-4. Salaries by sex.
