Chapter 21. Resources for Extending Your Knowledge of Things Graphical and R Fluency

excellent materials for further self-study readily available. What fol‐
lows is far from a comprehensive list, but it includes materials that I
have found helpful and feel comfortable recommending to you. For
full references, see Appendix A.

R Graphics
The Comprehensive R Archive Network (CRAN) is part of the R
Project. CRAN Task Views, at http://cran.r-project.org/web/views/,
gives an overview of R packages, broken down by categories. If you
click Graphics, you will see a general discussion of graphics pack‐
ages with some mention of specific packages and their strengths as
well as links to documentation for many graphics packages. What
you will not find here is information about packages that might have
some useful graphic features but are not primarily graphics
packages. If you have a very specialized interest, take a look at the
corresponding category: say, the Survival category if you are
interested in survival curves, TimeSeries if you are interested in
plotting time-series data, SpatioTemporal if you are interested in
geography, and so on.
If you want to delve into the lattice package, look at the very read‐
able book by the creator of this package, Sarkar (2008). Likewise,
Wickham (2009) is the package creator’s approachable book on
ggplot2. Chang (2013) is a “cookbook” with lots of recipes for mak‐
ing graphs in R, mostly with ggplot2. This book is especially appro‐
priate if you know beforehand the basics of R and a bit about
various types of graphs—but, of course, now you do!

General Principles of Graphics
Tufte (1983) is probably one of the most cited books on data graph‐
ics. The author covers centuries of graphics and deduces a number
of principles of effective display. There are many great—and poor—
examples from which to learn what makes a good graph.
The books by Cleveland (1985, 1993) are masterworks of clear and
logical thought. They do require a bit of math for complete under‐
standing, but will give most readers—even those without advanced
math backgrounds—a much better grasp of graphic principles. The
graphs look a little plain compared to the colorful displays now
possible, but the design of Cleveland’s graphs surpasses most others
anyway. Much of lattice derives from these two books.

Learning More About R
R is becoming enormously popular, and there are now a large num‐
ber of books on the market devoted to it. I cannot tell you which is
the best, but my favorite for general data analysis with R is Kabacoff
(2011). An expanded second edition has just been published, but I
have not seen it yet. To get the most out of this book, you should
understand basic statistics.
If you want to know more about R as a programming language, see
Matloff (2011). You might now know most of what you wanted to
learn about graphics. Matloff, however, also deals with data
handling, simulations, text strings, and a host of other subjects
that you probably cannot imagine yet.

Statistics with R
If you did not have any background in statistics before reading this
book, you might want to learn something about this subject now.
There are several basic textbooks on statistics that incorporate R.
One that I can suggest to you is Diez et al. (2012). In keeping with
the open source philosophy of R, this book is free and you can
download it at www.openintro.org. A paper copy is offered at
Amazon for a very low cost. The datasets used in the book are in the
openintro package.
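If you would like to see which datasets a package provides before you start reading, the following is a minimal sketch. The function name list_datasets is hypothetical (it is not part of R or of openintro), and the sketch assumes the package has already been installed from CRAN.

```r
# list_datasets() is a hypothetical helper written for this example.
# It returns the names of the datasets that an installed package provides.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed; try install.packages(\"",
         pkg, "\")")
  }
  # data(package = ...) returns a table of datasets; "Item" holds the names
  data(package = pkg)$results[, "Item"]
}

# Usage, once openintro is installed (needs an internet connection):
# install.packages("openintro")
# head(list_datasets("openintro"))
```

The same helper works for any installed package, so you can try it first on the datasets package that ships with R.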
There you have it. I believe this book works as a prerequisite to most
of the resources discussed in this chapter, which is one of the rea‐
sons that I felt it needed to be written. I hope you will find your new
expertise in R graphics just what you were looking for.

Exercise 21-1
Here is a real test of how much you have learned: reproduce
Figure 1-3.

Appendix A: References

Bland, J. M. and Altman, D. G. 1986. “Statistical methods for assess‐
ing agreement between two methods of clinical measurement.” Lan‐
cet, 327(8476) i: 307–10.
Boslaugh, Sarah. 2013. Statistics in a Nutshell, 2nd ed. Sebastopol,
CA: O’Reilly.
Chang, Winston. 2013. R Graphics Cookbook. Sebastopol, CA:
Cleveland, William S. 1985. The Elements of Graphing Data.
Monterey, CA: Wadsworth.
———. 1993. Visualizing Data. Summit, NJ: Hobart Press.
de Vries, Andrie and Meys, Joris. 2012. R for Dummies. Chichester,
England: John Wiley & Sons.
Deng, Henry and Wickham, Hadley. 2011. “Density estimation in
R.” http://vita.had.co.nz/papers/density-estimation.pdf.
Diez, David M., Barr, Christopher D., and Çetinkaya-Rundel, Mine.
2012. OpenIntro Statistics, 2nd ed. www.openintro.org.
Few, Stephen. 2009. Now You See It. Oakland, CA: Analytics Press.
Fox, John. 2005. “The R Commander: A Basic-Statistics Graphical
User Interface to R.” Journal of Statistical Software, 14(9): 1–42.


Gomez, M. and Hazen, K. 1970. “Evaluating sulfur and ash distribu‐
tion in coal seams by statistical response surface regression analysis.”
Report RI 7377, US Bureau of Mines, Washington, DC.
Hanneman, S. K. 2008. “Design, Analysis and Interpretation of
Method-Comparison Studies.” AACN Advanced Critical Care, 19(2):
Iannaccone, L. R. 1994. “Why Strict Churches Are Strong.” American
Journal of Sociology, 99(5): 1180–211.
James, Gareth, Witten, Daniela, Hastie, Trevor, and Tibshirani, Rob‐
ert. 2013. An Introduction to Statistical Learning: with Applications in
R. New York: Springer.
Janert, Philipp K. 2011. Data Analysis with Open Source Tools. Sebas‐
topol, CA: O’Reilly.
Kabacoff, Robert I. 2011. R in Action. Shelter Island, NY: Manning.
Kleinman, Ken and Horton, Nicholas J. 2014. SAS and R: Data Man‐
agement, Statistical Analysis and Graphics, 2nd ed. Boca Raton, Lon‐
don, New York: CRC Press.
Ligges, U. and Maechler, M. 2003. “3D Scatter plots: an R Package
for Visualizing Multivariate Data.” Journal of Statistical Software,
8(11): 1–20.
Matloff, Norman. 2011. The Art of R Programming. San Francisco,
CA: No Starch Press.
Murrell, Paul. 2011. R Graphics, 2nd ed. Boca Raton, FL: Chapman
and Hall.
Ramsey, Fred and Schafer, Daniel. 2001. Statistical Sleuth, 2nd ed.
Pacific Grove, CA: Brooks/Cole.
Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with
R. New York, NY: Springer.
Tufte, Edward R. 1983. The Visual Display of Quantitative
Information. Cheshire, CT: Graphics Press.
Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA:
Wainer, Howard. 1984. “How to Display Data Badly.” American Sta‐
tistician, 38(2): 137–47.




Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis.
New York: Springer.
Wilk, M. B. and Gnanadesikan, R. 1968. “Probability plotting
methods for the analysis of data.” Biometrika, 55(1): 1–17.
Wilkinson, Leland and Friendly, Michael. 2009. “The History of the
Cluster Heat Map.” American Statistician, 63(2): 179–84.
Wong, Dona M. 2010. The Wall Street Journal Guide to Information
Graphics. New York: W. W. Norton.
Yau, Nathan. 2011. Visualize This. Indianapolis, IN: John Wiley &
Yau, Nathan. 2013. Data Points: Visualization That Means Something.
Indianapolis, IN: John Wiley & Sons.





Appendix B: R Colors

You can obtain a display of 657 named R colors by using the follow‐
ing command:
> demo(colors)

For a list of the color names, use this command:
> colors()
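Because colors() returns an ordinary character vector, you can search it and inspect individual colors with base R tools. The following is a small sketch of that idea; the search term "blue" and the color "steelblue" are arbitrary choices made for illustration.

```r
all_colors <- colors()            # character vector of every named color
length(all_colors)                # how many names this version of R knows

# Find every color whose name contains "blue"
blues <- grep("blue", all_colors, value = TRUE)
head(blues)

# Red, green, and blue components (0 to 255) of one named color
col2rgb("steelblue")

# Draw the first ten "blue" colors as labeled swatches
barplot(rep(1, 10), col = blues[1:10], names.arg = blues[1:10],
        las = 2, axes = FALSE, cex.names = 0.7)
```

Searching by name this way is often quicker than scanning the full chart when you already know roughly which hue you want.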

The following script produced the color table shown in Figure B-1 (I
included it here so that you can reproduce it if you want to print
your own copy):
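# Script to produce color chart
# Lines marked "reconstructed" were missing from this copy of the script
# and are a best guess, not necessarily the original code.
par(col.axis="white",col.lab="white", mar=c(0.1,0.1,0.4,0.1)) # remaining arguments lost
n = c(0:656) # a number for each color
n2 = (n %%73) # each color has a row number (0 to 72) in its column
cc = t(colors()) # color names
x = rep(1,times=73) # reconstructed: column number for the first 73 colors
k = (2:9) # a number for each remaining column
for(i in k) {
  r = rep(c(i),times=73)
  x = (c(x,r))
} # reconstructed: end of loop
# print, at (x,n2), color rectangle
plot(x,n2, col=cc, pch=15, xlim=c(1,10), # reconstructed: start of plot() call
  main="Named colors available in R",cex.main=.65)
x1 = x+ 0.5
# print (at x1,n2), the color name vector
text(x1,n2,cc, cex=.4) # reconstructed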

Figure B-1. 657 named colors.
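If you want a copy to print, one option is to send a chart like this straight to a PDF file. The following is a self-contained sketch, not the script that produced Figure B-1: the output path, the letter-size page, the nine-column layout, and the text sizes are all assumptions that you can change.

```r
# Hypothetical output file; change the path to suit.
out_file <- file.path(tempdir(), "r-named-colors.pdf")

cc    <- colors()                        # all named colors
n_col <- 9                               # columns in the chart (an assumption)
n_row <- ceiling(length(cc) / n_col)     # rows needed to fit every color
idx   <- seq_along(cc) - 1               # 0-based position of each color
x     <- idx %/% n_row + 1               # column of each color
y     <- n_row - idx %% n_row            # row, counted down from the top

pdf(out_file, width = 8.5, height = 11)  # page size in inches
par(mar = c(0.1, 0.1, 1.5, 0.1))
plot(x, y, col = cc, pch = 15, cex = 1.1, axes = FALSE,
     xlab = "", ylab = "", xlim = c(1, n_col + 1),
     main = "Named colors available in R", cex.main = 0.8)
text(x + 0.1, y, cc, cex = 0.45, adj = 0)  # name to the right of each square
invisible(dev.off())                     # close the file so it can be opened

out_file                                 # where the PDF was written
```

Writing to a file device rather than the screen keeps the page proportions fixed, so the printed chart does not depend on the size of your plot window.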
A nice R color chart by Professor Tian Zheng of Columbia Univer‐
sity is available on the Internet at http://www.stat.columbia.edu/





Appendix C: The R Commander Graphical User Interface

Some people just do not like the command-line interface of R and
would prefer to work in a graphical user interface (GUI; a.k.a.
point-and-click) environment. If you do not work with R on a regular
basis, it can be hard to remember the R commands; or you might
find that you make a lot of mistakes when typing, or that it can be
painfully slow to make some simple graphs. Using R Commander
could make your life a little more pleasant, with the caveat that you
will not have access to the full range of R capabilities with the
point-and-click interface.
If you want to try R Commander, you first must install it by using
the following command:
> install.packages("Rcmdr", dependencies=TRUE)

After you’ve installed it, you won’t need to do it again, but you must
load it during each session for which you want to use it. Here’s the
command to do that:
> library(Rcmdr)
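If you keep these commands in a script that you run repeatedly, you might prefer to install the package only when it is missing. The following is a minimal sketch of that pattern; ensure_package is a hypothetical helper name, not a function supplied by R or by R Commander.

```r
# ensure_package() is a hypothetical helper, not part of R or R Commander.
# It installs a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)  # needs an internet connection
  }
  requireNamespace(pkg, quietly = TRUE)         # TRUE if the package is usable
}

# Usage (opens the R Commander window, so it is commented out here):
# if (ensure_package("Rcmdr")) library(Rcmdr)
```

The check uses requireNamespace() because it reports whether a package is available without attaching it, so nothing opens until you call library() yourself.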

The R Commander window appears in Figure C-1. You will proba‐
bly find that you can produce routine graphs/tables/analyses more
quickly by using R Commander, but some highly customized graphs
will not be possible. The console will stay open and you can go back
and forth between the two windows if you want to use both the GUI
and the command-line interface. Alternatively, you can type a
command into the R Script window of R Commander and select it. To
select a command, click the beginning of the line, drag across the
line to the end, and release the mouse button. The line will now be
highlighted. Click the Submit button, and R will execute the command.

Figure C-1. The R Commander GUI interface for R.
Try working through the strip chart problems in Chapter 3 using R
Commander. At the top of the screen, on the menu bar, click Data.
On the menu that opens, choose “Data in packages” and “Read data‐
set from an attached package.” Figure C-2 shows the window that
opens, in which you can select the trees data set. Click OK.


