Appendix B. Other Things to Do in R
The ff and bigmemory packages allow you to store R variables in files (like SAS does),
avoiding the limitations of RAM. On a related note, the data.table package provides
an enhanced data frame variable type that has faster indexing, assignment, and merging.
There are dozens of spatial statistics packages: sp provides a standard way of storing
spatial data objects; maps, maptools, and mapproj provide helper functions for reading
and writing maps; spatstat provides functions for (you guessed it) spatial statistics;
OpenStreetMap retrieves raster images from http://openstreetmap.com; and so on.
Finally, R has a tool for combining code and output with regular text in a report (sometimes
called literate programming or reproducible research), called Sweave. This has been
improved upon and extended by the knitr package, which allows you to create reports
using a variety of markup languages. In fact, this book was written using knitr to create
AsciiDoc markup, which in turn was used to create PDF, HTML, and ebook documents.
Yihui Xie’s Dynamic Documents with R and knitr explains how to use it.
The R ecosystem is big, and getting bigger all the time. These are just a few of the things
that you might want to explore; take a look in the bibliography for some more ideas.
Have fun exploring!
Appendix C. Answers to Quizzes
R is an open source reworking of the S programming language.
Choices include imperative, object-oriented, and functional.
help.search (which does the same as ??)
all.equal(x, pi) or, even better, isTRUE(all.equal(x, pi))
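As a minimal sketch of why the tolerance-based comparison is preferred, the following uses a made-up value of x that differs from pi only by a tiny amount (the value is an assumption for illustration, not part of the quiz):

```r
# x is a hypothetical value that is "almost" pi
x <- pi + 1e-12

exact_match <- x == pi                     # exact comparison
close_match <- isTRUE(all.equal(x, pi))    # comparison within a tolerance

# When values differ, all.equal returns a description rather than FALSE,
# which is why it is wrapped in isTRUE before being used as a condition
far_apart <- all.equal(1, 2)

print(exact_match)
print(close_match)
print(far_apart)
```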
At least two of the following:
1. <-
2. =
3. <<-
4. assign
Just Inf and -Inf
0, Inf, and -Inf
numeric, integer, and complex
Any three of summary, head, str, unclass, attributes, or View. Bonus points if
you’ve discovered tail, the counterpart to head that prints the last few rows.
rm(list = ls())
seq.int(0, 1, 0.25)
Either by using name = value pairs when the vector is created, or by calling the
names function afterward.
Positive integers for locations to retrieve, negative integers for locations to avoid,
logical values, or names of elements.
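The four index types can be sketched on a small named vector; the vector and its values are arbitrary examples:

```r
x <- c(a = 10, b = 20, c = 30, d = 40)    # arbitrary example data

by_position  <- x[c(1, 3)]                      # positive integers: keep 1st and 3rd
by_exclusion <- x[-2]                           # negative integer: drop the 2nd
by_logical   <- x[c(TRUE, FALSE, TRUE, FALSE)]  # logical values: keep where TRUE
by_name      <- x[c("a", "d")]                  # names: keep "a" and "d"

print(by_position)
print(by_exclusion)
print(by_logical)
print(by_name)
```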
3 * 4 * 5 = 60
3. The inner list counts as one element, and so does the NULL element.
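To check this kind of count yourself, here is a small sketch with a hypothetical list (not necessarily the one from the quiz) that contains a nested list and a NULL element:

```r
# A made-up list: one number, one nested list, one NULL
l <- list(alpha = 1, list(beta = 2, gamma = 3), eta = NULL)

# length counts top-level elements only
n_elements <- length(l)
print(n_elements)
```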
When passing arguments to functions, when calling formals, or in the global
environment variable .Options.
You can use matrix-style indexing with pairs of positive integers/negative integers/
logical values/characters in single square brackets. You can also use list-style
indexing with one index value inside single or double square brackets, or the dollar
sign ($) operator. Thirdly, you can call the subset function.
By passing check.names = FALSE to data.frame.
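A short sketch of the effect, using deliberately awkward column names that are invented for this example:

```r
# Default behavior: names are converted to syntactically valid ones
checked <- data.frame("a b" = 1:2, "2nd" = 3:4)

# With check.names = FALSE the names are left exactly as given
unchecked <- data.frame("a b" = 1:2, "2nd" = 3:4, check.names = FALSE)

print(names(checked))
print(names(unchecked))
```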
rbind for appending vertically or cbind for appending horizontally.
The user workspace.
list2env is the best solution, but as.environment also works.
Just type its name.
formals, args, and formalArgs.
do.call calls a function with its arguments in a list form.
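For instance, with an argument list invented for illustration:

```r
# The arguments are collected in a list first; named elements
# become named arguments of the call
args_list <- list(c(1, 2, 3, NA), na.rm = TRUE)

# Equivalent to mean(c(1, 2, 3, NA), na.rm = TRUE)
via_do_call <- do.call(mean, args_list)
direct      <- mean(c(1, 2, 3, NA), na.rm = TRUE)

print(via_do_call)
```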
format, formatC, sprintf, and prettyNum are the main ones.
Using alarm, or printing an \a character to the console.
factor and ordered
The value is counted as missing (NA).
Use cut to bin it.
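A minimal sketch of binning with cut; the ages, break points, and labels are all assumptions chosen for the example:

```r
ages <- c(3, 17, 42, 68)    # made-up numeric data

# Intervals are (0, 18], (18, 65], and (65, Inf] by default
age_group <- cut(
  ages,
  breaks = c(0, 18, 65, Inf),
  labels = c("child", "adult", "senior")
)

print(age_group)
```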
if will throw an error if you pass NA to it.
ifelse will return NA values in the corresponding positions where NA is passed to it.
switch will conditionally execute code based upon a character or integer argument.
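The three behaviors can be seen side by side in this sketch (the strings used are arbitrary):

```r
# if: an NA condition is an error, captured here with try
if_result <- try(if (NA) "yes" else "no", silent = TRUE)
if_failed <- inherits(if_result, "try-error")

# ifelse: NA in the test gives NA in the same position of the result
ifelse_result <- ifelse(c(TRUE, NA, FALSE), "yes", "no")

# switch: chooses a branch by name for a character argument,
# or by position for an integer argument
by_name     <- switch("b", a = "first", b = "second")
by_position <- switch(2, "first", "second")

print(if_failed)
print(ifelse_result)
print(by_name)
```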
Insert the keyword break into your loop code.
Insert the keyword next into your loop code.
lapply, vapply, sapply, apply, mapply, and tapply were all discussed in the chapter,
with eapply and rapply getting brief mentions too. Try apropos("apply") to
see all of them.
All three functions accept a list and apply a function to each element in turn. The
difference is in the return value. lapply always returns a list, vapply always returns
a vector or array as specified by a template, and sapply can return either.
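The difference in return values can be sketched with a small made-up list:

```r
x <- list(a = c(1, 2, 3), b = c(4, 5, 6))    # arbitrary example data

as_list   <- lapply(x, mean)                 # always a list
as_vector <- vapply(x, mean, numeric(1))     # template: one number per element
simplified <- sapply(x, mean)                # simplifies to a named vector here

# When the results have different lengths, sapply falls back to a list
not_simplified <- sapply(list(1:2, 1:3), rev)

print(as_list)
print(as_vector)
print(not_simplified)
```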
rapply is recursive, and ideal for deeply nested objects like trees.
This is a classic split-apply-combine problem. Use tapply (or something from the plyr package).
In a name like **ply, the first asterisk denotes the type of the first input argument
and the second asterisk denotes the type of the return value.
CRAN is by far the biggest package repository. Bioconductor, R-Forge, and
RForge.net are others. There are also many packages on GitHub, Bitbucket, and
Both functions load a package, but library throws an error if it fails, whereas
require returns a logical value (letting you do custom error handling).
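A sketch of the difference, using a package name that is deliberately made up so that loading fails:

```r
# "notARealPackage" is a hypothetical name that should not be installed
pkg <- "notARealPackage"

# require returns FALSE (with a warning) when the package is unavailable
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library throws an error instead, captured here with try
lib_result <- try(library(pkg, character.only = TRUE), silent = TRUE)
lib_failed <- inherits(lib_result, "try-error")

if (!loaded) {
  message("Custom handling: ", pkg, " is not installed")
}
```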
A package library is just a folder on your machine that contains R packages.
.libPaths returns a list of libraries.
R doesn’t do a great impression of Internet Explorer, but you can make it use Internet
Explorer’s internet2.dll library for connecting to the Internet.
POSIXct classes must be used. Dates don’t store the time information, and POSIXlt
dates store their data as lists, which won’t fit inside a data frame.
Midnight at the start of January 1, 1970.
Add 3,600 seconds to it. For example:
x <- Sys.time()
x + 3600
## [1] "2013-07-17 22:44:55 BST"
The period will be longer, because 2016 is a leap year. A duration is always exactly
60 * 60 * 24 * 365 seconds. A period of one year will be 366 days in a leap year.
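The arithmetic can be checked with base R date-times alone. This sketch deliberately uses plain POSIXct subtraction rather than dedicated period and duration classes, and the UTC time zone is an assumption made to avoid daylight saving effects:

```r
# A fixed "duration" of one year, in seconds
duration_secs <- 60 * 60 * 24 * 365

# The calendar year 2016, measured from midnight to midnight
start_2016 <- as.POSIXct("2016-01-01", tz = "UTC")
start_2017 <- as.POSIXct("2017-01-01", tz = "UTC")
calendar_secs <- as.numeric(difftime(start_2017, start_2016, units = "secs"))

# The difference is the extra leap day, in seconds
extra_secs <- calendar_secs - duration_secs
print(extra_secs / (60 * 60 * 24))
```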
Call the data function with no arguments.
read.csv assumes that a decimal place is represented by a full stop (period) and
that each item is separated by a comma, whereas read.csv2 assumes that a decimal
place is represented by a comma and that each item is separated by a semicolon.
read.csv is used for data created in locales where a period is used as a decimal place
(most English-speaking locales, for example). read.csv2 is for data created in locales
where a comma is used (most European locales, for example). If you are unsure,
simply open your data file in a text editor.
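To see the two conventions in action without creating files, this sketch passes made-up data through the text argument:

```r
# The same made-up data in the two conventions
period_style <- "x,y\n1.5,2.5\n3.5,4.5"    # comma separator, period decimal
comma_style  <- "x;y\n1,5;2,5\n3,5;4,5"    # semicolon separator, comma decimal

a <- read.csv(text = period_style)
b <- read.csv2(text = comma_style)

print(a)
print(b)
```

Both calls should produce the same numeric data frame.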
read.xlsx2 from the xlsx package is a good first choice, but there is also read.xlsx
in the same package, and different functions in several other packages.
You can simply pass the URL to read.csv, or use download.file to get a local copy.
Currently SQLite, MySQL, PostgreSQL, and Oracle databases are supported.
Read in the text as a character vector with readLines, call str_count to count the
number of instances in each line, and sum the total.
with, within, transform, and mutate all allow manipulating columns and adding
columns to data frames, as well as standard assignment.
Casting. Not freezing!
Use order or arrange.
Define a function that returns TRUE when you have a positive number (for example,
is.positive <- function(x) x > 0) and call Find(is.positive, x).
min returns the single smallest value of all its inputs. pmin accepts several vectors
that are the same length, and returns the smallest at each point along them.
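For example, with two arbitrary vectors of equal length:

```r
a <- c(1, 5, 9)    # made-up values
b <- c(4, 2, 8)

overall_min  <- min(a, b)     # one value: the smallest of everything
parallel_min <- pmin(a, b)    # one value per position

print(overall_min)
print(parallel_min)
```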
Pass the pch (“plot character”) argument.
Use a formula of the form y ~ x.
An aesthetic specifies a variable whose variation you want to examine. Most plots take
an x and a y aesthetic for x and y coordinates, respectively. You can also specify
color aesthetics or shape aesthetics (for example, when you want to look at more than
two variables at once).
Histograms, box plots, and kernel density plots were all mentioned in the chapter.
There are some other weirdly esoteric plots, like violin plots, rug plots, bean plots,
and stem-and-leaf plots, that weren’t mentioned. Have 100 geek points for each of
these that you guessed.
Set the seed (with set.seed), generate the numbers, then reset the seed to the same value.
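A minimal sketch of the idea; the seed value 42 and the use of runif are arbitrary choices:

```r
set.seed(42)             # any fixed value will do
first_draw <- runif(3)

set.seed(42)             # same seed again
second_draw <- runif(3)

# The two draws are the same because the generator restarted
# from the same state
print(identical(first_draw, second_draw))
```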
PDF functions have a name beginning with d, followed by the name of the distribution.
For example, the PDF for the binomial distribution is dbinom. CDF functions
start with p followed by the name of the distribution, and inverse CDF functions
begin with q followed by the name of the distribution.
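The naming scheme can be sketched for the binomial distribution; the parameters (4 trials, probability 0.5) are chosen only to keep the arithmetic easy to check by hand:

```r
# PDF: P(X = 2) = choose(4, 2) / 2^4 = 6 / 16
pdf_at_2 <- dbinom(2, size = 4, prob = 0.5)

# CDF: P(X <= 2) = (1 + 4 + 6) / 16 = 11 / 16
cdf_at_2 <- pbinom(2, size = 4, prob = 0.5)

# Inverse CDF: the smallest x with P(X <= x) >= 0.5
median_x <- qbinom(0.5, size = 4, prob = 0.5)

print(c(pdf_at_2, cdf_at_2, median_x))
```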
Colons represent an interaction between variables.
anova, AIC, and BIC are common functions for comparing models.
The R^2 value is available via summary(model)$r.squared.
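These three answers can be tried together on the built-in mtcars dataset. The choice of dataset and variables is an assumption made for illustration, not the model used in the chapter:

```r
# Two nested linear models; wt:hp adds the interaction term
main_effects     <- lm(mpg ~ wt + hp, data = mtcars)
with_interaction <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Compare the models
print(anova(main_effects, with_interaction))
print(AIC(main_effects, with_interaction))
print(BIC(main_effects, with_interaction))

# Extract R^2 from the model summary
r_squared <- summary(with_interaction)$r.squared
print(r_squared)
```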
The warnings function shows the previous warnings.
Upon failure, try returns an object of class try-error.
The testthat equivalent of checkException is expect_error. Say that 10
parse (with its text argument) turns a string into an expression, then eval evaluates it.
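As a sketch of the string-to-result round trip: quote captures code that is typed directly, whereas code held in a string has to go through parse(text = ...) first. The string below is an arbitrary example:

```r
code_string <- "1 + 2 * 3"    # made-up code held as text

# parse returns an expression; its first element is the call
parsed   <- parse(text = code_string)
the_call <- parsed[[1]]

result <- eval(the_call)

# For comparison, quoting the string just gives the string back
quoted_string <- quote("1 + 2 * 3")

print(the_call)
print(result)
```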
Overload functions using the S3 system. A function print.foo will be called for
objects of class foo.
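A minimal sketch of S3 dispatch; the class name foo comes from the answer above, while the printed format is invented for the example:

```r
# A print method for the (hypothetical) class "foo"
print.foo <- function(x, ...) {
  cat("<foo object with value", unclass(x), ">\n")
  invisible(x)    # print methods conventionally return their input invisibly
}

obj <- structure(42, class = "foo")

# print dispatches on the class attribute and calls print.foo
printed <- capture.output(print(obj))
print(obj)
```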
DESCRIPTION and NAMESPACE are compulsory.
man and R are compulsory in all packages. src is required if you include C, C++,
or Fortran code.
CITATION files let you explain who made and maintains the package, if that information
is too long and complicated to go in the DESCRIPTION file.
roxygenise or roxygenize
First warn the user that it is deprecated by adding a call to .Deprecated. Later,
replace the body completely with a call to .Defunct.
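The two stages might look like the following sketch, where old_mean is a hypothetical function being retired in favor of mean:

```r
# Stage 1: the function still works but warns its users
old_mean <- function(x) {
  .Deprecated("mean")
  mean(x)
}

# Stage 2 (a later release): the body is replaced entirely
old_mean_defunct <- function(x) {
  .Defunct("mean")
}

# The deprecated version still returns a value, with a warning
value <- suppressWarnings(old_mean(1:3))

# The defunct version throws an error, captured here with try
defunct_result <- try(old_mean_defunct(1:3), silent = TRUE)

print(value)
```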
Appendix D. Solutions to Exercises
If you get stuck, ask your system administrator, or ask on the R-help mailing list.
Use the colon operator to create a vector, and the sd function:
Type demo(plotmath) and hit Enter, or click the plots to see what’s on offer.
1. Simple division gets the reciprocal, and atan calculates the inverse (arc) tangent:
atan(1 / 1:1000)
2. Assign variables using <-:
x <- 1:1000
y <- atan(1 / x)
z <- 1 / tan(y)
For comparing two vectors that should contain the same numbers, all.equal is
usually what you need:
x == z
all.equal(x, z, tolerance = 0)
The exact values contained in the following three vectors may vary:
true_and_missing <- c(NA, TRUE, NA)
false_and_missing <- c(FALSE, FALSE, NA)
mixed <- c(TRUE, FALSE, NA)
Repeat with typeof, mode, and storage.mode.
pets <- factor(sample(
  c("dog", "cat", "hamster", "goldfish"),
  replace = TRUE
))
Converting to factors is recommended but not compulsory.
carrot <- 1
potato <- 2
swede <- 3
ls(pattern = "a")
Your vegetables may vary.
There are several possibilities for creating sequences, including the colon operator.
This solution uses seq_len and seq_along:
n <- seq_len(20)
triangular <- n * (n + 1) / 2
names(triangular) <- letters[seq_along(n)]
triangular[c("a", "e", "i", "o")]
Again, there are many different ways of creating a sequence from -11 to 0 to 11. abs
gives you the absolute value of a number: