Appendix B. Other Things to Do in R


The ff and bigmemory packages allow you to store R variables in files (like SAS does),
avoiding the limitations of RAM. On a related note, the data.table package provides
an enhanced data frame variable type that has faster indexing, assignment, and merging
capabilities.
There are dozens of spatial statistics packages: sp provides a standard way of storing
spatial data objects; maps, maptools, and mapproj provide helper functions for reading
and writing maps; spatstat provides functions for (you guessed it) spatial statistics;
OpenStreetMap retrieves raster images from http://openstreetmap.com; and so on.
Finally, R has a tool for combining code and output with regular text in a report (sometimes
called literate programming or reproducible research), called Sweave. This has been
improved upon and extended by the knitr package, which allows you to create reports
using a variety of markup languages. In fact, this book was written using knitr to create
AsciiDoc markup, which in turn was used to create PDF, HTML, and ebook documents.
Yihui Xie’s Dynamic Documents with R and knitr explains how to use it.
The R ecosystem is big, and getting bigger all the time. These are just a few of the things
that you might want to explore; take a look in the bibliography for some more ideas.
Have fun exploring!

APPENDIX C

Answers to Quizzes

Question 1-1
R is an open source reworking of the S programming language.
Question 1-2
Choices include imperative, object-oriented, and functional.
Question 1-3
8:27
Question 1-4
help.search (which does the same as ??)
Question 1-5
RSiteSearch
Question 2-1
%/%
Question 2-2
all.equal(x, pi) or, even better, isTRUE(all.equal(x, pi))
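A minimal sketch of why the tolerance-based comparison matters. The value of x below is a made-up approximation to pi, not one taken from the quiz:

```r
# x is a hypothetical approximation to pi, off by about 1e-10
x <- pi + 1e-10

# Exact comparison fails because the values differ slightly
exact <- x == pi

# all.equal compares within a small numeric tolerance
approx <- isTRUE(all.equal(x, pi))

# When the values really differ, all.equal returns a character
# description rather than FALSE, so wrap it in isTRUE before using
# the result in an if statement
far_off <- all.equal(3, pi)
safe <- isTRUE(far_off)
```

Here exact is FALSE and approx is TRUE, while far_off is a character string and safe is FALSE.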
Question 2-3
At least two of the following:
1. <-
2. =
3. <<-
4. assign
Question 2-4
Just Inf and -Inf
Question 2-5
0, Inf, and -Inf
Question 3-1
numeric, integer, and complex
Question 3-2
nlevels
Question 3-3
as.numeric("6.283185")
Question 3-4
Any three of summary, head, str, unclass, attributes, or View. Bonus points if
you’ve discovered tail, the counterpart to head that prints the last few rows.
Question 3-5
rm(list = ls())
Question 4-1
seq.int(0, 1, 0.25)
Question 4-2
Either by using name = value pairs when the vector is created, or by calling the
names function afterward.
Question 4-3
Positive integers for locations to retrieve, negative integers for locations to avoid,
logical values, or names of elements.
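The four index types can be sketched on a small named vector. The vector x here is invented for illustration:

```r
# A small named vector, made up for this example
x <- c(a = 1, b = 4, c = 9, d = 16)

x[c(1, 3)]                      # positive integers: keep elements 1 and 3
x[-c(1, 3)]                     # negative integers: drop elements 1 and 3
x[c(TRUE, FALSE, TRUE, FALSE)]  # logical values: keep where TRUE
x[c("a", "c")]                  # names: keep the elements named a and c
```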
Question 4-4
3 * 4 * 5 = 60
Question 4-5
%*%
Question 5-1
3. The inner list counts as one element, and so does the NULL element.
Question 5-2
When passing arguments to functions, when calling formals, or in the global
environment variable .Options.
Question 5-3
You can use matrix-style indexing with pairs of positive integers/negative integers/
logical values/characters in single square brackets. You can also use list-style
indexing with one index value inside single or double square brackets, or the dollar
sign ($) operator. Thirdly, you can call the subset function.
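As a rough illustration of these indexing styles, assuming a hypothetical data frame dfr that is not from the chapter:

```r
# A hypothetical data frame, made up for this example
dfr <- data.frame(x = 1:3, y = c("a", "b", "c"))

dfr[2, "y"]            # matrix-style: row 2, column y
dfr[-1, ]              # matrix-style: drop row 1
dfr["x"]               # list-style, single brackets: a one-column data frame
dfr[["x"]]             # list-style, double brackets: the column as a vector
dfr$x                  # dollar sign: same result as dfr[["x"]]
subset(dfr, x > 1, y)  # subset: rows where x > 1, column y only
```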
Question 5-4
By passing check.names = FALSE to data.frame.
Question 5-5
rbind for appending vertically or cbind for appending horizontally.
Question 6-1
The user workspace.
Question 6-2
list2env is the best solution, but as.environment also works.
Question 6-3
Just type its name.
Question 6-4
formals, args, and formalArgs.
Question 6-5
do.call calls a function with its arguments in a list form.
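A short sketch of do.call, using invented inputs. The names args_list and pieces are hypothetical:

```r
# Arguments collected in a list; named items match parameters of paste
args_list <- list("a", "b", "c", sep = "-")

# Equivalent to paste("a", "b", "c", sep = "-")
joined <- do.call(paste, args_list)

# A common use: row-bind a whole list of data frames in one call
pieces <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))
combined <- do.call(rbind, pieces)
```

This is handy when the number of arguments is only known at runtime.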
Question 7-1
format, formatC, sprintf, and prettyNum are the main ones.
Question 7-2
Using alarm, or printing an \a character to the console.
Question 7-3
factor and ordered
Question 7-4
The value is counted as missing (NA).
Question 7-5
Use cut to bin it.
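For instance, a minimal sketch with made-up ages and arbitrarily chosen break points:

```r
# Hypothetical numeric data
ages <- c(3, 17, 25, 40, 68)

# Bin into three groups; intervals are (0, 18], (18, 65], and (65, Inf]
age_group <- cut(
  ages,
  breaks = c(0, 18, 65, Inf),
  labels = c("child", "adult", "senior")
)
```

The result is a factor with one level per bin.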
Question 8-1
if will throw an error if you pass NA to it.
Question 8-2
ifelse will return NA values in the corresponding positions where NA is passed to
it.
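The difference between if and ifelse on missing values can be sketched as follows. The error from if is captured with try so that the example runs to completion:

```r
x <- c(TRUE, NA, FALSE)

# ifelse propagates the missing value to the matching position
res <- ifelse(x, "yes", "no")

# if cannot choose a branch for NA, so it throws an error;
# try captures that error instead of stopping the script
attempt <- try(if (NA) "yes" else "no", silent = TRUE)
```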
Question 8-3
switch will conditionally execute code based upon a character or integer argument.

Question 8-4
Insert the keyword break into your loop code.
Question 8-5
Insert the keyword next into your loop code.
Question 9-1
lapply, vapply, sapply, apply, mapply, and tapply were all discussed in the chapter,
with eapply and rapply getting brief mentions too. Try apropos("apply") to
see all of them.
Question 9-2
All three functions accept a list and apply a function to each element in turn. The
difference is in the return value. lapply always returns a list, vapply always returns
a vector or array as specified by a template, and sapply can return either.
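A small sketch of the three return types, using an invented list x:

```r
# Made-up input
x <- list(a = 1:3, b = 4:6)

l <- lapply(x, mean)              # always a list
v <- vapply(x, mean, numeric(1))  # a numeric vector, checked against the template
s <- sapply(x, mean)              # simplified to a numeric vector here

# With empty input, sapply falls back to returning a list,
# which is one reason vapply is safer inside functions
s0 <- sapply(list(), mean)
```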
Question 9-3
rapply is recursive, and ideal for deeply nested objects like trees.
Question 9-4
This is a classic split-apply-combine problem. Use tapply (or something from the
plyr package).
Question 9-5
In a name like **ply, the first asterisk denotes the type of the first input argument
and the second asterisk denotes the type of the return value.
Question 10-1
CRAN is by far the biggest package repository. Bioconductor, R-Forge, and
RForge.net are others. There are also many packages on GitHub, Bitbucket, and
Google Code.
Question 10-2
Both functions load a package, but library throws an error if it fails, whereas
require returns a logical value (letting you do custom error handling).
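A sketch of the custom error handling that require allows. The package name below is deliberately fictitious and is assumed not to be installed:

```r
# A hypothetical package name that should not exist on any machine
pkg <- "aPackageThatDoesNotExist"

# require returns FALSE (with a warning) rather than throwing an error
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

if (!ok) {
  message("Package ", pkg, " is not available; falling back to base R.")
}
```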
Question 10-3
A package library is just a folder on your machine that contains R packages.
Question 10-4
.libPaths returns a character vector of the library paths.
Question 10-5
R doesn’t do a great impression of Internet Explorer, but you can make it use Internet
Explorer’s internet2.dll library for connecting to the Internet.

Question 11-1
POSIXct classes must be used. Dates don’t store the time information, and POSIXlt
dates store their data as lists, which won’t fit inside a data frame.
Question 11-2
Midnight at the start of January 1, 1970.
Question 11-3
"%B %Y"
Question 11-4
Add 3,600 seconds to it. For example:
x <- Sys.time()
x + 3600
## [1] "2013-07-17 22:44:55 BST"

Question 11-5
The period will be longer, because 2016 is a leap year. A duration is always exactly
60 * 60 * 24 * 365 seconds. A period of one year will be 366 days in a leap year.
Question 12-1
Call the data function with no arguments.
Question 12-2
read.csv assumes that a decimal place is represented by a full stop (period) and
that each item is separated by a comma, whereas read.csv2 assumes that a decimal
place is represented by a comma and that each item is separated by a semicolon.
read.csv is used for data created in locales where a period is used as a decimal place
(most English-speaking locales, for example). read.csv2 is for data created in locales
where a comma is used (most European locales, for example). If you are unsure,
simply open your data file in a text editor.
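To see the two conventions side by side, here is a sketch that reads the same made-up table from strings via the text argument, rather than from files:

```r
# The same hypothetical table written in the two conventions
english  <- "x,y\n1.5,2.25"
european <- "x;y\n1,5;2,25"

a <- read.csv(text = english)    # comma separator, period as decimal mark
b <- read.csv2(text = european)  # semicolon separator, comma as decimal mark
```

Both calls should give a one-row data frame with numeric columns x and y.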
Question 12-3
read.xlsx2 from the xlsx package is a good first choice, but there is also read.xlsx
in the same package, and different functions in several other packages.
Question 12-4
You can simply pass the URL to read.csv, or use download.file to get a local copy.
Question 12-5
Currently SQLite, MySQL, PostgreSQL, and Oracle databases are supported.
Question 13-1
Read in the text as a character vector with readLines, call str_count to count the
number of instances in each line, and sum the total.

Question 13-2
with, within, transform, and mutate all allow manipulating columns and adding
columns to data frames, as well as standard assignment.
Question 13-3
Casting. Not freezing!
Question 13-4
Use order or arrange.
Question 13-5
Define a function that returns TRUE when you have a positive number (for example,
is.positive <- function(x) x > 0) and call Find(is.positive, x).
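A runnable sketch of this, with a made-up input vector. Position and Filter are related base R functions shown for comparison:

```r
is.positive <- function(x) x > 0

# Hypothetical input
x <- c(-3, 0, 2, -1, 5)

first_pos <- Find(is.positive, x)      # the first positive value
where     <- Position(is.positive, x)  # the index of that value
all_pos   <- Filter(is.positive, x)    # every positive value
```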
Question 14-1
min returns the single smallest value of all its inputs. pmin accepts several vectors
that are the same length, and returns the smallest at each point along them.
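For example, with two invented vectors a and b:

```r
a <- c(1, 8, 3)
b <- c(4, 2, 9)

overall   <- min(a, b)   # one value: the smallest across both vectors
pointwise <- pmin(a, b)  # one value per position: 1, 2, 3
```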
Question 14-2
Pass the pch (“plot character”) argument.
Question 14-3
Use a formula of the form y ~ x.
Question 14-4
An aesthetic maps a variable in your data to a visual property of the plot. Most plots
take an x and a y aesthetic for x and y coordinates, respectively. You can also specify
color aesthetics or shape aesthetics (for example, when you want to look at more than
two variables at once).
Question 14-5
Histograms, box plots, and kernel density plots were all mentioned in the chapter.
There are some other weirdly esoteric plots, like violin plots, rug plots, bean plots,
and stem-and-leaf plots, that weren’t mentioned. Have 100 geek points for each of
these that you guessed.
Question 15-1
Set the seed (with set.seed), generate the numbers, then reset the seed to the same
value.
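A minimal sketch, in which the seed value 42 is arbitrary:

```r
set.seed(42)        # any fixed value works; 42 is arbitrary
first <- runif(5)

set.seed(42)        # reset to the same seed
second <- runif(5)

same <- identical(first, second)
```

Because the generator restarts from the same state, same is TRUE.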
Question 15-2
PDF functions have a name beginning with d, followed by the name of the distribution.
For example, the PDF for the binomial distribution is dbinom. CDF functions
start with p followed by the name of the distribution, and inverse CDF functions
begin with q followed by the name of the distribution.
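The naming scheme can be sketched for the binomial distribution. The parameter values (10 trials, success probability 0.5) are chosen only for illustration:

```r
# Binomial distribution with 10 trials and success probability 0.5
d <- dbinom(3, size = 10, prob = 0.5)    # PDF: P(X == 3)
p <- pbinom(3, size = 10, prob = 0.5)    # CDF: P(X <= 3)
q <- qbinom(0.5, size = 10, prob = 0.5)  # inverse CDF: the median
```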

Question 15-3
Colons represent an interaction between variables.
Question 15-4
anova, AIC, and BIC are common functions for comparing models.
Question 15-5
The R^2 value is available via summary(model)$r.squared.
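As a sketch, using the cars dataset that ships with R (chosen here for convenience; it is not necessarily the model from the chapter):

```r
# Simple linear regression on the built-in cars dataset
model <- lm(dist ~ speed, data = cars)

r2 <- summary(model)$r.squared

# For a regression with a single predictor, R^2 equals the squared correlation
r2_check <- cor(cars$speed, cars$dist)^2
```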
Question 16-1
The warnings function shows the previous warnings.
Question 16-2
Upon failure, try returns an object of class try-error.
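A small sketch of inspecting the result of a failed try. The error message is invented:

```r
# silent = TRUE stops the error message from being printed
result <- try(stop("something went wrong"), silent = TRUE)

failed <- inherits(result, "try-error")

# The original condition object is stored as an attribute
msg <- conditionMessage(attr(result, "condition"))
```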
Question 16-3
The testthat equivalent of checkException is expect_error.
Question 16-4
parse (called with its text argument) turns a string into an expression, then eval evaluates it.
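A minimal sketch of running code held in a string with base R's parse (via its text argument) and eval. quote is shown alongside for code that is written directly rather than stored in a string:

```r
# Code stored as a string must be parsed before it can be evaluated
code_string <- "1 + 2"
expr <- parse(text = code_string)
from_string <- eval(expr)

# quote captures code typed directly, without evaluating it
cl <- quote(1 + 2)
from_call <- eval(cl)
```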
Question 16-5
Overload functions using the S3 system. A function print.foo will be called for
objects of class foo.
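A sketch of such a method. The output format is invented for illustration:

```r
# S3 method: called automatically by print for objects of class foo
print.foo <- function(x, ...) {
  cat("<foo: ", unclass(x), ">\n", sep = "")
  invisible(x)  # print methods conventionally return their input invisibly
}

obj <- structure(42, class = "foo")
print(obj)
```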
Question 17-1
DESCRIPTION and NAMESPACE are compulsory.
Question 17-2
man and R are compulsory in all packages. src is required if you include C, C++,
or Fortran code.
Question 17-3
CITATION files let you explain who made and maintains the package, if that information
is too long and complicated to go in the DESCRIPTION file.
Question 17-4
roxygenise or roxygenize
Question 17-5
First warn the user that it is deprecated by adding a call to .Deprecated. Later,
replace the body completely with a call to .Defunct.
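The two stages might look like this. The function names old_fn, old_fn_later, and new_fn are hypothetical:

```r
# The replacement function
new_fn <- function(x) x * 2

# Stage 1: the old function still works, but warns the user
old_fn <- function(x) {
  .Deprecated("new_fn")
  new_fn(x)
}

# Stage 2 (a later release): the body is replaced entirely,
# so calling the function now throws an error
old_fn_later <- function(x) {
  .Defunct("new_fn")
}
```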

APPENDIX D

Solutions to Exercises

Exercise 1-1
If you get stuck, ask your system administrator, or ask on the R-help mailing list.
Exercise 1-2
Use the colon operator to create a vector, and the sd function:
sd(0:100)

Exercise 1-3
Type demo(plotmath) and hit Enter, or click the plots to see what’s on offer.
Exercise 2-1
1. Simple division gets the reciprocal, and atan calculates the inverse (arc)
tangent:
atan(1 / 1:1000)

2. Assign variables using <-:
x <- 1:1000
y <- atan(1 / x)
z <- 1 / tan(y)

Exercise 2-2
For comparing two vectors that should contain the same numbers, all.equal is
usually what you need:
x == z
identical(x, z)
all.equal(x, z)
all.equal(x, z, tolerance = 0)

Exercise 2-3
The exact values contained in the following three vectors may vary:
true_and_missing <- c(NA, TRUE, NA)
false_and_missing <- c(FALSE, FALSE, NA)
mixed <- c(TRUE, FALSE, NA)
any(true_and_missing)
any(false_and_missing)
any(mixed)
all(true_and_missing)
all(false_and_missing)
all(mixed)

Exercise 3-1
class(Inf)
class(NA)
class(NaN)
class("")

Repeat with typeof, mode, and storage.mode.
Exercise 3-2
pets <- factor(sample(
  c("dog", "cat", "hamster", "goldfish"),
  1000,
  replace = TRUE
))
head(pets)
summary(pets)

Converting to factors is recommended but not compulsory.
Exercise 3-3
carrot <- 1
potato <- 2
swede <- 3
ls(pattern = "a")

Your vegetables may vary.
Exercise 4-1
There are several possibilities for creating sequences, including the colon operator.
This solution uses seq_len and seq_along:
n <- seq_len(20)
triangular <- n * (n + 1) / 2
names(triangular) <- letters[seq_along(n)]
triangular[c("a", "e", "i", "o")]

Exercise 4-2
Again, there are many different ways of creating a sequence from -11 to 0 to 11. abs
gives you the absolute value of a number: