Appendix A. Installing R and RStudio


lets you customize your installation, but the defaults will be suitable for most users. I’ve
never found a reason to change them. If your computer requires a password before
installing new programs, you’ll need it here.

Binaries Versus Source
R can be installed from precompiled binaries or built from source on any operating
system. For Windows and Mac machines, installing R from binaries is extremely easy.
The binary comes preloaded in its own installer. Although you can build R from source
on these platforms, the process is much more complicated and won’t provide much
benefit for most users. For Linux systems, the opposite is true. Precompiled binaries can
be found for some systems, but it is much more common to build R from source files
when installing on Linux. The download pages on CRAN’s website provide information
about building R from source for the Windows, Mac, and Linux platforms.

Linux
R comes preinstalled on many Linux systems, but you’ll want the newest version of R
if yours is out of date. The CRAN website provides files to build R from source on
Debian, Red Hat, SUSE, and Ubuntu systems under the link “Download R for Linux.”
Click the link and then follow the directory trail to the version of Linux you wish to
install on. The exact installation procedure will vary depending on the Linux system
you use. CRAN guides the process by grouping each set of source files with documentation
or README files that explain how to install on your system.

32-bit Versus 64-bit
R comes in both 32-bit and 64-bit versions. Which should you use? In most cases, it
won’t matter. Both versions use 32-bit integers, which means they compute numbers to
the same numerical precision. The difference occurs in the way each version manages
memory. 64-bit R uses 64-bit memory pointers, and 32-bit R uses 32-bit memory pointers.
This means 64-bit R has a larger memory space to use (and search through).
As a rule of thumb, 32-bit builds of R are faster than 64-bit builds, though not always.
On the other hand, 64-bit builds can handle larger files and data sets with fewer memory
management problems. In either version, the maximum allowable vector size tops out
at around 2 billion elements. If your operating system doesn’t support 64-bit programs,
or your RAM is less than 4 GB, 32-bit R is for you. The Windows and Mac installers will
automatically install both versions if your system supports 64-bit R.
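
If you are not sure which build you are running, you can ask R itself. The lines below are a minimal sketch that uses only values built into base R; the variable names pointer_bytes and build_bits are mine, chosen for illustration.

```r
# Size of a memory pointer in bytes: 8 on a 64-bit build, 4 on a 32-bit build
pointer_bytes <- .Machine$sizeof.pointer
build_bits <- pointer_bytes * 8
cat("This is a", build_bits, "bit build of R\n")

# Integers are 32-bit in both builds, so the largest integer is the same
.Machine$integer.max

# The architecture string that R reports for this build
R.version$arch
```

The pointer size is what differs between the two builds; the integer limit does not.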

Using R
R isn’t a program that you can open and start using, like Microsoft Word or Internet
Explorer. Instead, R is a computer language, like C, C++, or a UNIX shell language. You use R by writing
commands in the R language and asking your computer to interpret them. In the old
days, people ran R code in a UNIX terminal window—as if they were hackers in a movie
from the 1980s. Now almost everyone uses R with an application called RStudio, and I
recommend that you do, too.

R and UNIX

You can still run R in a UNIX terminal or Bash window by typing the
command:
R

which opens an R interpreter. You can then do your work and close
the interpreter by running q() when you are finished.

RStudio
RStudio is an application like Microsoft Word—except that instead of helping you write
in English, RStudio helps you write in R. I use RStudio throughout the book because it
makes using R much easier. Also, the RStudio interface looks the same for Windows,
Mac OS, and Linux. That will help me match the book to your personal experience.
You can download RStudio for free from the RStudio website. Just click the “Download RStudio” button and
follow the simple instructions on the page. Once you’ve installed RStudio, you can open
it like any other program on your computer—usually by clicking an icon on your
desktop.

The R GUIs

Windows and Mac users usually do not program from a terminal
window, so the Windows and Mac downloads for R come with a
simple program that opens a terminal-like window for you to run R
code in. This is what opens when you click the R icon on your Windows
or Mac computer. These programs do a little more than the basic
terminal window, but not much. You may hear people refer to them
as the Windows or Mac R GUIs.

When you open RStudio, a window appears with three panes in it, as in Figure A-1. The
largest pane is a console window. This is where you’ll run your R code and see results.
The console window is exactly what you’d see if you ran R from a UNIX console or the
Windows or Mac GUIs. Everything else you see is unique to RStudio. Hidden in the
other panes are a text editor, a graphics window, a debugger, a file manager, and much
more. You’ll learn about these panes as they become useful throughout the course of
this book.

Figure A-1. The RStudio IDE for R.

Do I still need to download R?

Even if you use RStudio, you’ll still need to download R to your computer.
RStudio helps you use the version of R that lives on your computer,
but it doesn’t come with a version of R on its own.

Opening R
Now that you have both R and RStudio on your computer, you can begin using R by
opening the RStudio program. Open RStudio just as you would any program, by clicking
on its icon or by typing “RStudio” at the Windows Run prompt.

APPENDIX B

R Packages

Many of R’s most useful functions do not come preloaded when you start R, but reside
in packages that can be installed on top of R. R packages are similar to libraries in C,
C++, and JavaScript, packages in Python, and gems in Ruby. An R package bundles together
useful functions, help files, and data sets. You can use these functions within your
own R code once you load the package they live in. Usually the contents of an R package
are all related to a single type of task, which the package helps solve. R packages will let
you take advantage of R’s most useful features: its large community of package writers
(many of whom are active data scientists) and its prewritten routines for handling many
common (and exotic) data-science tasks.

Base R

You may hear R users (or me) refer to “base R.” What is base R? It is
just the collection of R functions that gets loaded every time you start
R. These functions provide the basics of the language, and you don’t
have to load a package before you can use them.

Installing Packages
To use an R package, you must first install it on your computer and then load it in your
current R session. The easiest way to install an R package is with the install.packages
R function. Open R and type the following into the command line:
install.packages("package name")

This will search for the specified package in the collection of packages hosted on the
CRAN site. When R finds the package, it will download it into a library folder on your
computer. R can access the package here in future R sessions without reinstalling it.
Anyone can write an R package and disseminate it as they like; however, almost all R
packages are published through the CRAN website. CRAN tests each R package before
publishing it. This doesn’t eliminate every bug inside a package, but it does mean that
you can trust a package on CRAN to run in the current version of R on your OS.
You can install multiple packages at once by linking their names with R’s concatenate
function, c. For example, to install the ggplot2, reshape2, and dplyr packages, run:
install.packages(c("ggplot2", "reshape2", "dplyr"))
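
If you rerun a script often, you may not want to reinstall packages that are already on your computer. Here is a minimal sketch of one way to skip them. The helper name install_if_missing is hypothetical (it is not part of R); it relies only on base R's requireNamespace and install.packages.

```r
# Hypothetical helper: install only the packages that are not yet installed
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package is installed and loadable
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)  # needs an Internet connection
  }
  invisible(to_install)           # names of the packages it tried to install
}

# stats and utils ship with R, so nothing is downloaded here
install_if_missing(c("stats", "utils"))

# With the packages from the text you would run:
# install_if_missing(c("ggplot2", "reshape2", "dplyr"))
```
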

If this is your first time installing a package, R will prompt you to choose an online
CRAN mirror to install from. Mirrors are listed by location. Your downloads should be
quickest if you select a mirror that is close to you. If you want to download a new package,
try the Austria mirror first. This is the main CRAN repository, and new packages can
sometimes take a couple of days to make it around to all of the other mirrors.
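
You can also choose a mirror in code instead of waiting for the prompt. The sketch below assumes you want https://cloud.r-project.org, a CRAN address that redirects to a nearby server; swap in the URL of any mirror you prefer.

```r
# Set the default CRAN mirror for the rest of this R session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Check which mirror install.packages will now use
getOption("repos")[["CRAN"]]

# Or name a mirror for a single installation (needs an Internet connection):
# install.packages("ggplot2", repos = "https://cloud.r-project.org")
```

The setting lasts until you close your R session, so R will not prompt you for a mirror again during that session.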

Loading Packages
Installing a package doesn’t immediately place its functions at your fingertips. It just
places them on your computer. To use an R package, you next have to load it in your R
session with the command:
library(package name)

Notice that the quotation marks have disappeared. You can use them if you like, but
quotation marks are optional for the library command. (This is not true for the
install.packages command.)
library will make all of the package’s functions, data sets, and help files available to you
until you close your current R session. The next time you begin an R session, you’ll have
to reload the package with library if you want to use it, but you won’t have to reinstall
it. You only have to install each package once. After that, a copy of the package will live
in your R library. To see which packages you currently have in your R library, run:
library()

library() also shows the path to your actual R library, which is the folder that contains
your R packages. You may notice many packages that you don’t remember installing.
This is because R automatically downloads a set of useful packages when you first
install R.
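
To see loading in action without installing anything, you can try it with tools, a package that ships with R but is not loaded when R starts. This is only an illustrative sketch; any installed package would work the same way.

```r
library(tools)     # without quotation marks
library("tools")   # with quotation marks; same effect

# The package now appears in the list of attached packages
"package:tools" %in% search()

# Its functions are available by name
file_ext("report.csv")

# The folder (or folders) that make up your R library
.libPaths()

# Names of the first few packages installed in your library
head(rownames(installed.packages()))
```
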

Install packages from (almost) anywhere
The devtools R package makes it easy to install packages from locations
other than the CRAN website. devtools provides functions like
install_github, install_gitorious, install_bitbucket, and install_url.
These work similarly to install.packages, but they search
new locations for R packages. install_github is especially useful
because many R developers provide development versions of their
packages on GitHub. The development version of a package will contain
a sneak peek of new functions and patches but may not be as
stable or as bug free as the CRAN version.

Why does R make you bother with installing and loading packages? You can imagine
an R where every package came preloaded, but this would be a very large and slow
program. As of May 6, 2014, the CRAN website hosts 5,511 packages. It is simpler to
only install and load the packages that you want to use when you want to use them. This
keeps your copy of R fast because it has fewer functions and help pages to search through
at any one time. The arrangement has other benefits as well. For example, it is possible
to update your copy of an R package without updating your entire copy of R.

What’s the best way to learn about R packages?

It is difficult to use an R package if you don’t know that it exists. You
could go to the CRAN website and click the Packages link to see a list
of available packages, but you’ll have to wade through thousands of
them. Moreover, many R packages do the same things.
How do you know which package does them best? The R-packages
mailing list is a place to start. It sends out announcements of new
packages and maintains an archive of old announcements. Blogs that
aggregate posts about R can also provide valuable leads. I recommend
R-bloggers (www.r-bloggers.com). RStudio maintains a list of
some of the most useful R packages in the Getting Started section of
http://support.rstudio.com. Finally, CRAN groups together some of
the most useful—and most respected—packages by subject area. This
is an excellent place to learn about the packages designed for your
area of work.

APPENDIX C

Updating R and Its Packages

The R Core Development Team continuously hones the R language by catching bugs,
improving performance, and updating R to work with new technologies. As a result,
new versions of R are released several times a year. The easiest way to stay current with
R is to periodically check the CRAN website. The website is updated for each new release
and makes the release available for download. You’ll have to install the new release. The
process is the same as when you first installed R.
Don’t worry if you’re not interested in staying up-to-date on R Core’s doings. R changes
only slightly between releases, and you’re not likely to notice the differences. However,
updating to the current version of R is a good place to start if you ever encounter a bug
that you can’t explain.
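
Before you visit CRAN, it helps to know which version of R you already have. The following sketch uses two base R tools; the comparison against "3.0.0" is only an example threshold, not a recommendation.

```r
# The version of R that is currently running, as a comparable object
current_version <- getRversion()
print(current_version)

# The same information as a readable string
R.version.string

# Version objects can be compared with ordinary operators
current_version >= "3.0.0"
```
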
RStudio also constantly improves its product. You can acquire the newest updates just
by downloading them from RStudio.

R Packages
Package authors occasionally release new versions of their packages to add functions,
fix bugs, or improve performance. The update.packages command checks whether
you have the most current version of a package and installs the most current version if
you do not. Unlike install.packages, update.packages treats its first argument as a
library location, so pass package names through the oldPkgs argument. If you
already have ggplot2, reshape2, and dplyr on your computer, it’d be a good idea to check
for updates before you use them:
update.packages(oldPkgs = c("ggplot2", "reshape2", "dplyr"))

You should start a new R session after updating packages. If you have a package loaded
when you update it, you’ll have to close your R session and open a new one to begin
using the updated version of the package.
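
If you would rather look before you update, you can compare what you have with what CRAN offers. This is a rough sketch: it checks the stats package only because it ships with every copy of R, and the old.packages call is wrapped in interactive() because it needs an Internet connection.

```r
# The version of a package that is installed in your library
installed_version <- packageVersion("stats")
print(installed_version)

# List installed packages that have a newer version on CRAN
if (interactive()) {
  outdated <- old.packages()
  print(rownames(outdated))  # NULL when everything is up to date
}
```
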
