Chapter 5. Lists and Data Frames

(a_list <- list(
  c(1, 1, 2, 5, 14, 42),    #See http://oeis.org/A000108
  month.abb,
  matrix(c(3, -8, 1, -3), nrow = 2),
  asin
))
## [[1]]
## [1]  1  1  2  5 14 42
##
## [[2]]
##  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
##
## [[3]]
##      [,1] [,2]
## [1,]    3    1
## [2,]   -8   -3
##
## [[4]]
## function (x)  .Primitive("asin")
##

As with vectors, you can name elements during construction, or afterward using the
names function:

names(a_list) <- c("catalan", "months", "involutary", "arcsin")
a_list
## $catalan
## [1]  1  1  2  5 14 42
##
## $months
##  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
##
## $involutary
##      [,1] [,2]
## [1,]    3    1
## [2,]   -8   -3
##
## $arcsin
## function (x)  .Primitive("asin")
##
(the_same_list <- list(
  catalan    = c(1, 1, 2, 5, 14, 42),
  months     = month.abb,
  involutary = matrix(c(3, -8, 1, -3), nrow = 2),
  arcsin     = asin
))
## $catalan
## [1]  1  1  2  5 14 42
##
## $months
##  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
##
## $involutary
##      [,1] [,2]
## [1,]    3    1
## [2,]   -8   -3
##
## $arcsin
## function (x)  .Primitive("asin")
##

It isn’t compulsory, but it helps if the names that you give elements are valid variable
names.
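
To see why valid names help, here is a minimal sketch. The list awkward and its element names are made up for illustration and do not appear in the text. A name containing a space still works, but it has to be wrapped in backticks when used with the dollar sign operator (covered later in this chapter); the base function make.names shows what a syntactically valid version of each name would look like.

```r
# Hypothetical list, not from the text: one awkward name, one valid name
awkward <- list("my element" = 1, valid_name = 2)

# A name with a space needs backticks with the dollar sign operator
awkward$`my element`
## [1] 1
awkward$valid_name
## [1] 2

# make.names() converts arbitrary strings to syntactically valid names
make.names(names(awkward))
## [1] "my.element" "valid_name"
```
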
It is even possible for elements of lists to be lists themselves:
(main_list <- list(
  middle_list = list(
    element_in_middle_list = diag(3),
    inner_list = list(
      element_in_inner_list         = pi ^ 1:4,
      another_element_in_inner_list = "a"
    )
  ),
  element_in_main_list = log10(1:10)
))
## $middle_list
## $middle_list$element_in_middle_list
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
##
## $middle_list$inner_list
## $middle_list$inner_list$element_in_inner_list
## [1] 3.142
##
## $middle_list$inner_list$another_element_in_inner_list
## [1] "a"
##
##
##
## $element_in_main_list
##  [1] 0.0000 0.3010 0.4771 0.6021 0.6990 0.7782 0.8451 0.9031 0.9542 1.0000
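
One detail of this output is worth a note: element_in_inner_list prints a single number because the exponentiation operator ^ takes precedence over the colon operator, so pi ^ 1:4 is evaluated as (pi ^ 1):4. If the first four powers of pi were wanted instead (an assumption about intent, not something the text states), parentheses would be needed. A short sketch:

```r
# ^ binds more tightly than :, so this is (pi ^ 1):4, a sequence that
# starts at pi and steps by 1 while staying at or below 4
pi ^ 1:4
length(pi ^ 1:4)      # a single value, matching the printed output

# Parentheses give the first four powers of pi instead
pi ^ (1:4)
length(pi ^ (1:4))    # four values
```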

In theory, you can keep nesting lists forever. In practice, current versions of R will throw
an error once you start nesting your lists tens of thousands of levels deep (the exact
number is machine specific). Luckily, this shouldn’t be a problem for you, since
real-world code where nesting is deeper than three or four levels is extremely rare.

Atomic and Recursive Variables
Due to this ability to contain other lists within themselves, lists are considered to be
recursive variables. Vectors, matrices, and arrays, by contrast, are atomic. (Variables can
either be recursive or atomic, never both; Appendix A contains a table explaining which
variable types are atomic, and which are recursive.) The functions is.recursive and
is.atomic let us test variables to see what type they are:
is.atomic(list())
## [1] FALSE
is.recursive(list())
## [1] TRUE
is.atomic(numeric())
## [1] TRUE
is.recursive(numeric())
## [1] FALSE
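
As a further sketch (the variable names a_matrix and nested are illustrative only), the same tests applied to a matrix and to a nested list show that the distinction depends on the kind of container, not on what it currently holds:

```r
a_matrix <- matrix(1:6, nrow = 2)    # atomic: every element has the same type
nested <- list(1, list(2, 3))        # recursive: it contains another list

is.atomic(a_matrix)
## [1] TRUE
is.recursive(a_matrix)
## [1] FALSE
is.atomic(nested)
## [1] FALSE
is.recursive(nested)
## [1] TRUE
```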

List Dimensions and Arithmetic
Like vectors, lists have a length. A list’s length is the number of top-level elements that
it contains:
length(a_list)
## [1] 4
length(main_list) #doesn't include the lengths of nested lists
## [1] 2

Again, like vectors, but unlike matrices, lists don’t have dimensions. The dim function
correspondingly returns NULL:
dim(a_list)
## NULL

nrow, NROW, and the corresponding column functions work on lists in the same way as
on vectors:
nrow(a_list)
## NULL
ncol(a_list)
## NULL
NROW(a_list)
## [1] 4
NCOL(a_list)
## [1] 1

Unlike with vectors, arithmetic doesn’t work on lists. Since each element can be of a
different type, it doesn’t make sense to be able to add or multiply two lists together. It is
possible to do arithmetic on list elements, however, assuming that they are of an
appropriate type. In that case, the usual rules for the element contents apply. For example:
l1 <- list(1:5)
l2 <- list(6:10)
l1[[1]] + l2[[1]]
## [1]  7  9 11 13 15
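
To make the contrast concrete, the following sketch tries to add the two lists directly. It wraps the call in tryCatch, a base R function not otherwise used in this section, so that the error message is captured as a string rather than halting the script. The exact wording of the message may vary between R versions, so no output is shown for it.

```r
l1 <- list(1:5)
l2 <- list(6:10)

# Adding the lists themselves fails; capture the error message as a string
result <- tryCatch(
  l1 + l2,
  error = function(e) conditionMessage(e)
)
result

# Adding the contents of the first elements works as usual
l1[[1]] + l2[[1]]
## [1]  7  9 11 13 15
```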

More commonly, you might want to perform arithmetic (or some other operation) on
every element of a list. This requires looping, and will be discussed in Chapter 8.

Indexing Lists
Consider this test list:
l <- list(
first = 1,
second = 2,
third = list(
alpha = 3.1,
beta = 3.2
)
)

As with vectors, we can access elements of the list using square brackets, [], and positive
or negative numeric indices, element names, or a logical index. The following four lines
of code all give the same result:
l[1:2]
## $first
## [1] 1
##
## $second
## [1] 2
l[-3]
## $first
## [1] 1
##
## $second
## [1] 2
l[c("first", "second")]
## $first
## [1] 1
##
## $second
## [1] 2
l[c(TRUE, TRUE, FALSE)]
## $first
## [1] 1
##
## $second
## [1] 2

The result of these indexing operations is another list. Sometimes we want to access the
contents of the list elements instead. There are two operators to help us do this. Double
square brackets ([[]]) can be given a single positive integer denoting the index to return,
or a single string naming that element:
l[[1]]
## [1] 1
l[["first"]]
## [1] 1

The is.list function returns TRUE if the input is a list, and FALSE otherwise. For
comparison, take a look at the two indexing operators:
is.list(l[1])
## [1] TRUE
is.list(l[[1]])
## [1] FALSE

For named elements of lists, we can also use the dollar sign operator, $. This works
almost the same way as passing a named string to the double square brackets, but has
two advantages. Firstly, many IDEs will autocomplete the name for you. (In R GUI,
press Tab for this feature.) Secondly, R accepts partial matches of element names:
l$first
## [1] 1
l$f        #partial matching interprets "f" as "first"
## [1] 1
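
Partial matching is specific to the dollar sign. By default, double square brackets require an exact name match; the exact argument of [[ relaxes this. The sketch below redefines the test list l so that it runs on its own:

```r
l <- list(
  first = 1,
  second = 2,
  third = list(alpha = 3.1, beta = 3.2)
)

l$f                        # partial match on "first"
## [1] 1
l[["f"]]                   # exact matching by default, so nothing is found
## NULL
l[["f", exact = FALSE]]    # opt in to partial matching
## [1] 1
```

Because a partial match can silently pick up a different element if the list later gains more names, spelling names out in full is usually the safer habit in scripts.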

To access nested elements, we can stack up the square brackets or pass in a vector, though
the latter method is less common and usually harder to read:

l[["third"]]["beta"]
## $beta
## [1] 3.2
l[["third"]][["beta"]]
## [1] 3.2
l[[c("third", "beta")]]
## [1] 3.2

The behavior when you try to access a nonexistent element of a list varies depending
upon the type of indexing that you have used. For the next example, recall that our list,
l, has only three elements.
If we use single square-bracket indexing, then the resulting list has an element with the
value NULL (and name NA, if the original list has names). Compare this to bad indexing
of a vector where the return value is NA:
l[c(4, 2, 5)]
## $<NA>
## NULL
##
## $second
## [1] 2
##
## $<NA>
## NULL
l[c("fourth", "second", "fifth")]
## $<NA>
## NULL
##
## $second
## [1] 2
##
## $<NA>
## NULL

Trying to access the contents of an element with an incorrect name, either with double
square brackets or a dollar sign, returns NULL:
l[["fourth"]]
## NULL
l$fourth
## NULL

Finally, trying to access the contents of an element with an incorrect numerical index
throws an error, stating that the subscript is out of bounds. This inconsistency in
behavior is something that you just need to accept, though the best defense is to make sure
that you check your indices before you use them:
l[[4]]     #this throws an error
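
One way to check indices first is sketched below. The helper get_element is hypothetical (it is not part of base R or of the original text) and handles a single index only. It returns NULL for an out-of-range numeric index or an unknown name, so that numeric and name-based lookups fail in the same way.

```r
l <- list(first = 1, second = 2, third = list(alpha = 3.1, beta = 3.2))

# Hypothetical helper: return NULL instead of erroring on a bad single index
get_element <- function(x, i) {
  if (is.numeric(i)) {
    # numeric index: must lie between 1 and the number of elements
    if (i < 1 || i > length(x)) return(NULL)
  } else if (!(i %in% names(x))) {
    # character index: must be one of the element names
    return(NULL)
  }
  x[[i]]
}

get_element(l, 2)
## [1] 2
get_element(l, 4)
## NULL
get_element(l, "fourth")
## NULL
```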

Converting Between Vectors and Lists
Vectors can be converted to lists using the function as.list. This creates a list with each
element of the vector mapping to a list element containing one value:
busy_beaver <- c(1, 6, 21, 107)    #See http://oeis.org/A060843
as.list(busy_beaver)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 6
##
## [[3]]
## [1] 21
##
## [[4]]
## [1] 107

If each element of the list contains a scalar value, then it is also possible to convert that
list to a vector using the functions that we have already seen (as.numeric,
as.character, and so on):
as.numeric(list(1, 6, 21, 107))
## [1]   1   6  21 107

This technique won’t work in cases where the list contains nonscalar elements. This is
a real issue, because as well as storing different types of data, lists are very useful for
storing data of the same type, but with a nonrectangular shape:
(prime_factors <- list(
  two   = 2,
  three = 3,
  four  = c(2, 2),
  five  = 5,
  six   = c(2, 3),
  seven = 7,
  eight = c(2, 2, 2),
  nine  = c(3, 3),
  ten   = c(2, 5)
))
## $two
## [1] 2
##
## $three
## [1] 3
##
## $four
## [1] 2 2
##
## $five
## [1] 5
##
## $six
## [1] 2 3
##
## $seven
## [1] 7
##
## $eight
## [1] 2 2 2
##
## $nine
## [1] 3 3
##
## $ten
## [1] 2 5

This sort of list can be converted to a vector using the function unlist (it is sometimes
technically possible to do this with mixed-type lists, but rarely useful):
unlist(prime_factors)
##    two  three  four1  four2   five   six1   six2  seven eight1 eight2
##      2      3      2      2      5      2      3      7      2      2
## eight3  nine1  nine2   ten1   ten2
##      2      3      3      2      5
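
The points above can be sketched with a smaller list (the name ragged is illustrative). as.numeric fails on a list with nonscalar elements, unlist flattens it, the use.names argument of unlist drops the generated names, and a mixed-type list is coerced to a single common type, here character. The error is captured with tryCatch, and its message is not shown because the wording can differ between R versions.

```r
ragged <- list(four = c(2, 2), five = 5, six = c(2, 3))

# as.numeric cannot handle the length-two elements; capture the error message
tryCatch(as.numeric(ragged), error = function(e) conditionMessage(e))

unlist(ragged)
##  four1  four2   five   six1   six2
##      2      2      5      2      3
unlist(ragged, use.names = FALSE)
## [1] 2 2 5 2 3

# Mixed types are coerced to the most flexible type present
unlist(list(1, "a", TRUE))
## [1] "1"    "a"    "TRUE"
```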

Combining Lists
The c function that we have used for concatenating vectors also works for concatenating
lists:
c(list(a = 1, b = 2), list(3))
## $a
## [1] 1
##
## $b
## [1] 2
##
## [[3]]
## [1] 3

If we use it to concatenate lists and vectors, the vectors are converted to lists (as though
as.list had been called on them) before the concatenation occurs:
c(list(a = 1, b = 2), 3)
## $a
## [1] 1
##
## $b
## [1] 2
##
## [[3]]
## [1] 3
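
The conversion is element by element, so a vector of length greater than one contributes one list element per value. A small sketch (the variable name combined is illustrative):

```r
# A length-three vector adds three elements, not one
combined <- c(list(a = 1, b = 2), 3:5)
length(combined)
## [1] 5

# The result matches calling as.list on the vector first
identical(combined, c(list(a = 1, b = 2), as.list(3:5)))
## [1] TRUE
```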

It is also possible to use the cbind and rbind functions on lists, but the resulting objects
are very strange indeed. They are matrices with possibly nonscalar elements, or lists
with dimensions, depending upon which way you want to look at them:
(matrix_list_hybrid <- cbind(
  list(a = 1, b = 2),
  list(c = 3, list(d = 4))
))
##   [,1] [,2]
## a 1    3
## b 2    List,1
str(matrix_list_hybrid)
## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ :List of 1
##   ..$ d: num 4
##  - attr(*, "dim")= int [1:2] 2 2
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2] "a" "b"
##   ..$ : NULL

Using cbind and rbind in this way is something you shouldn’t do often, and probably
not at all. It’s another case of R being a little too flexible and accommodating, instead of
telling you that you’ve done something silly by throwing an error.

NULL
NULL is a special value that represents an empty variable. Its most common use is in lists,
but it also crops up with data frames and function arguments. These other uses will be
discussed later.

When you create a list, you may wish to specify that an element should exist, but should
have no contents. For example, the following list contains UK bank holidays (public
holidays) for 2013 by month. Some months have no bank holidays, so we use NULL to
represent this absence:
(uk_bank_holidays_2013 <- list(
  Jan = "New Year's Day",
  Feb = NULL,
  Mar = "Good Friday",
  Apr = "Easter Monday",
  May = c("Early May Bank Holiday", "Spring Bank Holiday"),
  Jun = NULL,
  Jul = NULL,
  Aug = "Summer Bank Holiday",
  Sep = NULL,
  Oct = NULL,
  Nov = NULL,
  Dec = c("Christmas Day", "Boxing Day")
))
## $Jan
## [1] "New Year's Day"
##
## $Feb
## NULL
##
## $Mar
## [1] "Good Friday"
##
## $Apr
## [1] "Easter Monday"
##
## $May
## [1] "Early May Bank Holiday" "Spring Bank Holiday"
##
## $Jun
## NULL
##
## $Jul
## NULL
##
## $Aug
## [1] "Summer Bank Holiday"
##
## $Sep
## NULL
##
## $Oct
## NULL
##
## $Nov
## NULL
##
## $Dec
## [1] "Christmas Day" "Boxing Day"

It is important to understand the difference between NULL and the special missing value
NA. The biggest difference is that NA is a scalar value, whereas NULL takes up no space at
all—it has length zero:
length(NULL)
## [1] 0
length(NA)
## [1] 1
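
One consequence, sketched here with the c and list functions: a NULL contributes nothing when values are combined into a vector, whereas an NA occupies a slot. Inside a call to list, by contrast, a NULL does keep its place, which is what the bank holiday list relies on.

```r
length(c(1, NULL, 3))       # the NULL vanishes from the vector
## [1] 2
length(c(1, NA, 3))         # the NA is kept as a missing value
## [1] 3
length(list(1, NULL, 3))    # in a list, NULL is an element with no contents
## [1] 3
```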

You can test for NULL using the function is.null. Missing values are not null:
is.null(NULL)
## [1] TRUE
is.null(NA)
## [1] FALSE

The converse test doesn’t really make much sense. Since NULL has length zero, we have
nothing to test to see if it is missing:
is.na(NULL)
## Warning: is.na() applied to non-(list or vector) of type 'NULL'
## logical(0)

NULL can also be used to remove elements of a list. Setting an element to NULL (even if
it already contains NULL) will remove it. Suppose that for some reason we want to switch
to an old-style Roman 10-month calendar, removing January and February:
uk_bank_holidays_2013$Jan <- NULL
uk_bank_holidays_2013$Feb <- NULL
uk_bank_holidays_2013
## $Mar
## [1] "Good Friday"
##
## $Apr
## [1] "Easter Monday"
##
## $May
## [1] "Early May Bank Holiday" "Spring Bank Holiday"
##
## $Jun
## NULL
##