# 11 Matrices, lists and data frames

INTRODUCTION TO R

> rbind(X, Y)

   [,1]  [,2]  [,3]  [,4]  [,5]

X 16.92 24.03  7.61 15.49 11.77

Y  8.37 12.93 16.65 12.20 13.12

Row and column names can be set (and viewed) using the rownames() and colnames() functions:

> colnames(XY)

[1] "X" "Y"

> rownames(XY) <- LETTERS[1:5]

> XY

      X     Y

A 16.92  8.37

B 24.03 12.93

C  7.61 16.65

D 15.49 12.20

E 11.77 13.12

The object LETTERS is a character vector of length 26, built into R, that contains the uppercase letters of the English alphabet. Similarly, letters contains the equivalent lowercase letters.
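The naming steps above can be sketched end to end as follows. This is only an illustration: it assumes that X and Y hold the coordinate values printed in the output above and that XY was built with cbind(), which this section does not show; YX is a hypothetical name introduced here.

```r
# Assumed inputs: the coordinate values printed in the text
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.20, 13.12)

XY <- cbind(X, Y)   # vectors become columns: 5 rows, 2 columns
YX <- rbind(X, Y)   # vectors become rows: 2 rows, 5 columns (hypothetical name)

colnames(XY)                    # column names are taken from the vector names
rownames(XY) <- LETTERS[1:5]    # label the rows A to E
XY["C", "Y"]                    # names can then be used to look up a single cell

length(LETTERS)                 # number of built-in uppercase letters
letters[1:3]                    # first three built-in lowercase letters
```

Once row and column names are set, they can be used in place of numeric positions, which tends to make later code easier to read.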

1.11.2 Lists

Whereas matrices store vectors of the same type (class) and length, lists are used to store collections of objects that can be of differing lengths and types. Lists are constructed using the list() function. For example, we have previously created a number of isolated vectors (temperature, shade, and the names and coordinates of sites) that may actually represent data or information from a single experiment. These objects can be grouped together such that they all become components of a single list object:

> EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X,

+     Y, sep = ","), TEMPERATURE = TEMPERATURE,

+     SHADE = SHADE)

> EXPERIMENT

\$SITE

[1] "A1" "A2" "B1" "B2" "C1" "C2" "D1" "D2" "E1" "E2"

\$COORDINATES

[1] "16.92,8.37"  "24.03,12.93" "7.61,16.65"  "15.49,12.2"

[5] "11.77,13.12"

\$TEMPERATURE

  Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10

36.1 30.6 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9

\$SHADE

[1] no   full no   full no   full no   full no   full

Levels: no full

Note that this list consists of four components: two character vectors (SITE and COORDINATES, the latter being a vector of XY coordinates for sites A, B, C, D and E), a numeric vector (TEMPERATURE) and a factor (SHADE). Note also that while three of the components have a length of 10, the COORDINATES component has a length of only five.
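As a minimal sketch of the list described above, the following rebuilds EXPERIMENT from scratch and inspects its components. The input vectors are re-created here from the values printed in the output, which is an assumption about how they were originally defined in earlier sections.

```r
# Assumed inputs, reconstructed from the printed output
SITE <- c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2")
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.20, 13.12)
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")
SHADE <- factor(rep(c("no", "full"), times = 5), levels = c("no", "full"))

# Components may differ in both type and length
EXPERIMENT <- list(SITE = SITE, COORDINATES = paste(X, Y, sep = ","),
                   TEMPERATURE = TEMPERATURE, SHADE = SHADE)

names(EXPERIMENT)             # the four component names
sapply(EXPERIMENT, length)    # 10, 5, 10 and 10 elements
sapply(EXPERIMENT, class)     # character, character, numeric, factor
EXPERIMENT$COORDINATES[4]     # a component is extracted with $, then indexed
```

Because a list places no constraint on component lengths, nothing stops COORDINATES from holding five elements alongside three components of ten.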

1.11.3 Data frames - data sets

Rarely are single biological variables collected in isolation. Rather, data are usually collected in sets of variables reflecting investigations of patterns between and/or among the different variables. Consequently, data sets are best organized into matrices of variables (vectors), all of the same length yet not necessarily of the same type. Hence, neither lists nor matrices represent natural storage for data sets. This is the role of data frames, which are used to store a list of vectors of the same length (yet potentially different types) in a rectangular, matrix-like structure.

Data frames are generated by combining multiple vectors together such that each vector becomes a separate column in the data frame. In this way, a data frame is similar to a matrix in which each column can represent a different vector type. For a data frame to faithfully represent a data set, the sequence in which observations appear in the vectors must be the same for each vector, and each vector should have the same number of observations. For example, the first, second, third, etc. entries in each vector must represent, respectively, the observations collected from the first, second, third, etc. sampling units.
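A small sketch of these requirements, using the data.frame() function, might look as follows. The object names DATA and RESULT are hypothetical, and the vectors are assumed to hold the values shown earlier in this chapter (TEMPERATURE is left unnamed here for simplicity).

```r
# Assumed inputs: ten sampling units, one entry per unit in each vector
SITE <- c("A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2", "E1", "E2")
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
SHADE <- factor(rep(c("no", "full"), times = 5), levels = c("no", "full"))

# Each vector becomes a column; columns may differ in type
DATA <- data.frame(SITE, TEMPERATURE, SHADE, stringsAsFactors = FALSE)

dim(DATA)             # 10 rows (sampling units) by 3 columns (variables)
sapply(DATA, class)   # one class per column
DATA[3, ]             # all variables recorded for the third sampling unit

# A vector whose length does not divide evenly into the others is rejected
RESULT <- try(data.frame(SITE, SHORT = 1:3), silent = TRUE)
inherits(RESULT, "try-error")
```

Note that data.frame() silently recycles a shorter vector when its length divides evenly into the longest one, so matching lengths are worth checking explicitly rather than relying on an error.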

Since the focus of this book is on the exploration, analysis and summary of data sets, and data sets are accommodated in R by data frames, the generation, importation/exportation, manipulation and management of data frames receive extensive coverage in Chapter 2.

1.12 Object information and conversion

1.12.1 Object information

Everything in R is an object and all objects are of a certain type or class. The class of an

object can be examined using the class() function. For example:

> class(TEMPERATURE)

[1] "numeric"
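To see how class() responds to the other object types introduced so far, a short sketch follows. The objects in OBJECTS are hypothetical examples made up for illustration, not objects from the text.

```r
# Hypothetical example objects, one per common class
OBJECTS <- list(
  numbers = c(36.1, 30.6),
  text    = c("A1", "A2"),
  flags   = c(TRUE, FALSE),
  groups  = factor(c("no", "full")),
  mixed   = list(1, "a"),
  table   = data.frame(x = 1:3)
)

# Apply class() to each object and collect the results
CLASSES <- sapply(OBJECTS, class)
CLASSES
```

A matrix is left out of this sketch on purpose: from R 4.0.0 onwards class() returns two values for a matrix ("matrix" and "array"), whereas earlier versions return only "matrix".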

There is also a family of functions prefixed with is. that evaluate whether or not an object is of a particular class (or type). Table 1.3 lists the common object query functions. All object query functions return a logical vector. Enter methods(is) for a more comprehensive list.


Table 1.3 Common object query functions and their corresponding return values.

| Function | Returns TRUE: |
| --- | --- |
| is.numeric(x) | if all elements of x are numeric or integer (x <- c(1, -3.5)) |
| is.null(x) | if x is NULL (the object has no length) (x <- NULL) |
| is.logical(x) | if all elements of x are logical (x <- c(TRUE, FALSE)) |
| is.character(x) | if all elements of x are character strings |
| is.vector(x) | if the object x is a vector (a single dimension). Returns FALSE if object has any attributes other than names |
| is.factor(x) | if the object x is a factor |
| is.matrix(x) | if the object x is a matrix (2 dimensions but not a data frame) |
| is.list(x) | if the object x is a list |
| is.data.frame(x) | if the object x is a data frame |
| is.na(x) | for each missing (NA) element in x (x <- c(NA, 2)) |
| ! | (‘not’) character as a prefix converts the above functions into ‘is.not.’ |
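The behaviour of several query functions from Table 1.3 can be checked with a few throwaway objects. The names x, M and D below are hypothetical and exist only for this sketch.

```r
x <- c(1, -3.5)
is.numeric(x)        # TRUE
is.character(x)      # FALSE
is.vector(x)         # TRUE: no attributes other than (optional) names
is.null(NULL)        # TRUE

# is.na() returns one logical value per element
is.na(c(NA, 2))      # TRUE FALSE
!is.na(c(NA, 2))     # FALSE TRUE: the ! prefix negates the query

M <- matrix(1:4, nrow = 2)
is.matrix(M)         # TRUE
is.vector(M)         # FALSE: the dim attribute disqualifies it
is.data.frame(M)     # FALSE

D <- data.frame(a = 1:2)
is.data.frame(D)     # TRUE
is.list(D)           # TRUE: a data frame is stored as a list of columns
is.matrix(D)         # FALSE
```

Most of these functions return a single TRUE or FALSE for the whole object, while is.na() works element by element, which is why it is often combined with ! to select the non-missing values.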

Many R objects also have a set of attributes, the number and type of which are specific to each class of object. For example, a matrix object has a specific number of dimensions as well as row and column names. The attributes of an object can be viewed using the attributes() function:

> attributes(XY)

\$dim

[1] 5 2

\$dimnames

\$dimnames[[1]]

[1] "A" "B" "C" "D" "E"

\$dimnames[[2]]

[1] "X" "Y"

Similarly, the attr() function can be used to view and set individual attributes of

an object, by specifying the name of the object and the name of the attribute (as a

character string) as arguments. For example:

> attr(XY, "dim")

[1] 5 2

> attr(XY, "description") <- "coordinates of quadrats"

> XY

      X     Y

A 16.92  8.37

B 24.03 12.93

C  7.61 16.65

D 15.49 12.20

E 11.77 13.12

attr(,"description")

[1] "coordinates of quadrats"

Note that in the above example, the attribute "description" is not an inbuilt attribute of a matrix. When a new attribute is set, this attribute is displayed along with the object. This provides a useful way of attaching a description to an object, thereby reducing the risk of the object becoming unfamiliar.
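The attribute handling described above can be sketched as a self-contained example. XY is rebuilt here from the printed values (an assumption about how it was originally created), and DESCRIPTION is a hypothetical helper name.

```r
# Assumed reconstruction of the XY matrix shown in the text
XY <- cbind(X = c(16.92, 24.03, 7.61, 15.49, 11.77),
            Y = c(8.37, 12.93, 16.65, 12.20, 13.12))
rownames(XY) <- LETTERS[1:5]

names(attributes(XY))    # inbuilt matrix attributes: "dim" and "dimnames"
attr(XY, "dim")          # view a single attribute by name

# Attach a user-defined attribute, then read it back
attr(XY, "description") <- "coordinates of quadrats"
DESCRIPTION <- attr(XY, "description")
DESCRIPTION

# Assigning NULL removes the attribute again
attr(XY, "description") <- NULL
names(attributes(XY))
```

User-defined attributes are not guaranteed to survive every operation (subsetting a matrix, for instance, typically drops them), so they are best treated as a lightweight annotation rather than permanent metadata.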

1.12.2 Object conversion

Objects can be converted or coerced into other objects using a family of functions with an as. prefix. Note that there are some obvious restrictions on these conversions, as most objects cannot be completely accommodated by all other object types, and therefore some information (such as certain attributes) may be lost or modified during the conversion. Objects and elements that cannot be successfully coerced are returned as NA. Table 1.4 lists the common object coercion functions. Use methods(as) for a more comprehensive list.

Table 1.4 Common object coercion functions and their corresponding return values.

| Function | Converts object to |
| --- | --- |
| as.numeric(x) | a numeric vector (‘integer’ or ‘real’). Factors are converted to their integer codes. |
| as.null(x) | a NULL |
| as.logical(x) | a logical vector. Non-zero numeric values are converted to TRUE, zero to FALSE |
| as.character(x) | a character vector |
| as.vector(x) | a vector. All attributes (including names) are removed. |
| as.factor(x) | a factor. This is an abbreviated version of factor() |
| as.matrix(x) | a matrix. Any non-numeric elements result in all matrix elements being converted to character strings |
| as.list(x) | a list |
| as.data.frame(x) | a data frame. Matrix columns and list components are converted into separate vectors (columns) of the data frame, and character vectors are converted into factors (the default before R 4.0.0; from R 4.0.0 onwards they remain character vectors unless stringsAsFactors = TRUE). All previous attributes are removed |
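A few of the coercions in Table 1.4 are easy to misread, so a short sketch may help. The object names NUMBERS, SIZE and M are hypothetical and the values are made up for illustration.

```r
# Elements that cannot be coerced become NA (R also issues a warning,
# suppressed here to keep the output clean)
NUMBERS <- suppressWarnings(as.numeric(c("1.5", "2", "abc")))
NUMBERS                        # 1.5 2.0 NA

# Any non-zero number becomes TRUE; only zero becomes FALSE
as.logical(c(0, 0.5, 2, -1))   # FALSE TRUE TRUE TRUE

# A factor is converted to its integer codes, not to its labels
SIZE <- factor(c("10", "20", "10"))
as.numeric(SIZE)                 # 1 2 1
as.numeric(as.character(SIZE))   # 10 20 10: go via character to recover labels

# One non-numeric column turns the whole matrix into character strings
M <- as.matrix(data.frame(a = 1:2, b = c("x", "y")))
is.character(M)                  # TRUE
```

The factor case is a common source of silent errors: converting via as.character() first is the usual way to recover the original numeric labels.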

1.13 Indexing vectors, matrices and lists

This section makes use of a number of objects created in earlier sections. Importantly, the TEMPERATURE object is a named vector, and thus its output will differ slightly from that of unnamed vectors in that returned elements are headed by their names.


1.13.1 Vector indexing

It is possible to print or refer to a subset of a vector by appending an index vector (enclosed in square brackets, []) to the vector name. There are four common forms of vector indexing used to extract a subset of a vector:

(i) Vector of positive integers. A set of integers that indicate which elements of the vector are to be selected. Selected elements are concatenated in the specified order.

– Select the nth element

> TEMPERATURE[2]

Q2

30.6

– Select elements n through m

> TEMPERATURE[2:5]

  Q2   Q3   Q4   Q5

30.6 31.0 36.3 39.9

– Select a specific set of elements

> TEMPERATURE[c(1, 5, 6, 9)]

  Q1   Q5   Q6   Q9

36.1 39.9  6.5  9.7

(ii) Vector of negative integers. A set of integers that indicate which elements of the

vector are to be excluded from concatenation.

– Select all but the nth element

> TEMPERATURE[-2]

  Q1   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10

36.1 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9

(iii) Vector of character strings. This form of vector indexing is only possible for vectors

whose elements have been named. A vector of element names can be used to select

elements for concatenation.

– Select the named element

> TEMPERATURE["Q1"]

Q1

36.1

– Select the named elements

> TEMPERATURE[c("Q1", "Q4")]

  Q1   Q4

36.1 36.3
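The three forms of indexing covered so far can be compared side by side in one short sketch. TEMPERATURE is rebuilt here from the values printed above, and the way its names are assigned is an assumption.

```r
# Assumed reconstruction of the named TEMPERATURE vector
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")

# (i) Positive integers select elements, in the order given
TEMPERATURE[2]
TEMPERATURE[2:5]
TEMPERATURE[c(9, 1)]         # Q9 is returned before Q1

# (ii) Negative integers exclude elements
TEMPERATURE[-2]              # everything except the second element
TEMPERATURE[-(1:8)]          # drop the first eight, leaving Q9 and Q10

# (iii) Character strings select by element name
TEMPERATURE[c("Q1", "Q4")]
```

Indexing never alters TEMPERATURE itself; each expression returns a new vector, which would need to be assigned to a name to be kept.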
