We first call Rprof() to begin monitoring, run our function, and then call
Rprof() with the NULL argument to stop the monitoring. Finally, we'll call
summaryRprof() to see the results.
> x <- runif(1000000)
> Rprof()
> invisible(powers1(x,8))
> Rprof(NULL)
> summaryRprof()
$by.self
          self.time self.pct total.time total.pct
"cbind"        0.74     86.0       0.74      86.0
"*"            0.10     11.6       0.10      11.6
"matrix"       0.02      2.3       0.02       2.3
"powers1"      0.00      0.0       0.86     100.0

$by.total
          total.time total.pct self.time self.pct
"powers1"       0.86     100.0      0.00      0.0
"cbind"         0.74      86.0      0.74     86.0
"*"             0.10      11.6      0.10     11.6
"matrix"        0.02       2.3      0.02      2.3

$sampling.time
[1] 0.86

We see immediately that the runtime of our code is dominated by calls
to cbind(), which as we noted in the extended example is indeed slowing
things down.
By the way, the call to invisible() in this example is used to suppress output. We certainly don’t want to see the 1,000,000-row matrix returned by
powers1() here!
Profiling powers2() does not show any obvious bottlenecks.
> Rprof()
> invisible(powers2(x,8))
> Rprof(NULL)
> summaryRprof()
$by.self
          self.time self.pct total.time total.pct
"powers2"      0.38     67.9       0.56     100.0
"matrix"       0.14     25.0       0.14      25.0
"*"            0.04      7.1       0.04       7.1

$by.total
          total.time total.pct self.time self.pct
"powers2"       0.56     100.0      0.38     67.9
"matrix"        0.14      25.0      0.14     25.0
"*"             0.04       7.1      0.04      7.1

$sampling.time
[1] 0.56

What about powers3(), the promising approach that didn’t pan out?
> Rprof()
> invisible(powers3(x,8))
> Rprof(NULL)
> summaryRprof()
$by.self
          self.time self.pct total.time total.pct
"FUN"          0.94     56.6       0.94      56.6
"outer"        0.72     43.4       1.66     100.0
"powers3"      0.00      0.0       1.66     100.0

$by.total
          total.time total.pct self.time self.pct
"outer"         1.66     100.0      0.72     43.4
"powers3"       1.66     100.0      0.00      0.0
"FUN"           0.94      56.6      0.94     56.6

$sampling.time
[1] 1.66

The function logging the largest amount of time was FUN(), which, as
noted in our extended example, is simply multiplication. Each call here
multiplies just one pair of scalars, so there is no vectorization at all.
No wonder it was slow.

14.4.2 How Rprof() Works

Let’s explore in a bit more detail what Rprof() does. Every 0.02 seconds (the
default value), R inspects the call stack to determine which function calls
are in effect at that time. It writes the result of each inspection to a file, by
default Rprof.out. Here is an excerpt of that file from our run of powers3():
...
"outer" "powers3"
"outer" "powers3"
"outer" "powers3"
"FUN" "outer" "powers3"
"FUN" "outer" "powers3"
"FUN" "outer" "powers3"
"FUN" "outer" "powers3"
...

So, Rprof() often found that at inspection time, powers3() had called
outer(), which in turn had called FUN(), the latter being the currently executing function. The function summaryRprof() conveniently summarizes all those
lines in the file, but you may find that looking at the file itself reveals more
insights in some cases.
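If you do want finer-grained snapshots, Rprof() lets you set the sampling
interval and the output file yourself. Here is a minimal sketch; interval and
the filename are real arguments of Rprof(), while the 0.005-second value is
just an arbitrary illustration:

Rprof("Rprof.out", interval = 0.005)  # sample every 0.005 second, not 0.02
invisible(powers3(x,8))
Rprof(NULL)                           # stop monitoring
head(readLines("Rprof.out"))          # inspect the raw stack snapshots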
Note, too, that Rprof() is no panacea. If the code you’re profiling produces many function calls (including indirect calls, triggered when your
code calls some function that then calls another within R), the profiling output may be hard to decipher. This is arguably the case for the output from
powers4():
$by.self
            self.time self.pct total.time total.pct
"apply"         19.46     67.5      27.56      95.6
"lapply"         4.02     13.9       5.68      19.7
"FUN"            2.56      8.9       2.56       8.9
"as.vector"      0.82      2.8       0.82       2.8
"t.default"      0.54      1.9       0.54       1.9
"unlist"         0.40      1.4       6.08      21.1
"!"              0.34      1.2       0.34       1.2
"is.null"        0.32      1.1       0.32       1.1
"aperm"          0.22      0.8       0.22       0.8
"matrix"         0.14      0.5       0.74       2.6
"!="             0.02      0.1       0.02       0.1
"powers4"        0.00      0.0      28.84     100.0
"t"              0.00      0.0      28.10      97.4
"array"          0.00      0.0       0.22       0.8

$by.total
            total.time total.pct self.time self.pct
"powers4"        28.84     100.0      0.00      0.0
"t"              28.10      97.4      0.00      0.0
"apply"          27.56      95.6     19.46     67.5
"unlist"          6.08      21.1      0.40      1.4
"lapply"          5.68      19.7      4.02     13.9
"FUN"             2.56       8.9      2.56      8.9
"as.vector"       0.82       2.8      0.82      2.8
"matrix"          0.74       2.6      0.14      0.5
"t.default"       0.54       1.9      0.54      1.9
"!"               0.34       1.2      0.34      1.2
"is.null"         0.32       1.1      0.32      1.1
"aperm"           0.22       0.8      0.22      0.8
"array"           0.22       0.8      0.00      0.0
"!="              0.02       0.1      0.02      0.1

$sampling.time
[1] 28.84

14.5 Byte Code Compilation

Starting with version 2.13, R has included a byte code compiler, which you can
use to try to speed up your code. Consider our trivial example from
Section 14.2.1, where we showed that
z <- x + y

was much faster than
for (i in 1:length(x)) z[i] <- x[i] + y[i]

Again, that was obvious, but just to get an idea of how byte code compilation
works, let’s give it a try:
> library(compiler)
> f <- function() for (i in 1:length(x)) z[i] <<- x[i] + y[i]
> cf <- cmpfun(f)
> system.time(cf())
   user  system elapsed
  0.845   0.003   0.848

We created a new function, cf(), from the original f(). The new code’s
run time was 0.848 seconds, much faster than the 8.175 seconds the noncompiled version took. Granted, it still wasn’t as fast as the straightforward
vectorized code, but it is clear that byte code compilation has potential. You
should try it whenever you need faster code.
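If you want to reproduce the comparison yourself, here is a minimal sketch
putting all three versions side by side; the exact timings will of course
vary by machine:

library(compiler)
x <- runif(1000000)
y <- runif(1000000)
z <- numeric(length(x))
f <- function() for (i in 1:length(x)) z[i] <<- x[i] + y[i]
cf <- cmpfun(f)
system.time(f())         # uncompiled loop: slowest
system.time(cf())        # byte-compiled loop: much faster
system.time(z <- x + y)  # vectorized form: fastest of all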

14.6 Oh No, the Data Doesn't Fit into Memory!

As mentioned earlier, all objects in an R session are stored in memory. R
places a limit of 2^31 − 1 bytes on the size of any object, regardless of word
size (32-bit versus 64-bit) and the amount of RAM in your machine. However, you really should not consider this an obstacle. With a little extra care,
applications that have large memory requirements can indeed be handled
well in R. Some common approaches are chunking and using R packages for
memory management.

14.6.1 Chunking

One option involving no extra R packages at all is to read in your data from
a disk file one chunk at a time. For example, suppose that our goal is to find
means or proportions of some variables. We can use the skip argument in
read.table().
Say our data set has 1,000,000 records and we divide them into 10
chunks (or more—whatever is needed to cut the data down to a size so it
fits in memory). Then we set skip = 0 on our first read, set skip = 100000
the second time, and so on. Each time we read in a chunk, we calculate
the counts or totals for that chunk and record them. After reading all the
chunks, we add up all the counts or totals in order to calculate our grand
means or proportions.
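Here is a minimal sketch of that scheme for a grand mean. The file name
mydata.txt is hypothetical, as is the assumption that the file is headerless
with the variable of interest in column 1:

tot <- 0           # running total of the variable
n <- 0             # running count of records
chunksize <- 100000
for (i in 1:10) {
   # skip the records already processed, then read the next chunk;
   # "mydata.txt" is a hypothetical file name
   chunk <- read.table("mydata.txt", skip = (i-1)*chunksize,
      nrows = chunksize)
   tot <- tot + sum(chunk[,1])
   n <- n + nrow(chunk)
}
grandmean <- tot / n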
As another example, suppose we are performing a statistical operation,
say calculating principal components, in which we have a huge number of
rows—that is, a huge number of observations—but the number of variables
is manageable. Again, chunking could be the solution. We apply the statistical operation to each chunk and then average the results over all the
chunks. My mathematical research shows that the resulting estimators are
statistically efficient in a wide class of statistical methods.
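A rough sketch of that chunk-and-average idea for principal components,
under the same hypothetical file and chunk settings as above, and assuming
all columns are numeric:

covsum <- 0
chunksize <- 100000
nchunks <- 10
for (i in 1:nchunks) {
   chunk <- read.table("mydata.txt", skip = (i-1)*chunksize,
      nrows = chunksize)         # "mydata.txt" is hypothetical
   covsum <- covsum + cov(chunk) # covariance matrix for this chunk
}
avgcov <- covsum / nchunks       # average the chunk estimates
eigen(avgcov)                    # eigenanalysis gives the principal components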

14.6.2 Using R Packages for Memory Management

Moving up a level in sophistication, there are specialized R packages that
accommodate large memory requirements.
One such package is RMySQL, an R interface to MySQL databases. Using it
requires some database expertise, but this package provides a much more
efficient and convenient way to handle large data sets. The idea is to have
SQL do its variable/case selection operations for you back at the database
end and then read the resulting selected data as it is produced by SQL.
Since the latter will typically be much smaller than the overall data set,
you will likely be able to circumvent R’s memory restriction.
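Here is a minimal sketch of that workflow; the connection credentials and
the table and column names are all hypothetical:

library(RMySQL)
con <- dbConnect(MySQL(), user = "myuser", password = "mypassword",
   dbname = "mydb")  # hypothetical credentials
# let SQL do the selection; only the much smaller result set enters R
d <- dbGetQuery(con, "SELECT age, income FROM survey WHERE state = 'CA'")
dbDisconnect(con)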
Another useful package is biglm, which does regression and generalized
linear-model analysis on very large data sets. It also uses chunking but in a
different manner: Each chunk is used to update the running totals of sums
needed for the regression analysis and then discarded.
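A sketch of biglm's chunked updating, again with a hypothetical data file
and hypothetical column names y, x1, and x2:

library(biglm)
cnames <- c("y","x1","x2")  # hypothetical column names
chunk <- read.table("mydata.txt", nrows = 100000, col.names = cnames)
fit <- biglm(y ~ x1 + x2, data = chunk)  # fit on the first chunk
chunk <- read.table("mydata.txt", skip = 100000, nrows = 100000,
   col.names = cnames)
fit <- update(fit, chunk)  # fold in the next chunk, then discard it
summary(fit)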
Finally, some packages do their own storage management independently of R and thus can deal with very large data sets. The two most commonly used today are ff and bigmemory. The former sidesteps memory constraints by storing data on disk instead of memory, essentially transparently
to the programmer. The highly versatile bigmemory package does the same,
but it can store data not only on disk but also in the machine’s main memory, which is ideal for multicore machines.
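For instance, a minimal bigmemory sketch, with hypothetical dimensions and
backing-file names; the matrix lives in a disk file rather than in R's
address space, yet is accessed with ordinary matrix syntax:

library(bigmemory)
bm <- filebacked.big.matrix(nrow = 5000000, ncol = 10, type = "double",
   backingfile = "mydata.bin", descriptorfile = "mydata.desc")
bm[1,] <- rnorm(10)  # assign to the first row as with a regular matrix
mean(bm[,1])         # ordinary indexing pulls data in as needed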
