Growing Objects and Loop Memory Pre-Allocation

By Thiago Milagres

Preallocating Memory

This will be a short post about a simple, but very important concept that can drastically increase the speed of poorly written codes. It is very common to see R loops written as follows:

v = NULL
n = 1e5
for(i in 1:n) v = c(v, i)

This seems like a natural way to write such a task: at each iteration, we increase our vector v to add one more element to it.

However, we will see that if we simply initialize a vector (with NAs, zeros or any other value) with the total length and then run the loop, we can drastically increase the speed of our algorithm. To illustrate this point, we will use the microbenchmark package. First, let’s put both approaches in functions:

f_slow = function(n) {
v = NULL
for(i in 1:n) v = c(v, i)
return(v)
}

f_fast = function(n) {
v = rep(NA, n)
for(i in 1:n) v[i] = i
return(v)
}

Now to compare the speeds we can start with a small value of n, say 100.

library(microbenchmark)
library(ggplot2)
n = 100
mbm = microbenchmark("f_slow" = f_slow(n),
"f_fast" = f_fast(n), times = 50
)
print(mbm)
## Unit: microseconds
##    expr    min     lq   mean median     uq      max neval
##  f_slow 24.060 24.576 64.690 25.156 26.892 1958.947    50
##  f_fast  5.098  5.419 43.235  5.562  5.790 1882.557    50

We see that even for such a simple loop, preallocating the vector already makes a big difference in the runtime. For bigger values of n, such difference becomes much more relevant:

n = 1e5
mbm = microbenchmark("f_slow" = f_slow(n),
"f_fast" = f_fast(n), times = 50
)
print(mbm)
## Unit: milliseconds
##    expr      min       lq     mean   median       uq      max neval
##  f_slow 8615.797 8697.359 8736.413 8728.380 8776.894 8879.216    50
##  f_fast    4.683    4.968    5.257    5.100    5.196    7.098    50

Fully understanding why this happens would require understanding how dynamic allocation of memory works, which is not our focus here, but the basic idea is that when the object grows inside a loop, the computer must repeatedly find more space available for it in the memory, which is a costly process that can lead to memory fragmentation. For illustration purposes, let’s plot the object size (in bytes) as a function of i for our two loops. Note how, in the fast case, the object doesn’t increase in size over time.

library(pryr)
n = 1e5
object_size_slow_loop = rep(NA, n)
object_size_fast_loop = rep(NA, n)

v1 = NULL
for(i in 1:n) {
v1 = c(v1, i)
object_size_slow_loop[i] = object_size(v1)
}

v2 = rep(NA, n)
for(i in 1:n){
v2[i] = i
object_size_fast_loop[i] = object_size(v2)
}

sizes = cbind.data.frame(object_size_fast_loop, object_size_slow_loop)
print(ggplot(sizes, aes(1:n)) +
geom_line(aes(y = object_size_fast_loop, colour = "Fast loop")) +
geom_line(aes(y = object_size_slow_loop, colour = "Slow loop")) +
ggtitle("Size of the object at each iteration") + xlab("Iteration") + ylab("Size in bytes")
) Beyond Vectors

In the above examples, we used vectors just because they are simpler to visualize, but note that this principle holds for any other data structure, such as matrices and lists. Let’s see one quick example of each, starting with the matrix case:

n = 1e3

f_slow_matrix = function(n) {
M = NULL
for(i in 1:n){
M = rbind(M, rep(1, n))
}
return(M)
}

f_fast_matrix = function(n) {
M = matrix(NA, nrow = n, ncol = n)
for(i in 1:n){
M[i,] = rep(1, n)
}
return(M)
}

mbmM = microbenchmark("f_slow_matrix" = f_slow_matrix(n),
"f_fast_matrix" = f_fast_matrix(n)
,times = 50
)
print(mbmM)
## Unit: milliseconds
##           expr      min       lq     mean   median       uq       max neval
##  f_slow_matrix 5823.364 5858.490 5911.030 5914.014 5945.253  6040.606    50
##  f_fast_matrix    7.744    9.607   10.788   10.349   11.270    14.078    50

and, finally, an example with lists:

n = 1e4

f_slow_list = function(n) {
L = list()
for(i in 1:n){
L[[paste0('index_',i)]] = i
}
return(L)
}

f_fast_list = function(n) {
L = vector(mode = "list", length = n)
for(i in 1:n){
L[[i]] = i
}
names(L) = paste0("index_", 1:n)
return(L)
}

mbmL = microbenchmark("f_slow_list" = f_slow_list(n),
"f_fast_list" = f_fast_list(n)
,times = 50
)
print(mbmL)
## Unit: milliseconds
##         expr     min      lq    mean  median      uq      max neval
##  f_slow_list 607.299 616.947 682.738 639.575 656.597 1298.282    50
##  f_fast_list   3.273   3.364   3.967   3.559   3.775    7.704    50

Final Note: When maximum length is not known from the start

It could be the case that the maximum length is not known to the user. However, an upper bound (even a very bad one) could be used, and then after running the loop we can simply delete the part that wasn’t useful. For example, suppose that the true length is 100,000, but all we known is the upper bound 10,000,000. We can preallocate memory using our upper bound, as follows:

n = 1e5
UB = 1e7
f_slow = function(n) {
v = NULL
for(i in 1:n) v = c(v, i)
return(v)
}

f_fast_upper_bound = function(n, UB) {
v = rep(NA, UB)
for(i in 1:n) {
v[i] = i
}
v = v[1:n]
return(v)
}

mbmUB = microbenchmark("f_slow" = f_slow(n),
"f_fast_upper_bound" = f_fast_upper_bound(n, 1e7), times = 50
)
print(mbmUB)
## Unit: milliseconds
##                expr      min       lq     mean   median       uq      max neval
##              f_slow 7529.350 7903.024 8394.622 8598.460 8828.130 9318.457    50
##  f_fast_upper_bound   38.629   54.643  112.712   128.72  137.910  224.047    50
1. HenrikB says: