
Why You Should Vectorize Your Code in R

Using the microbenchmark package in R to compare the efficiency of vectorized operations and for-loops

Photo by Vipul Jha on Unsplash

In this article, I illustrate the benefits of vectorized code by timing three tasks, each performed once with vectorized operations and once with a for-loop. The microbenchmark package in R provides a handy tool for comparing how long different R expressions take to execute.

The code used in this article can be found in this GitHub repo.

Multiplying vectors

We start with a simple example: multiplying two vectors. We write two versions of the multiplication, one vectorized and one using a for-loop, and wrap each in a function so that we can pass both to the microbenchmark function. After loading the microbenchmark package, we set times = 100 so that each function runs 100 times, and the result shows summary statistics of the execution times.
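The original functions live in the linked repo and are not reproduced here. The sketch below shows what the two versions and the benchmark call might look like, assuming the vector of integers 1 to 100 described later in this section. The function names multiply_vectorized() and multiply_loop() are hypothetical, and preallocating the loop's result vector is an assumption about how the loop was written.

```r
# Vector of integers 1 to 100, multiplied by itself in both versions
x <- 1:100

# Vectorized version: a single call to the multiplication operator on the whole vector
multiply_vectorized <- function(v) {
  v * v
}

# For-loop version: one multiplication and one assignment per element
multiply_loop <- function(v) {
  result <- numeric(length(v))  # preallocate the output vector (assumption)
  for (i in seq_along(v)) {
    result[i] <- v[i] * v[i]
  }
  result
}

# Run each version 100 times; skipped if microbenchmark is not installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(
    vectorized = multiply_vectorized(x),
    for_loop   = multiply_loop(x),
    times = 100
  )
  print(mb)  # summary statistics (min, lq, mean, median, uq, max) per expression
}
```

Both functions return the same values. Only the way R gets there differs.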

Looking at the mean execution times in the microbenchmark summary, we can see that multiplying with a for-loop takes seven times longer than vectorized multiplication.

Image by Author

So why are vectorized operations so much faster? In this example, we multiplied a vector of the integers 1 to 100 by itself. The for-loop multiplies the integers one pair at a time, one multiplication per iteration, so with 100 iterations R interprets and executes the multiplication step 100 times. In the vectorized version, the entire vector is passed to the multiplication operator in a single call, and the element-by-element work happens inside R's compiled C code instead of in interpreted R. The for-loop also assigns a product to the result vector 100 times, while the vectorized version assigns the result only once. Hence the shorter running time.

Generating random numbers

Another task you might find yourself doing often is generating random numbers. In this test, both functions generate 1,000,000 random numbers from the uniform distribution between 0 and 1.

The first function calls the random number generator runif() once, while the second calls runif() one million times. Not surprisingly, the for-loop takes much longer: in this test, almost 221 times longer.
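Here is a minimal sketch of the two functions, assuming they follow the description above. The names runif_vectorized() and runif_loop() are hypothetical. The article does not state how many repetitions were used for this test, so times = 10 is an assumed value, kept small because each run of the loop version takes noticeably longer.

```r
n <- 1e6  # one million draws

# Vectorized version: a single call to runif() returns all n numbers
runif_vectorized <- function(n) {
  runif(n)  # uniform on [0, 1] by default
}

# For-loop version: runif() is called once per element
runif_loop <- function(n) {
  result <- numeric(n)  # preallocate the output vector (assumption)
  for (i in seq_len(n)) {
    result[i] <- runif(1)
  }
  result
}

# Compare the two versions; skipped if microbenchmark is not installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(
    vectorized = runif_vectorized(n),
    for_loop   = runif_loop(n),
    times = 10  # assumed value, not taken from the article
  )
  print(mb)
}
```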

Image by Author

Estimating pi with Monte Carlo simulation

Finally, we look at an example that both generates random numbers and performs calculations on them. We compare two functions that estimate pi by random sampling: one uses vectorized operations and the other uses a for-loop.

More details and a guide on how to do this simulation can be found in this article on Monte Carlo simulation.

Estimating Pi Using Monte Carlo Simulation in R

Both functions generate random (x, y) coordinates for points in a square, calculate each point's distance from the center of the circle to determine whether it falls inside the circle, and finally estimate pi from the proportion of points that land inside.
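The estimate rests on a ratio of areas. Assume a circle of radius r inscribed in a square of side 2r, with N points sampled uniformly in the square, of which N_in land inside the circle:

```latex
\frac{N_{\text{in}}}{N} \approx \frac{\text{area of circle}}{\text{area of square}}
  = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \, \frac{N_{\text{in}}}{N}
```

The same factor of 4 appears if the points are drawn in the unit square [0, 1] × [0, 1] and compared against the quarter circle of radius 1, since that quarter circle also covers π/4 of the square.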

In the vectorized version, we call runif() twice (once for the x vector and once for the y vector), call which() and length() once each, and perform two calculations. In the for-loop version, we call runif() 2 million times, calculate the distance 1,000,000 times, and finally calculate pi once.
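A minimal sketch of the two estimators follows. It assumes points drawn from the square [-1, 1] × [-1, 1] and a unit circle centered at the origin; the linked Monte Carlo article may set up the square differently. The function names are hypothetical, and times = 5 is an assumed value chosen because each run of the loop version takes seconds.

```r
n <- 1e6  # number of random points

# Vectorized version: two runif() calls, one which() and one length() call
estimate_pi_vectorized <- function(n) {
  x <- runif(n, min = -1, max = 1)    # x coordinates in the square [-1, 1] x [-1, 1]
  y <- runif(n, min = -1, max = 1)    # y coordinates
  dist <- sqrt(x^2 + y^2)             # distance of every point from the origin
  inside <- length(which(dist <= 1))  # number of points inside the unit circle
  4 * inside / n                      # circle/square area ratio is pi/4
}

# For-loop version: two runif() calls and one distance calculation per iteration
estimate_pi_loop <- function(n) {
  inside <- 0
  for (i in seq_len(n)) {
    x <- runif(1, min = -1, max = 1)
    y <- runif(1, min = -1, max = 1)
    if (sqrt(x^2 + y^2) <= 1) {
      inside <- inside + 1
    }
  }
  4 * inside / n
}

# Compare the two versions; skipped if microbenchmark is not installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(
    vectorized = estimate_pi_vectorized(n),
    for_loop   = estimate_pi_loop(n),
    times = 5  # assumed value, not taken from the article
  )
  print(mb)
}
```

Writing sum(dist <= 1) would count the same points, since TRUE counts as 1 in a sum. The sketch keeps which() and length() to match the calls described above.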

The microbenchmark results show that the for-loop version takes 117 times longer to run than the vectorized version!

Image by Author

Hopefully, these examples helped you see the benefits of using vectorized operations in R. There is no need to go back to your old scripts and rewrite your code. In many cases, the data we work with is relatively small, the operations take only a few seconds, and the difference between vectorized operations and for-loops may not be noticeable. But when you work with huge datasets and complex operations that take hours, you certainly want code that R processes as efficiently as possible. Understanding what happens in vectorized operations can help you write shorter, simpler, and faster code going forward.


If you are new to R and want to learn more, this article explains how you can practice writing R code in free interactive courses.

The Free Interactive R Courses Most People Don’t Know About


If you enjoy reading stories like these and want to support me as a writer, consider signing up to become a Medium member. It’s $5 a month, giving you unlimited access to stories on Medium. If you sign up using my link, I’ll earn a small commission.

Join Medium with my referral link – Andrea Gustafsen

