How to Write Conditional Statements in R: Four Methods

Learn powerful ways to go beyond if-else statements and level up your R code

Rory Spanton
Towards Data Science
8 min read · Jun 13, 2023


You won’t get far in programming without conditional statements.

Conditional statements execute code based on the result of a true-or-false condition. They’re an essential part of coding, and this is especially true in R. Whether you’re using R for data analysis, machine learning, software development, or something else, conditional statements have countless uses.

But most beginners in R don’t realize that there are many ways to write them. Many people learn basic if-else statements and stop there, even though there are often neater, more efficient ways to write conditional statements. Advanced R programmers know each of these techniques and when to use them. So, how can you learn to do the same?

In this article, we’ll take a look at four different ways to write conditional statements in R. We’ll also cover the strengths and limitations of each technique, and when to use each one.

How to write an if-else statement in R

The most straightforward way of writing conditional statements in R is by using the if and else keywords. This will be most familiar if you already know another programming language, and it’s often the technique that new R users learn first.

A standard if statement in R looks like this:

if (condition) {
  # Code to execute
}

Here, condition is a logical expression that returns either TRUE or FALSE. If the condition returns TRUE, any code inside the curly braces is executed. If it returns FALSE, the code inside the braces is skipped, and R moves on to the next line of code in the script.

To see how this works in practice, we can take the following example.

age <- 25

if (age >= 18) {
  age_group <- "adult"
}

Here, we have a variable that contains an age. The if statement then evaluates whether the value of age is greater than or equal to 18. This is true in this case, so the variable age_group takes a value of “adult”.

This is an easy way of checking a simple condition and doing something if it’s true. But what if we want our statement to run some code if the condition is false?

If-else statements are an extension of the basic if statement. To understand them, we can add to our previous example.

if (age >= 18) {
  age_group <- "adult"
} else {
  age_group <- "child"
}

This code works just like the last example, with one exception. Instead of moving on when the condition is FALSE, R executes the code inside the curly braces after else. This means that if age is greater than or equal to 18, age_group is assigned a value of “adult”. If not, age_group is set to “child”.

If-else statements are a straightforward way of controlling the code in an R script. They’re easily understood, can be extended to take many conditions, and can execute complex code that’s many lines long.
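
As a minimal sketch of extending if-else to more than two conditions, the example below chains an extra branch with else if. The age value, the threshold of 13, and the “teenager” label are assumptions made up for illustration rather than part of the original example.

```r
age <- 15

# Conditions are checked from top to bottom; the first TRUE branch runs
if (age >= 18) {
  age_group <- "adult"
} else if (age >= 13) {
  age_group <- "teenager"
} else {
  age_group <- "child"
}

age_group
# Returns "teenager"
```

Because only the first matching branch runs, the order of the conditions matters: putting age >= 13 first would label adults as teenagers too.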

But if-else statements can take up a lot of space. For simple expressions like the one above, there are other ways of doing exactly the same operation without using five lines of code.

In fact, it’s possible to write if-else statements using one line of code.

Inline conditional statements in R

Inline conditional statements are a neat way of expressing “if-else” logic in a single line of code. There are a couple of ways to write them.

Inline if-else statement

First, it’s possible to write a simple inline statement using the if and else keywords. This takes the form below:

age_group <- if (age >= 18) "adult" else "child"

This statement works the same way as the previous example. The only difference is that now, we’ve condensed the phrasing to fit on one line. If the condition is TRUE, the value of age_group gets updated to whatever is before the else keyword — in this case, “adult”. If it were FALSE, age_group would be assigned whatever comes after else.

The big difference here is that we now assign the result of the whole conditional statement to the variable age_group. This improves on the repetitive phrasing in the standard if-else example, where we had to write this assignment twice.
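
Because if returns a value in R, the inline form can also sit directly inside another expression. The short sketch below illustrates this; the message text and the message_text variable are made up for the example.

```r
age <- 25

# if ... else is an expression, so its result can be passed straight to a function
message_text <- paste("This user is", if (age >= 18) "an adult" else "a child")

message_text
# Returns "This user is an adult"
```

Note that this form still checks a single condition, so it suits a single value like age here rather than a whole vector.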

Base-R ifelse function

If you prefer, you can use the ifelse function instead. The code below uses this function to execute the same logic as the previous examples.

age_group <- ifelse(age >= 18, "adult", "child")

The ifelse function takes three arguments. First comes the condition, then a value to return if the condition is true, and finally a value to return if the condition is false.

This is a clean, straightforward way of writing a short conditional statement. It also has another advantage: it’s vectorized.

Vectorization is an important concept in R. If a function is vectorized, it automatically applies to multiple values instead of just one. To see an example with the ifelse function, let’s assign more values to our variable age, and run the code again.

age <- c(16, 45, 23, 82)

age_group <- ifelse(age >= 18, "adult", "child")
# Returns "child" "adult" "adult" "adult"

The ifelse function automatically evaluates all the values in age, returning a sequence of corresponding outputs. This makes ifelse a clean way of evaluating lots of simple conditions without needing slow, messy loops.
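
Two details are worth knowing when applying ifelse to real data. The sketch below uses a made-up age vector containing a missing value. It shows that NA inputs stay NA in the output, and that calls can be nested to handle a third category. The “teenager” threshold is an assumption for illustration.

```r
age <- c(16, 45, NA, 9)

# A missing condition gives a missing result rather than "adult" or "child"
ifelse(age >= 18, "adult", "child")
# Returns "child" "adult" NA "child"

# Nesting a second ifelse in the third argument adds a third category
age_group <- ifelse(age >= 18, "adult",
                    ifelse(age >= 13, "teenager", "child"))
age_group
# Returns "teenager" "adult" NA "child"
```

Nesting works for two or three categories, but it becomes hard to read quickly as more conditions are added.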

Conditional indexing in R

Although ifelse can evaluate many inputs easily, there are other ways to do this.

Indexing allows R programmers to access specific parts of a data structure that contains many values. For example, if we wanted to get the third element in the vector age from the last example, we could index age with 3 inside square brackets:

age <- c(16, 45, 23, 82)

age
# Returns 16, 45, 23, 82

age[3]
# Returns 23

It’s most common to use numbers to index values with certain positions, like in the code above. But many beginner R programmers don’t know that you can also use logical conditions when indexing. This opens up all sorts of possibilities.
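
To make the idea concrete before moving to a full dataset, here is a minimal sketch of logical indexing on the plain age vector from above. The comparison produces a logical vector, and R keeps only the elements where it is TRUE.

```r
age <- c(16, 45, 23, 82)

# The condition is evaluated for every element
age >= 18
# Returns FALSE TRUE TRUE TRUE

# Indexing with that logical vector keeps the TRUE positions
adult_ages <- age[age >= 18]
adult_ages
# Returns 45 23 82

# which() gives the positions instead of the values
which(age >= 18)
# Returns 2 3 4
```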

Let’s create some example data to illustrate some of these options. This includes some information about users, such as age, as in the previous examples. But rather than being stored in a vector, each user’s information is stored row-wise in a tibble. This is the kind of data structure you’d be likely to see when dealing with user data in a professional setting, so it’s useful to know how to apply conditional logic to it.

library(tibble)

set.seed(123)

user_data <- tibble(
  user_id = 1:10,
  age = floor(runif(10, min = 13, max = 35)),
  region = sample(c("UK", "USA", "EU"), 10, replace = TRUE)
)

Tibbles and data frames are made up of vectors, which means we can index them in the same way. This allows us to do all sorts of things.

Extracting values in a column based on a condition

Here’s some code that extracts any values of the user_id column where the user’s age is under 18.

user_data$user_id[user_data$age < 18]
# Returns 6

Replacing values in a column based on a condition

Here’s how to recode all the “UK” rows in the region column as “EU”.

user_data$region[user_data$region == "UK"] <- "EU"

Filtering a dataset with conditional indexing

We can even use conditional subsetting to filter the whole dataset. Here’s a way to keep only the rows where the region is “USA”. Note that after the logical condition here, we include a comma to tell R that we’re indexing by row. If we wanted to select columns too, we could add a condition after the comma.

user_data[user_data$region == "USA", ]

These are just a few applications of conditional indexing in R. If you need to do a quick conditional operation on data, chances are there’s a one-line solution for it using this method.
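
As one more illustration, conditions can be combined with & and paired with a column selection after the comma. The sketch below uses a small made-up data frame called users (not the user_data tibble above) so that it runs without extra packages; the values are purely illustrative.

```r
# Hypothetical data for illustration only
users <- data.frame(
  user_id = 1:5,
  age = c(19, 30, 14, 24, 17),
  region = c("USA", "EU", "USA", "USA", "EU")
)

# Rows: USA users aged 18 or over. Columns: only user_id and age
usa_adults <- users[users$region == "USA" & users$age >= 18, c("user_id", "age")]

usa_adults$user_id
# Returns 1 4
```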

Tidyverse case_when function

The case_when function comes from dplyr, one of the tidyverse family of packages. It’s another way of applying conditional statements across a set of many values and is once again useful when working with datasets.

We can use the data from the last example to show how to create a new column based on conditional statements with case_when.

library(dplyr) # .default requires dplyr 1.1.0 or later

user_data %>%
  mutate(drinking_age = case_when(
    region == "USA" & age >= 21 ~ TRUE,
    region == "EU" & age >= 18 ~ TRUE,
    .default = FALSE
  ))

This code determines whether our site users are legally allowed to drink alcohol based on their age and location. Here, each condition is on a different line in the case_when statement. If a condition is true, the value after the tilde (~) symbol is returned (in this case, TRUE). If neither condition is satisfied, the value returned is specified by the .default argument. All the values are stored in a new column, drinking_age.

If this seems a bit unfamiliar, here’s the equivalent if-else statement for a single user, where region and age each hold one value:

if (region == "USA" && age >= 21) {
  drinking_age <- TRUE
} else if (region == "EU" && age >= 18) {
  drinking_age <- TRUE
} else {
  drinking_age <- FALSE
}

Compared with the code above, it’s easy to see that case_when provides yet another way of implementing conditional statements that is more concise than the if-else statement, while being just as powerful. It’s now my go-to for creating new columns based on complex logic or multiple conditions. For tidyverse users, it’s a must-adopt feature.
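
One behaviour worth keeping in mind is that case_when checks its conditions in order and uses the first one that is TRUE, so more specific conditions usually go first. The sketch below shows this on a made-up age vector, outside of mutate. The category labels and thresholds are assumptions for illustration, and the .default argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

age <- c(9, 16, 45, 70)

# Conditions are tested top to bottom; each value gets the first match
age_group <- case_when(
  age >= 65 ~ "senior",
  age >= 18 ~ "adult",
  age >= 13 ~ "teenager",
  .default = "child"
)

age_group
# Returns "child" "teenager" "adult" "senior"
```

If age >= 18 were listed before age >= 65, the 70-year-old would be labelled “adult”, because that condition would match first.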

When to use different kinds of conditional statements

The methods I’ve covered only truly start to shine when you put them to work in your own code. Only by playing around with new approaches and getting comfortable with them will you reap their full benefits and achieve R programming fluency.

So, when should you use each type of conditional statement?

As with any choice between approaches in programming, there isn’t a straight answer. That said, here are some rough guidelines I use to help me choose between different ways of writing conditional statements.

  • If I’m working on a problem that requires lots of complex, multi-line code to get executed upon certain conditions, I often prefer if-else statements. Using other methods often gets messy and hard to maintain in these situations.
  • For problems with simpler conditions and shorter code chunks to return, I like inline if-else statements. There’s no point in making a straightforward solution longer than it needs to be!
  • If working with datasets or creating columns based on conditions, I use case_when. It works well with the other tidyverse functions I use and is easy to debug and maintain.
  • If I’m working with datasets and don’t want to load extra packages, I’ll use conditional indexing. It doesn’t need any extra dependencies and often runs fast too.
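
To compare the options side by side, here is a rough sketch that produces the same result with each approach. The age vector and the variable names are made up, and the case_when call assumes dplyr 1.1.0 or later is installed.

```r
age <- c(16, 45, 23, 82)

# 1. if-else handles one value at a time, so it is applied per element with sapply
with_if <- sapply(age, function(a) if (a >= 18) "adult" else "child")

# 2. ifelse is vectorized
with_ifelse <- ifelse(age >= 18, "adult", "child")

# 3. Conditional indexing: start with a default, then overwrite the matches
with_index <- rep("child", length(age))
with_index[age >= 18] <- "adult"

# 4. case_when from dplyr
with_case_when <- dplyr::case_when(age >= 18 ~ "adult", .default = "child")

# Check that all four give the same vector
all_same <- identical(with_if, with_ifelse) &&
  identical(with_ifelse, with_index) &&
  identical(with_index, with_case_when)
all_same
# Returns TRUE
```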

My advice? Have a play around with each technique and see what sticks for you. At the very least, you might pick up one new way of making your code better.

So, if you liked this article, why not share your favourite conditional statement? Else… thanks for staying until the end anyway!
