Tutorial | R | Data Structures
Collections of Data Types

Now that you know the different types of variables, let’s look at some options R has for collecting these variables together. We’ll look at the following types:
- Vectors
- Matrices
- Data Frames
- Lists
Vectors
Vectors are one of the most essential types in R and the backbone of many operations. A vector can hold only one type of variable, and that restriction is what lets R operate on a whole vector faster than an equivalent for loop. Vectors are also the building blocks of data frames, which we will cover later in this post. Let’s make some vectors!
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
strings <- c("this", "is", "some", "text")
# ymd_hms() comes from the lubridate package
library(lubridate)
bad_example <- c(1, "text", ymd_hms("2020-3-1 3:12:16"))
Now I know what you’re thinking. I just said you can only put 1 type of variable inside a vector, but bad_example has 3! Well, not quite. If you look at the environment pane, or run class(bad_example), you’ll see that it is a character vector with 3 elements. R coerced, or converted, all of the elements of the vector to the same type.
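
To see coercion in action, here is a small sketch using only base R. The variable names are made up for this illustration. For the common types, R converts everything to the most flexible type present: logical, then integer, then double, then character.

```r
# Each vector mixes two types; R converts to the more flexible of the two
logical_and_integer <- c(TRUE, 2L)   # TRUE becomes the integer 1
integer_and_double  <- c(1L, 2.5)    # 1L becomes the double 1
number_and_string   <- c(1, "text")  # 1 becomes the string "1"

class(logical_and_integer)  # "integer"
class(integer_and_double)   # "numeric"
class(number_and_string)    # "character"

# The number is now text, so it can no longer be used in arithmetic
number_and_string[1]        # "1"
```

Checking class() after building a vector is a quick way to catch an accidental conversion before it causes trouble later.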

Some functions return a single output when used on a vector, like mean(). Others will perform an action on each element of the vector and return a vector of equal length, like paste().
mean(numbers)
# 5.5
paste("test", strings)
# "test this" "test is" "test some" "test text"
The way these functions work on whole vectors, often called vectorization, is what gives R a lot of its speed. They usually run much faster than an equivalent for loop written in R. Because every element is known to have the same type, R can skip much of the per-element overhead the loop would incur.
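
As a rough illustration of the difference, the sketch below squares every element of numbers twice, once with a for loop and once with a single vectorized operation. The names squares_loop and squares_vectorized are made up for this example, and on a vector this small you will not notice any speed difference; the point is that both approaches give the same answer with far less code in the vectorized version.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Loop version: square one element at a time
squares_loop <- numeric(length(numbers))
for (i in seq_along(numbers)) {
  squares_loop[i] <- numbers[i]^2
}

# Vectorized version: ^ is applied to the whole vector in one call
squares_vectorized <- numbers^2

identical(squares_loop, squares_vectorized)  # TRUE
sum(squares_vectorized)                      # 385
```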
Matrices
Matrices are used to store a grid of numbers. Certain statistical functions take these as inputs or use them internally to speed up calculations. K-Means clustering in R takes a matrix as its input, and the nnet package for neural networks uses matrices internally to do computations. Let’s look at our options for creating some matrices below.
When creating matrices, we can pass the matrix function a vector containing the numbers to put in the matrix.
long_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9))
print(long_matrix)
#      [,1]
# [1,]    1
# [2,]    2
# [3,]    3
# [4,]    4
# [5,]    5
# [6,]    6
# [7,]    7
# [8,]    8
# [9,]    9
If we want to specify the number of columns or rows, we can do that using the ncol and nrow arguments.
square_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
print(square_matrix)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
By default, R takes the input vector and fills the matrix one column at a time. To fill it one row at a time instead, add the byrow argument and set it to TRUE. Compare the result to the previous output for square_matrix.
square_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
print(square_matrix)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

If you’ve ever taken a course in linear algebra, you’ll know that matrices can be added and multiplied, and that they have special properties such as cross products and eigenvalues. R supports many of these through dedicated operators and functions:
# Add matrices
square_matrix + square_matrix
#      [,1] [,2] [,3]
# [1,]    2    4    6
# [2,]    8   10   12
# [3,]   14   16   18
# Multiply matrices element by element
# (the * operator is not the matrix product; use %*% for that)
square_matrix * square_matrix
#      [,1] [,2] [,3]
# [1,]    1    4    9
# [2,]   16   25   36
# [3,]   49   64   81
# Find the transpose of a matrix
t(square_matrix)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
# Find the cross product of 2 matrices
crossprod(square_matrix, square_matrix)
#      [,1] [,2] [,3]
# [1,]   66   78   90
# [2,]   78   93  108
# [3,]   90  108  126
# Find the eigenvalues and eigenvectors of a matrix
eigen(square_matrix)
# eigen() decomposition
# $values
# [1]  1.611684e+01 -1.116844e+00 -1.303678e-15
#
# $vectors
#            [,1]        [,2]       [,3]
# [1,] -0.2319707 -0.78583024  0.4082483
# [2,] -0.5253221 -0.08675134 -0.8164966
# [3,] -0.8186735  0.61232756  0.4082483
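
One thing that trips up newcomers is that * multiplies matrices cell by cell. The matrix product from linear algebra uses the %*% operator instead. The sketch below shows the two side by side and also checks that crossprod(x, y) gives the same result as t(x) %*% y. The names elementwise and matrix_product are made up for this example.

```r
square_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)

# Element-wise product: each cell is multiplied by the matching cell
elementwise <- square_matrix * square_matrix

# Matrix product: rows of the left matrix times columns of the right
matrix_product <- square_matrix %*% square_matrix
print(matrix_product)
#      [,1] [,2] [,3]
# [1,]   30   36   42
# [2,]   66   81   96
# [3,]  102  126  150

# crossprod(x, y) is a shortcut for t(x) %*% y
all.equal(crossprod(square_matrix, square_matrix),
          t(square_matrix) %*% square_matrix)  # TRUE
```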
Data Frames
Data frames resemble the way most people are used to seeing data: a table of rows and columns. A data frame in R is made up of columns, each of which is a vector. Data frames are ideal for plotting data and are used as inputs for linear and logistic regression and for the K-nearest neighbors algorithm.
They are super easy to make using what we have already learned about vectors. If we create a few vectors of equal length, we can combine them together with the data.frame function.
As an example, let’s create a data frame to hold the names and scores of students in a class. We’ll create 3 vectors: first name, last name, and score. Then we’ll pass the vectors to the data frame function, saving the data frame as student_scores.
first_name <- c("John", "Catherine", "Christopher")
last_name <- c("Cena", "Zeta-Jones", "Walken")
test_score <- c(84, 92, 100)
student_scores <- data.frame(first_name = first_name, last_name = last_name, test_score = test_score)
print(student_scores)
# first_name    last_name    test_score
# <fctr>        <fctr>       <dbl>
# John          Cena         84
# Catherine     Zeta-Jones   92
# Christopher   Walken       100
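As you can see, each argument name becomes a column name, and the data in that column is the vector we assigned to that name in the data.frame function. When we print student_scores, we get the column names, data types, and data. The row of data types appears in notebook-style output like the one shown here; print() in a plain console shows only the names and data.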
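You may notice that the type of the first name and last name columns is factor. In R versions before 4.0.0, data.frame() converted strings to factors by default, which doesn’t make a lot of sense for our names. Adding the stringsAsFactors = FALSE argument when creating the data frame keeps them as characters. Since R 4.0.0, FALSE is the default, so on a current installation the columns are already characters.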
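
If you are not sure which behavior your version of R uses, you can set stringsAsFactors explicitly and check the result. The sketch below builds the same small data frame both ways. The names as_text and as_factor are made up for this example.

```r
first_name <- c("John", "Catherine", "Christopher")
test_score <- c(84, 92, 100)

# Keep the names as plain character strings
as_text <- data.frame(first_name, test_score, stringsAsFactors = FALSE)

# Convert the names to a factor (the default before R 4.0.0)
as_factor <- data.frame(first_name, test_score, stringsAsFactors = TRUE)

class(as_text$first_name)     # "character"
class(as_factor$first_name)   # "factor"

# A factor stores each distinct value once, as a sorted set of levels
levels(as_factor$first_name)  # "Catherine" "Christopher" "John"
```

Factors are useful for categories with a fixed set of values, such as a grade level, but names are better left as characters.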

Data frames have some functions that help you gain more information about the data they contain:
nrow()/ncol() – returns the number of rows or columns in the data frame
ncol(student_scores)
# 3
nrow(student_scores)
# 3
names() – returns the column names in the data frame
names(student_scores)
# "first_name" "last_name" "test_score"
head()/tail() – returns the first few or last few rows of the data frame. The default number of rows to display is 6. Use the n parameter to change how many rows to display
# Display up to the first 6 rows of the data frame
head(student_scores)
# Display up to the last 2 rows of the data frame
tail(student_scores, n = 2)
summary() – returns summary information about each column of the data frame. Numeric columns display the minimum, maximum, mean, median, and quartiles for that column
# Display summary information about the columns of the data frame
summary(student_scores)
#  first_name         last_name           test_score
#  Length:3           Length:3           Min.   : 84
#  Class :character   Class :character   1st Qu.: 88
#  Mode  :character   Mode  :character   Median : 92
#                                        Mean   : 92
#                                        3rd Qu.: 96
#                                        Max.   :100
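Also of note is the ability to pull out individual columns and perform all the same operations we saw earlier on vectors. You can use the dollar sign notation (data.frame$column_name), which returns the column as a vector, or bracket notation (data.frame[2] or data.frame["column_name"]), which returns a data frame containing just that column. Double brackets (data.frame[["column_name"]]) return the column as a vector, just like the dollar sign.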
# Select the last_name column
# with dollar sign ($) notation
student_scores$last_name
# Select the 2nd column of the data frame
# with bracket notation
student_scores[2]
# Select the first_name column of the data frame
# with bracket notation
student_scores["first_name"]
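
Here is a short sketch of what each notation gives back, followed by a couple of the vector functions from earlier applied to columns. It rebuilds student_scores so it can run on its own, and sets stringsAsFactors = FALSE so the names are characters on any version of R.

```r
student_scores <- data.frame(
  first_name = c("John", "Catherine", "Christopher"),
  last_name  = c("Cena", "Zeta-Jones", "Walken"),
  test_score = c(84, 92, 100),
  stringsAsFactors = FALSE
)

# Single brackets keep the data frame wrapper
is.data.frame(student_scores["test_score"])  # TRUE

# Double brackets and the dollar sign return the vector itself
is.numeric(student_scores[["test_score"]])   # TRUE
is.numeric(student_scores$test_score)        # TRUE

# Once we have a vector, the vector functions from earlier work as usual
mean(student_scores$test_score)
# 92
paste(student_scores$first_name, student_scores$last_name)
# "John Cena" "Catherine Zeta-Jones" "Christopher Walken"
```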
Lists
Lists round out our basic building blocks of R. They are incredibly versatile because they can hold nearly anything, including variables of different types, and they can be nested inside each other to organize more information. They can be somewhat slow to work with, but are commonly used to store the output from models like linear regression.
Let’s create a list containing different data types. We’ll add a vector, data frame, and another list to our example list.
example_list <- list(c(1, 2, 3, 4, 5), student_scores, list("first", "second", "third"))
print(example_list)
# [[1]]
# [1] 1 2 3 4 5
#
# [[2]]
#    first_name  last_name test_score
# 1        John       Cena         84
# 2   Catherine Zeta-Jones         92
# 3 Christopher     Walken        100
#
# [[3]]
# [[3]][[1]]
# [1] "first"
#
# [[3]][[2]]
# [1] "second"
#
# [[3]][[3]]
# [1] "third"
When the list is printed, you can see that each element of the list is represented by a number in double brackets. Since the vector was the first element of the list, it is [[1]]. The elements of the list within our list, or the nested list, are represented with 2 numbers in double brackets. They can be referenced by first specifying the element that corresponds with the nested list, [[3]], and then the element of the nested list, such as [[2]] for the 2nd element.
# 2nd element of the example list
example_list[[2]]
# 2nd element of the nested list (the 3rd element of the example list)
example_list[[3]][[2]]
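
Single brackets also work on lists, but they behave differently from double brackets. The sketch below uses a shorter list than the one above, holding just a vector and a nested list, to show the difference. The names sub_list and element are made up for this example.

```r
example_list <- list(c(1, 2, 3, 4, 5), list("first", "second", "third"))

# Single brackets return a smaller list that still wraps the element
sub_list <- example_list[1]

# Double brackets return the element itself
element <- example_list[[1]]

class(sub_list)  # "list"
class(element)   # "numeric"

# Only the extracted vector can be passed to vector functions like sum()
sum(element)     # 15

# Chained double brackets reach into the nested list
example_list[[2]][[3]]  # "third"
```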

List elements can also be named. This can be done by assigning them names when creating the list, very similar to how names are assigned when creating a data frame.
named_list <- list(vector = c(1, 2, 3, 4, 5), data_frame = student_scores, list = list("first", "second", "third"))
Now, we can get different elements of the list by using a combination of dollar sign and bracket notation.
# Access the element named "list" in named_list
named_list$list
# Access the 2nd element of that nested list
named_list$list[[2]]
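
The same notation works on the lists that modeling functions return. As a sketch, the example below fits a linear regression with lm() on the built-in mtcars data set. Both the data set and the formula are choices made for this illustration, not something from earlier in the post.

```r
# Fit a simple linear regression of fuel economy on car weight
fit <- lm(mpg ~ wt, data = mtcars)

# The fitted model is stored as a named list
is.list(fit)      # TRUE
names(fit)[1:3]   # "coefficients" "residuals" "effects"

# Dollar sign notation pulls out a named element
fit$coefficients

# Double brackets can be chained to reach a single value by name
fit[["coefficients"]][["wt"]]
```

Running names() on a model object is a handy way to discover what it contains before digging in.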

Conclusion
You are now well on your way to doing data science, analytics, or data visualization with R! Knowing the building blocks will help you organize your variables. Knowing the advantages of storing data in a list versus a data frame will help keep your code optimized, even when your data starts to grow. Knowing the functions to explore and work with your data will help you understand what is stored within it.