While not as popular as Python, R has a strong and growing user base as a programming language, and as an applied statistician it will always be my language of choice. There are many types of R users in my experience. There are those who scrape by on just enough knowledge to finish their stats homework assignments, those who use it on a regular basis but mostly work through convenient data wrangling packages like `dplyr`, and those with a deep knowledge of the language and its underlying structures.
Where do you sit? Here are twenty questions to test your R knowledge. Try to answer them without actually running code, and then check the answers below to see how you did. Then maybe get a friend to test themselves and compare notes. I wrote this quiz for fun, so please don’t use it for serious stuff like Data Science interviews – it’s not intended for that!
The Quiz
Question 1: `x <- vector()`. What is the data type of `x`?
Question 2: `y <- 2147483640L:2147483648L`. What is the data type of `y`?
Question 3: `z <- 0/0`. What is the class of `z`?
Question 4: If `v <- complex(1,1)`, what is the output of `c(v, TRUE)`?
Question 5: A homogeneous 1-D and 2-D data structure in R is called an atomic vector and matrix respectively. What is the name for a) a 1-D heterogeneous data structure, b) a 2-D heterogeneous data structure and c) an _n_-dimensional data structure where _n_ > 2?
Question 6: What is the significance of the terms Camp Pontanezen and Kite-eating Tree to R? What is the origin of these terms?
Question 7: What will happen in each of the following cases if the package `dplyr` is not installed?
Case 1:
library(dplyr)
mtcars %>%
group_by(cyl) %>%
summarize(mean_mpg = mean(mpg))
Case 2:
require(dplyr)
mtcars %>%
group_by(cyl) %>%
summarize(mean_mpg = mean(mpg))
Question 8: `a <- c(2, "NA", 3)`. What is the output of `sum(is.na(a))`?
Question 9: What is the output of `data()`?
Question 10: What is the output of `round(0.5)`?
Question 11: Which of these packages is not loaded when you run the command `library(tidyverse)`? a) `dplyr` b) `tidyr` c) `broom` d) `ggplot2`
Question 12: In the latest R version, which of the following three code snippets does not correctly apply a function across the elements of a list `l`?
## A
lapply(l, function(x) x + 10)
## B
lapply(l, x -> x + 10)
## C
lapply(l, \(x) x + 10)
Question 13: Take a look at the output of this code and note that it does not produce the sum of each row as we might expect. Without editing the existing code, what needs to be added to the code to correct this?
library(dplyr)
df <- data.frame(x = c(1, 2), y = c(1, 2), z = c(1,2))
df %>%
mutate(sum = sum(x, y, z))
##   x y z sum
## 1 1 1 1   9
## 2 2 2 2   9
Question 14: Which of these packages allows users to run code from the Julia language in R? a) `JuliaCall` b) `RJulia` c) `JuliaR`
Question 15: Which of the following is a function in the latest version of `dplyr`? a) `c_across` b) `r_across` c) `l_across` d) `s_across`
Question 16: Why will the following code not work and what would need to be added to make it work?
library(tidyverse)
mtcars %>%
nest_by(cyl) %>%
dplyr::mutate(
ggplot(data = data,
aes(x = hp, y = mpg)) +
geom_point()
)
Question 17: What function would be used to generate random numbers from a uniform distribution?
Question 18: If `x <- factor(c(4, 5, 6))`, what is the output of `as.numeric(x)`?
Question 19: Again with `x <- factor(c(4, 5, 6))`, what is the difference between the outputs of `str(x)` and `typeof(x)`?
Question 20: Look at the two code snippets below. If they are both run in the latest version of R, why will Snippet A succeed but Snippet B fail?
library(dplyr)
## Snippet A
mtcars %>%
filter(grepl("Mazda", rownames(.)))
## Snippet B
mtcars |>
filter(grepl("Mazda", rownames(.)))
Answers
1. `x` is logical. This is the default type for an atomic vector.
2. `y` is a double, despite the use of the integer notation `L` in `y`. This is because the maximum value for an integer in R is 2147483647. So the last value of `y` cannot be stored as an integer and is treated as a double, and consequently, since atomic vectors are homogeneous, the entire vector is a double.
3. `z` is of the class numeric.
4. The output is a vector with two elements, both `1 + 0i`. Note that the first argument of `complex()` is `length.out`, indicating the length of the complex vector. So `complex(1,1)` evaluates to `1 + 0i` but `complex(1,1,1)` evaluates to `1 + 1i`. Note that `TRUE` will be coerced to the complex type equivalent of `1 + 0i`.
5. a) List; b) Data frame; c) Array
6. They are the nicknames of R version releases. All version nicknames are taken from old Peanuts comic strips.
7. In Case 1, the first line will generate an error indicating that there is no such package installed, and execution will stop. In Case 2, the first line will generate a warning, but the second line will still be executed, and will generate an error because it cannot find `%>%` (assuming `magrittr` is not attached). This is a good illustration of the difference between `library()` and `require()`. `library()` attaches a package and throws an error if it cannot, whereas `require()` tries to attach the package and evaluates to `TRUE` if it succeeded and to `FALSE` (with a warning) otherwise. Using `require()` can make it more difficult to debug your code.
8. This evaluates to zero. Note that `"NA"` is a character string and not a missing value.
9. The output is a list of the data sets available in the currently attached packages, which by default includes R's inbuilt `datasets` package.
10. The output is zero. R follows the IEC 60559 standard, where .5’s round to the nearest even number.
11. `broom` is not loaded, as it is not a package in the "core tidyverse". Note that `install.packages("tidyverse")` will install `broom` together with all the packages in the core and extended tidyverse.
12. Option B will not work. Note that option C is the new anonymous function syntax released in R 4.1.0.
13. The code needs to contain the line `rowwise() %>%` before the `mutate` statement to declare that the function is to be applied row-by-row.
14. `JuliaCall`
15. `c_across`. It is the equivalent of the `across()` function but for row-wise operations.
16. This code is attempting to generate a column of plots. This would need to be declared as a list column as follows:

        library(tidyverse)
        mtcars %>%
          nest_by(cyl) %>%
          dplyr::mutate(
            list(ggplot(data = data,
                        aes(x = hp, y = mpg)) +
                   geom_point())
          )

17. `runif()`
18. The output is the vector `c(1, 2, 3)`. Factors are converted to their integer representations.
19. `str(x)` gives the structure of `x`, which is Factor. `typeof(x)` gives the storage mode of the data in `x`, which is integer.
20. Snippet B uses the new native pipe operator `|>`. Unlike the pipe operator `%>%` in Snippet A, the native pipe only pipes into the first unnamed argument of a function, and will not accept `.` to pipe into other arguments. To obtain the same output as Snippet A, the following use of an anonymous function will be needed:

        mtcars |>
          {\(df) filter(df, grepl("Mazda", rownames(df)))}()
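If you would rather verify some of these answers than take my word for it, here is a minimal sketch that checks several of them using base R only. It assumes R 4.1.0 or later (for the `\(x)` syntax and the `|>` pipe). The package name `notARealPackage123` is a made-up placeholder that is assumed not to be installed, and the last check uses plain row indexing rather than `dplyr::filter()` so that nothing beyond base R is needed.

```r
# Base R only. Assumes R >= 4.1.0 for the \(x) syntax and the |> pipe.

# Answers 1-3: default vector type, integer overflow and NaN
x <- vector()
typeof(x)                        # "logical"
# Upper bound written without the L suffix to avoid the parser warning
y <- 2147483640L:2147483648
typeof(y)                        # "double"
z <- 0/0
class(z)                         # "numeric"

# Answer 4: the first argument of complex() is length.out, not the real part
v <- complex(1, 1)
cv <- c(v, TRUE)
cv                               # 1+0i 1+0i

# Answer 7: library() errors on a missing package, require() returns FALSE
# "notARealPackage123" is a hypothetical name assumed not to be installed
lib_result <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "error"
)
has_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
lib_result                       # "error"
has_pkg                          # FALSE

# Answer 8: the string "NA" is not a missing value
a <- c(2, "NA", 3)
n_missing <- sum(is.na(a))
n_missing                        # 0

# Answer 10: halves round to the nearest even number
rounded <- round(c(0.5, 1.5, 2.5))
rounded                          # 0 2 2

# Answer 12: option A and option C give the same result
l <- list(1, 2, 3)
same_result <- identical(lapply(l, function(x) x + 10), lapply(l, \(x) x + 10))
same_result                      # TRUE

# Answers 18-19: factors are stored as integer codes
f <- factor(c(4, 5, 6))
codes <- as.numeric(f)
codes                            # 1 2 3
typeof(f)                        # "integer"
class(f)                         # "factor"

# Answer 20: the native pipe with an anonymous function, base R indexing
mazdas <- mtcars |> (\(df) df[grepl("Mazda", rownames(df)), ])()
rownames(mazdas)                 # "Mazda RX4" "Mazda RX4 Wag"
```

A related trap that follows from answer 18: to get the original values back from a factor, convert via character first, as in `as.numeric(as.character(f))`.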
How did you do?
If you scored 5 or less, you urgently need a tutorial in base R to avoid spending too much time resolving unnecessary errors in your code.
If you scored 6–10, you likely have a similar level of knowledge to most R users.
11–15 is a very good score. You clearly know a lot of the underlying principles and structures of the R programming language.
If you scored 16–20, you are a supeRstar. You probably know a lot of needless R trivia, and you might well be an R pedant. I hope you are helping others on StackOverflow.
_Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist. I am passionate about applying the rigor of all those disciplines to complex people questions. I’m also a coding geek and a massive fan of Japanese RPGs. Find me on LinkedIn or on Twitter. Also check out my blog on drkeithmcnulty.com or my soon to be released textbook on People Analytics._