
Anyone who has worked with categorical data has eventually needed to calculate the absolute number and proportion of a certain class. This article introduces the tabyl function for creating frequency tables through a series of hands-on examples.
What does tabyl bring to the table (no pun intended :D)?
The tabyl function is a feature of the janitor package in R. It’s a very convenient tool for creating contingency tables, otherwise known as frequency tables or cross-tabulations. Here are some of the benefits of using tabyl:
- Easy syntax: tabyl has an easy-to-use syntax. It takes one, two, or three variables and returns the counts in data frame form, with proportions included for a single variable.
- Flexibility: tabyl can generate one-way (single variable), two-way (two variables), and three-way (three variables) contingency tables. This flexibility makes it suitable for a wide range of applications.
- Automatic calculation of proportions: tabyl automatically calculates the proportions (percentages) for one-way contingency tables. For two- and three-way tables, the same result can be accomplished in combination with the adorn_percentages function from the same package.
- Compatibility with dplyr: The output of tabyl is a data frame, which makes it fully compatible with dplyr functions and the tidyverse ecosystem. This means you can easily pipe (%>%) the output into further data wrangling or visualization functions.
- Neat and informative output: tabyl provides neat and informative output that keeps the variable names as labels, making it easier to interpret the results.
For all these reasons, tabyl is a great choice when you want to create frequency tables in R. It simplifies many steps and integrates well with the tidyverse approach to data analysis.
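To make the one-way versus three-way distinction concrete before we get to the mushrooms, here is a minimal sketch. It uses the built-in mtcars dataset purely as a stand-in (it is not the data used in the rest of this post), and the variable names cyl, gear, and am come from that dataset.

```r
library(janitor)

# One variable: counts plus a percent column in a single data frame.
one_way <- tabyl(mtcars, cyl)
one_way

# Three variables: a list of two-way tables, one per level of the
# third variable (here am, which takes the values 0 and 1).
three_way <- tabyl(mtcars, cyl, gear, am)
names(three_way)
three_way[["1"]]
```

Note that the three-way result is a list of data frames rather than a single data frame, so it usually needs to be indexed (or iterated over) before piping it further.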
The dataset

This post will demonstrate the benefits of the tabyl function from the janitor package using data on the edibility of different types of mushrooms depending on their odor. Here, I will be using a tidied dataset under the name mushrooms, but you can access the original data on Kaggle. Below is the code used for cleaning the data.
library(tidyverse)
library(janitor)

# Keep only the two columns of interest and replace the
# single-letter codes with readable labels.
mushrooms <- read_csv("mushrooms.csv") %>%
  select(class, odor) %>%
  mutate(
    class = case_when(
      class == "p" ~ "poisonous",
      class == "e" ~ "edible"
    ),
    odor = case_when(
      odor == "a" ~ "almond",
      odor == "l" ~ "anise",
      odor == "c" ~ "creosote",
      odor == "y" ~ "fishy",
      odor == "f" ~ "foul",
      odor == "m" ~ "musty",
      odor == "n" ~ "none",
      odor == "p" ~ "pungent",
      odor == "s" ~ "spicy"
    )
  )
If you are unfamiliar with the above syntax, please check out a hands-on guide to using the tidyverse in one of my earlier articles.
The old
To better understand the advantages tabyl offers, let’s first make a frequency table using the base R table function.
table(mushrooms$class)
edible poisonous
4208 3916
table(mushrooms$odor, mushrooms$class)
edible poisonous
almond 400 0
anise 400 0
creosote 0 192
fishy 0 576
foul 0 2160
musty 0 36
none 3408 120
pungent 0 256
spicy 0 576
Unsurprisingly, it turns out that odor is a great predictor of mushroom edibility, with anything "funny-smelling" probably being poisonous. Thank you evolution! Also, nearly half of the mushrooms in the dataset are poisonous (3916 of 8124), and a few of them have no odor at all, so it’s always important to be cautious when picking mushrooms on your own.
If we want to use the variable names directly, without the $ operator, we need the with function to make the dataset’s columns available to the table function.
mush_table <- with(mushrooms, table(odor, class))
Unfortunately, if we want to upgrade to proportions instead of absolute numbers, we cannot use the same function and need another one instead: prop.table.
prop.table(mush_table)
class
odor edible poisonous
almond 0.049236829 0.000000000
anise 0.049236829 0.000000000
creosote 0.000000000 0.023633678
fishy 0.000000000 0.070901034
foul 0.000000000 0.265878877
musty 0.000000000 0.004431315
none 0.419497784 0.014771049
pungent 0.000000000 0.031511571
spicy 0.000000000 0.070901034
By default, this divides every cell by the grand total, so the whole table sums to 1. If we want row-wise or column-wise proportions instead, we can specify the margin argument (1 for row-wise and 2 for column-wise).
prop.table(mush_table, margin = 1)
class
odor edible poisonous
almond 1.00000000 0.00000000
anise 1.00000000 0.00000000
creosote 0.00000000 1.00000000
fishy 0.00000000 1.00000000
foul 0.00000000 1.00000000
musty 0.00000000 1.00000000
none 0.96598639 0.03401361
pungent 0.00000000 1.00000000
spicy 0.00000000 1.00000000
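As a quick sanity check of what each margin setting does, here is a minimal base R sketch. It rebuilds mush_table from the counts printed above rather than from the original CSV, and the names overall, by_row, and by_col are just illustrative.

```r
# Rebuild the odor-by-class table from the counts shown earlier.
odors <- c("almond", "anise", "creosote", "fishy", "foul",
           "musty", "none", "pungent", "spicy")
mush_table <- as.table(matrix(
  c(400, 400, 0, 0, 0, 0, 3408, 0, 0,           # edible
    0, 0, 192, 576, 2160, 36, 120, 256, 576),   # poisonous
  ncol = 2,
  dimnames = list(odor = odors, class = c("edible", "poisonous"))
))

overall <- prop.table(mush_table)              # each cell / grand total
by_row  <- prop.table(mush_table, margin = 1)  # each cell / its row total
by_col  <- prop.table(mush_table, margin = 2)  # each cell / its column total

sum(overall)     # the whole table adds up to 1
rowSums(by_row)  # every odor adds up to 1
colSums(by_col)  # every class adds up to 1
```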
All these special functions can feel cumbersome and hard to remember, so a single function that contains all the above functionality would be nice to have.
Additionally, if we check the type of the created object using the class(mush_table) command, we see that it is of class table.
This creates a compatibility problem, since nowadays many R users work within the tidyverse ecosystem, which is centered around applying functions to data.frame objects and stringing the results together using the pipe (%>%) operator.
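To illustrate the friction, here is a small base R sketch, again rebuilding the table from the printed counts rather than the original CSV. A table object can be converted with as.data.frame, but the result comes back in long format with a Freq column, so getting to the wide layout shown above would take an extra reshaping step.

```r
# Rebuild the odor-by-class table from the counts shown earlier.
odors <- c("almond", "anise", "creosote", "fishy", "foul",
           "musty", "none", "pungent", "spicy")
mush_table <- as.table(matrix(
  c(400, 400, 0, 0, 0, 0, 3408, 0, 0,           # edible
    0, 0, 192, 576, 2160, 36, 120, 256, 576),   # poisonous
  ncol = 2,
  dimnames = list(odor = odors, class = c("edible", "poisonous"))
))

class(mush_table)  # a table, not a data.frame

# Converting works, but gives one row per odor/class combination.
mush_long <- as.data.frame(mush_table)
head(mush_long)
```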
The new
Let’s do the same things with the tabyl function.
tabyl(mushrooms, class)
class n percent
edible 4208 0.5179714
poisonous 3916 0.4820286
mush_tabyl <- tabyl(mushrooms, odor, class)
mush_tabyl
odor edible poisonous
almond 400 0
anise 400 0
creosote 0 192
fishy 0 576
foul 0 2160
musty 0 36
none 3408 120
pungent 0 256
spicy 0 576
Compared to the corresponding table output, the tables produced by tabyl are tidier, with the variable name (class or odor) stated in the header of the first column. Moreover, for the one-way table, the percentages are automatically generated alongside the counts.
We can also notice that we didn’t have to use the with function to be able to specify the variable names directly. Additionally, running class(mush_tabyl) tells us that the resulting object is a data.frame (with an extra tabyl class on top), which ensures tidyverse compatibility!
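Because the result is a data frame, it can go straight into dplyr verbs. The sketch below is only an illustration: it reconstructs a stand-in mushrooms data frame from the counts printed above (not from the original CSV), and the name only_poisonous is made up for this example.

```r
library(janitor)
library(dplyr)

# Stand-in for the cleaned dataset, rebuilt from the printed counts.
counts <- data.frame(
  odor  = rep(c("almond", "anise", "creosote", "fishy", "foul",
                "musty", "none", "pungent", "spicy"), times = 2),
  class = rep(c("edible", "poisonous"), each = 9),
  n     = c(400, 400, 0, 0, 0, 0, 3408, 0, 0,
            0, 0, 192, 576, 2160, 36, 120, 256, 576)
)
mushrooms <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                    c("odor", "class")]

# Keep the odors with no edible mushrooms, most common first.
only_poisonous <- mushrooms %>%
  tabyl(odor, class) %>%
  filter(edible == 0) %>%
  arrange(desc(poisonous))

only_poisonous
```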
The adorned janitor

For additional tabyl functionalities, the janitor package also contains a series of adorn functions. To get the percentages, we simply pipe the resulting frequency table to the adorn_percentages function.
mush_tabyl %>% adorn_percentages()
odor edible poisonous
almond 1.0000000 0.00000000
anise 1.0000000 0.00000000
creosote 0.0000000 1.00000000
fishy 0.0000000 1.00000000
foul 0.0000000 1.00000000
musty 0.0000000 1.00000000
none 0.9659864 0.03401361
pungent 0.0000000 1.00000000
spicy 0.0000000 1.00000000
If we want the column-wise percentages, we can specify the denominator argument as "col".
mush_tabyl %>% adorn_percentages(denominator = "col")
odor edible poisonous
almond 0.09505703 0.000000000
anise 0.09505703 0.000000000
creosote 0.00000000 0.049029622
fishy 0.00000000 0.147088866
foul 0.00000000 0.551583248
musty 0.00000000 0.009193054
none 0.80988593 0.030643514
pungent 0.00000000 0.065372829
spicy 0.00000000 0.147088866
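The denominator argument also accepts "all", which divides every cell by the grand total and so mirrors the default behavior of prop.table shown earlier. Here is a minimal sketch, once more using a stand-in mushrooms data frame rebuilt from the printed counts rather than the original CSV; share_of_all is just an illustrative name.

```r
library(janitor)

# Stand-in for the cleaned dataset, rebuilt from the printed counts.
counts <- data.frame(
  odor  = rep(c("almond", "anise", "creosote", "fishy", "foul",
                "musty", "none", "pungent", "spicy"), times = 2),
  class = rep(c("edible", "poisonous"), each = 9),
  n     = c(400, 400, 0, 0, 0, 0, 3408, 0, 0,
            0, 0, 192, 576, 2160, 36, 120, 256, 576)
)
mushrooms <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                    c("odor", "class")]

# Each cell as a share of all 8124 mushrooms.
share_of_all <- mushrooms %>%
  tabyl(odor, class) %>%
  adorn_percentages(denominator = "all")

share_of_all
```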
The tabyl and adorn combo even enables us to easily combine both the number and percentage in the same table cell…
mush_tabyl %>% adorn_percentages() %>% adorn_ns()
odor edible poisonous
almond 1.0000000 (400) 0.00000000 (0)
anise 1.0000000 (400) 0.00000000 (0)
creosote 0.0000000 (0) 1.00000000 (192)
fishy 0.0000000 (0) 1.00000000 (576)
foul 0.0000000 (0) 1.00000000 (2160)
musty 0.0000000 (0) 1.00000000 (36)
none 0.9659864 (3408) 0.03401361 (120)
pungent 0.0000000 (0) 1.00000000 (256)
spicy 0.0000000 (0) 1.00000000 (576)
… or add the totals to the rows and columns.
mush_tabyl %>% adorn_totals(c("row", "col"))
odor edible poisonous Total
almond 400 0 400
anise 400 0 400
creosote 0 192 192
fishy 0 576 576
foul 0 2160 2160
musty 0 36 36
none 3408 120 3528
pungent 0 256 256
spicy 0 576 576
Total 4208 3916 8124
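The adorn functions can also be chained into a single report-style table. The sketch below is one possible combination, not the only sensible order: it adds a totals row, converts to row-wise percentages, formats them with adorn_pct_formatting, appends the counts, and labels the header with both variable names via adorn_title. As before, mushrooms is a stand-in rebuilt from the printed counts and report is an illustrative name.

```r
library(janitor)

# Stand-in for the cleaned dataset, rebuilt from the printed counts.
counts <- data.frame(
  odor  = rep(c("almond", "anise", "creosote", "fishy", "foul",
                "musty", "none", "pungent", "spicy"), times = 2),
  class = rep(c("edible", "poisonous"), each = 9),
  n     = c(400, 400, 0, 0, 0, 0, 3408, 0, 0,
            0, 0, 192, 576, 2160, 36, 120, 256, 576)
)
mushrooms <- counts[rep(seq_len(nrow(counts)), times = counts$n),
                    c("odor", "class")]

report <- mushrooms %>%
  tabyl(odor, class) %>%
  adorn_totals("row") %>%               # add a Total row
  adorn_percentages("row") %>%          # row-wise proportions
  adorn_pct_formatting(digits = 1) %>%  # e.g. 0.966 becomes "96.6%"
  adorn_ns() %>%                        # append the raw counts
  adorn_title("combined")               # header becomes "odor/class"

report
```

The exact way the counts are printed inside the parentheses may differ slightly between janitor versions, so the output is not reproduced here.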
Conclusion
The tabyl() function from the janitor package in R offers a user-friendly and flexible solution for creating one-way, two-way, or three-way contingency tables. It excels in automatically computing proportions and producing tidy data frames that integrate seamlessly with the tidyverse ecosystem, especially dplyr. Its outputs are well-structured and easy to interpret, and it can be further enhanced with adorn functions, simplifying the overall process of generating informative frequency tables. This makes tabyl() a highly beneficial tool in data analysis in R.