
Create Flawless Tables from your Dataframe ready for Publication

A Step-by-Step Tutorial on How to Create Publication-Ready Tables

Photo by Jeremy Zero on Unsplash

TUTORIAL – TABLES – R

I am a data scientist, and I spend much of my time thinking about the best way to visualize large amounts of data and convey interesting findings to clients and team members. To be honest, in most cases, if not in every case, showing the data and its structure in a simple table is necessary and improves the overall understanding.

However, I usually create this table in PowerPoint or Excel so that it looks presentable and/or publishable. This, of course, means the result can no longer be reproduced automatically. For one of my latest projects, I learned about and applied a package that allowed me to create beautiful, publication-ready data tables without leaving my data science platform.

1 Introduction

In this article, I will show you how to use the Grammar of Tables (gt) package to create flawless, publication-ready tables, how to turn your settings into a theme for quick reuse, and how to apply this theme in your next data science project.

Example table with S&P 500 data (image by the author)

2 Setup

Most of my client work involves Python and Pandas. However, by training, I am an R person. I worked this out for the R data science platform, but I will investigate how this can be achieved using Python and Pandas for one of my upcoming articles.

Nevertheless, here is the list of software and packages that I am using:

  1. R & RStudio – the data science platform and IDE of choice.
  2. tidyverse package – this package allows me to write elegant, readable, and efficient code to manipulate data frames
  3. gt package – the Grammar of Tables (gt) package to create flawless table designs
  4. Gapminder package – excerpt of the Gapminder data on life expectancy, GDP per capita, and population by country

2.1 Brief overview of Grammar of Tables (gt)

The gt package follows a declarative approach to creating tables, much like the Grammar of Graphics does for plots. In short, it lets us specify what should happen rather than how it should happen, which makes for readable code.

Parts of a gt table (Image source: https://gt.rstudio.com)

The gt package defines a comprehensive set of table parts that you can add to your table and style individually. In the examples below, I will explain how to use these parts.

It is also important to note that you can create a table in your R notebook and save it in several formats, including HTML and PNG. This is helpful if you need to report your tables in different publications, e.g., a website or a PowerPoint document.
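A minimal sketch of saving a table with gtsave(), assuming an HTML file in a temporary directory is wanted. The file name and the small demo table are illustrative and not taken from the article:

```r
library(gt)

# Small demo table (assumed input, only used to show saving)
tbl <- gt(head(gapminder::gapminder, 5))

# The file extension decides the output format; ".html" writes an HTML file
out_name <- "gapminder_table.html"
gtsave(tbl, filename = out_name, path = tempdir())
out_file <- file.path(tempdir(), out_name)

# Saving as PNG works the same way with a ".png" file name, but it
# additionally needs a screenshot backend such as the webshot2 package:
# gtsave(tbl, filename = "gapminder_table.png", path = tempdir())
```

Keeping the save call in the script means the exported file is regenerated whenever the data or the styling changes.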

2.2 Packages and Constants

Before I start with creating a table, I share this code that will load – and, if necessary, install – required packages:
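A minimal sketch of such a loader, assuming the three packages from the list above are the only ones required. The variable names are illustrative:

```r
# Packages named in the setup list above
pkgs <- c("tidyverse", "gt", "gapminder")

# Install only the packages that are not yet available
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Attach all packages
invisible(lapply(pkgs, library, character.only = TRUE))
```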

I also use some constants that help me write a flexible R script. Please note that I use c_rn to specify the maximum number of rows to include in the table, c_save to determine whether to save every step of the table creation process as a file (which takes a little bit of time), and c_format to specify the output format.
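The constants could be declared as follows. The names come from the text, but every value here is an assumption chosen for illustration:

```r
# All values below are assumed, not taken from the article
c_rn     <- 20       # maximum number of rows to include in the table
c_save   <- FALSE    # TRUE would save every intermediate table as a file
c_format <- "html"   # output format, e.g. "html" or "png"

# Light-to-dark blue palette, used later for conditional cell coloring
c_col <- c("#deebf7", "#9ecae1", "#3182bd")
```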

The general output of the Gapminder data set looks like this:

Standard console output (image by the author)

3 Create a Flawless and Publication-ready Table

The most basic use of the gt package is to pass the filtered data frame to the gt() function. This is not too exciting, though, and adds little over the standard console output.
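A minimal sketch of this basic call. The filter (latest year, first c_rn rows) is an assumption, since the exact subset is not stated in the text:

```r
library(dplyr)
library(gt)
library(gapminder)

c_rn <- 20  # assumed row limit

# Assumed filter: keep the most recent year and the first c_rn rows
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(c_rn)

# Pass the data frame straight to gt(); no styling yet
tbl <- gt(df)
tbl
```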

Standard gt output (image by the author)

3.1 Adding a Grouping Column

This might not be relevant to the majority of your data frames. However, I would like to show how it works and how it helps readers understand the table better. To do so, I pass the column continent as the grouping column and specify the country column as the row label column.
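A minimal sketch of the grouping step, using the rowname_col and groupname_col arguments of gt(). The data subset is the same assumed filter as before:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset; factor columns are converted to character for plain labels
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character))

# continent becomes the row group, country becomes the row label (stub)
tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent")
tbl
```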

Adding a grouping column (image by the author)

3.2 Adding Summary Rows

The following code adds summary rows. What the summary contains is up to you and depends on what is worthwhile for your audience. I decided to add the sum, the average, and the standard deviation. Although not all summary functions make sense for this data set, I would like to show how to implement them.
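A minimal sketch of group-wise summary rows, assuming gt 0.9.0 or later (the arguments of summary_rows() differ in older releases). The row labels are illustrative:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset; pop is converted to numeric to avoid integer overflow in sum()
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character), pop = as.numeric(pop))

tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent") %>%
  summary_rows(
    groups = everything(),                  # one summary block per continent
    columns = c(lifeExp, pop, gdpPercap),
    fns = list(
      total ~ sum(.),
      average ~ mean(.),
      std_dev ~ sd(.)                       # NA for groups with a single row
    )
  )
tbl
```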

Adding Summary Rows (image by the author)

3.3 Changing the Label for each Column

You have probably experienced this in your own projects: how should you label your columns? A data frame often uses technical names (i.e., short names without blank spaces), whereas your audience needs meaningful, functional names. An example in the Gapminder data set is the column lifeExp, which stands for "Life Expectancy". The gt package allows us to change the labels in the resulting table without changing them in the data set.
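A minimal sketch of relabeling with cols_label(). Apart from "Life Expectancy", the display labels are assumptions:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset of the data
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20)

# Only the displayed labels change; the column names in df stay untouched
tbl <- gt(df) %>%
  cols_label(
    country   = "Country",
    continent = "Continent",
    year      = "Year",
    lifeExp   = "Life Expectancy",
    pop       = "Population",
    gdpPercap = "GDP per Capita"
  )
tbl
```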

Changing label names (image by the author)

3.4 Formatting Columns

Formatting columns includes several things. In this example, I tell the package to differentiate between number and currency columns, and I set their alignment and their width (in px). The function opt_row_striping() creates banded rows that improve the table’s readability.
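A minimal sketch of these formatting calls. The number of decimals, the currency, and the pixel widths are assumed values:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset of the data
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character))

tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent") %>%
  fmt_number(columns = lifeExp, decimals = 1) %>%    # plain number
  fmt_number(columns = pop, decimals = 0) %>%        # whole number with separators
  fmt_currency(columns = gdpPercap, currency = "USD", decimals = 0) %>%
  cols_align(align = "right", columns = c(lifeExp, pop, gdpPercap)) %>%
  cols_width(c(lifeExp, pop, gdpPercap) ~ px(140)) %>%
  opt_row_striping()                                 # banded rows
tbl
```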

Formatting Columns (image by the author)

3.5 Adding Titles, Footnotes, and Sources

If you want all relevant meta-information to be part of the table layout, the gt package will help you. The footnote options are especially useful because the location of a footnote can be determined by a condition on the data. In the following example, a footnote is added to the country with the lowest population. Please note that you can use Markdown to modify the text layout via the md() function.
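A minimal sketch of a header, a source note, and a conditional footnote. All texts are illustrative, and the footnote is attached to the population cell of the row with the lowest population, which is one possible placement:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset of the data
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character))

tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent") %>%
  tab_header(
    title = md("**Gapminder excerpt**"),             # md() renders Markdown
    subtitle = md("Life expectancy, population, and GDP per capita")
  ) %>%
  tab_source_note(source_note = md("Source: *gapminder* R package")) %>%
  tab_footnote(
    footnote = "Lowest population in this excerpt.",
    # The row is selected by a condition evaluated on the data
    locations = cells_body(columns = pop, rows = pop == min(pop))
  )
tbl
```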

Adding Titles, Footnotes, and Sources (image by the author)

3.6 Applying Formatting to the table

This is a rather long one, but I hope the code explains what is happening. Please note that I use the functions tab_options() as well as tab_style(). tab_options() appears to control general settings, while tab_style() is used for more specific locations. Please share your ideas to simplify the following code. Very much appreciated.
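A shortened sketch of how the two functions divide the work. The option values and colors are assumptions and cover only a small part of what tab_options() offers:

```r
library(dplyr)
library(gt)
library(gapminder)

# Assumed subset of the data
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character))

tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent") %>%
  # Table-wide settings
  tab_options(
    table.font.size = px(14),
    heading.align = "left",
    column_labels.font.weight = "bold",
    table.border.top.color = "black",
    table.border.bottom.color = "black",
    data_row.padding = px(4)
  ) %>%
  # Targeted styling of one location: the row group labels
  tab_style(
    style = list(cell_fill(color = "#deebf7"), cell_text(weight = "bold")),
    locations = cells_row_groups()
  )
tbl
```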

Applying Formatting to the table (image by the author)

3.7 Applying Conditional Cell Coloring

Another helpful feature of the gt package is the ability to color cells based on their values. In the following example, I use it in two different ways. First, I apply blue shading to the column "Life Expectancy". For this, I use the color palette c_col, which was specified as one of the constants at the beginning.

Second, I color the row with the minimum population in blue.
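A minimal sketch of both variants, assuming gt 0.9.0 or later (where data_color() accepts a palette argument). The values of c_col and the fill color are assumptions:

```r
library(dplyr)
library(gt)
library(gapminder)

c_col <- c("#deebf7", "#9ecae1", "#3182bd")  # assumed blue palette

# Assumed subset of the data
df <- gapminder %>%
  filter(year == max(year)) %>%
  head(20) %>%
  mutate(across(c(country, continent), as.character))

tbl <- df %>%
  gt(rowname_col = "country", groupname_col = "continent") %>%
  # Variant 1: shade a column by value, from light to dark blue
  data_color(columns = lifeExp, palette = c_col) %>%
  # Variant 2: fill the whole row that holds the minimum population
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_body(columns = everything(), rows = pop == min(pop))
  )
tbl
```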

Applying Conditional Cell Coloring (image by the author)

4 Creating a Reusable gt Theme

To create a theme, I needed to learn to differentiate between settings related to the general look and settings associated with data-specific columns (which change with every data set). To showcase this, I will use the data set "Daily S&P 500 Index data" that is part of the gt package.

4.1 Create a gt Table from the S&P Data Set
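A minimal sketch of the unstyled table, using the sp500 data set shipped with gt. Taking the first ten rows is an assumption made to keep the table short:

```r
library(gt)

# sp500 ships with gt; the row limit is an assumed value
sp <- head(sp500, 10)

sp_tbl <- gt(sp)
sp_tbl
```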

Standard gt table output (image by the author)

4.2 Create a Theme

I created a function my_theme() that can then be quickly applied to any of your gt tables.
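A minimal sketch of what such a theme function could look like. The name my_theme() comes from the text, but its body here is an assumption that bundles only look-related settings and no column-specific ones:

```r
library(gt)

# Takes a gt table, returns the same table with look-related settings applied
my_theme <- function(gt_tbl) {
  gt_tbl %>%
    opt_row_striping() %>%
    tab_options(
      table.font.size = px(14),
      heading.align = "left",
      column_labels.font.weight = "bold",
      table.border.top.color = "black",
      table.border.bottom.color = "black"
    ) %>%
    tab_style(
      style = cell_fill(color = "#deebf7"),
      locations = cells_column_labels(columns = everything())
    )
}

# Usage: the theme is one more step in the pipe
themed <- head(sp500, 10) %>%
  gt() %>%
  my_theme()
themed
```

Because the function refers to no column names, it can be applied to tables built from any data frame.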

Themed gt table output (image by the author)

Please note that this theme is built with my limited knowledge. So please share ideas on how to improve and simplify this code. Very much appreciated.

4.3 Apply Column-specific formats

The remaining step is to format the S&P-specific columns. Firstly, I specify columns with a currency format.
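A minimal sketch of the currency formatting, assuming the price columns of sp500 are meant and that US dollars are the intended currency:

```r
library(gt)

sp_tbl <- head(sp500, 10) %>%   # assumed row limit
  gt() %>%
  # Price columns of the sp500 data set, shown as USD
  fmt_currency(
    columns = c(open, high, low, close, adj_close),
    currency = "USD"
  )
sp_tbl
```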

Formatted columns (image by the author)

Lastly, I add summary rows to the table, including mean and standard deviation.
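A minimal sketch of this step, assuming gt 0.9.0 or later and a table without row groups, which is why grand_summary_rows() is used here instead of summary_rows(). The row labels are illustrative:

```r
library(gt)

sp_tbl <- head(sp500, 10) %>%   # assumed row limit
  gt() %>%
  fmt_currency(columns = c(open, high, low, close), currency = "USD") %>%
  # One summary block for the whole table
  grand_summary_rows(
    columns = c(open, high, low, close),
    fns = list(
      mean ~ mean(.),
      std_dev ~ sd(.)
    )
  )
sp_tbl
```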

Added Summary Rows (image by the author)

5 Conclusion

In this article, I introduced you to the gt package. Then, I showed you how to format a table, include summary rows, and apply conditional cell formatting. Further, I explained how to create an individual theme, which you might reuse for every data project.

Please note that I only scratched the surface of the gt package. Also, my knowledge is limited, and there may be better ways to achieve these results with less code. If you are interested, please consult other tutorials to learn more. I am happy to share great tutorials in the following list:

What do you think? Do you have any feedback for me?

Please feel free to contact me with any questions and comments. Thank you.

