This post is part 4 of a four-part series [Project repo]:
- Refer to part 1 for an overview of the series,
- part 2 for an explanation of the data sources and minor data cleaning,
- part 3 for the creation of the visualisations, building the report and deploying the document to ShinyApps.io, and
- part 4 for automatic data updates, compilation and publishing of the report.
Each article in the series is self-contained, meaning that you don’t need to read the whole series to make the most out of it.
Table of Contents
- Introduction
- Pipeline building blocks – Steps
  - Structure of a GitHub Action .yml file
  - Setup repo, R and pandoc
  - Install dependencies
  - Report steps – update data, run scripts, publish
  - Publishing and secret tokens
- When to run the pipeline – Trigger
  - At event
  - On schedule – using cron
- Making your pipeline even better – Extras
  - Cache dependencies
  - Committing back the latest changes to your repo
Introduction
If you have never used GitHub Actions, you are in for a treat. GitHub Actions is a fairly recent GitHub feature that lets you automate workflows, such as testing code or deploying packages. It is a powerful tool for continuous integration and continuous delivery (CI/CD): it lets you outsource the testing of your code to isolated machines running operating systems that you may not have at hand.
In this article, I will show you how to use GH Actions for something a bit less usual than code testing. GH Actions provide us with an isolated environment where we can run arbitrary code, which makes them a good fit for a report publishing pipeline like the one we are going to build here. In the previous articles of this series, I built an interactive COVID-19 report using Shiny that you can visit here. While building this project, I split the code into 3 stages. First, the most recent COVID data is downloaded from Our World in Data and cleaned. Next, several scripts make a variety of plots (treemaps, line plots and maps). Finally, a deployable Shiny RMarkdown document is uploaded through the RStudio IDE or from the command line. Because this workflow was already designed in a step-wise manner, the steps for automating it are clear. Here, I will showcase the basics of GitHub Actions and how I use them in this particular instance to update my COVID report without even noticing!
Pipeline building blocks – Steps
Structure of a GitHub Action .yml file
The structure of a GitHub Action YAML file is very intuitive. First, it needs a name, one or several jobs to run and a set of instructions on when to run them (the on section, e.g. when there is a push to master). Each job is usually broken down into steps, for which we can either use a pre-made GitHub Action (like those available in the r-lib/actions or the GitHub actions repositories) or run bash commands as we would from the command line. We also need to specify the type of machine we wish to run our jobs on (macOS, Ubuntu, Windows…). That leaves us with a backbone that will look something like this:
```yaml
name: Name-of-our-Amazing-Action
on: when-to-run-action
jobs:
  Name-of-Action1:
    runs-on: OS-to-run-our-job-in
    steps:
      # this is a comment by the way, it won't hurt
      - name: step1
        uses: repo/of/someones/actions
        with:
          parameter1: some-value
          parameter2: some-other-value
      - name: dummy-step2
        run: pwd
        shell: bash # <-- default, can omit
      - name: step3 # multi-line bash
        run: |
          cd some-directory/
          python my_script.py
```
Now that we know the basics, let’s learn by doing!
Setup repo, R and pandoc
I write code on a MacBook, so I will use macOS as my default OS to avoid potential compatibility problems. For the same reason, I will use a single R version (R 4.0.2, the one installed on my computer).
I want everything to run like it would on my computer, but without actually running it on my computer
If you are planning to use GHA to test a package, it is recommended to use several operating systems and several R versions, to make sure your code works with whatever setup the users of your package may be running. For this report a single configuration is enough, so our jobs section starts like this:
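As a minimal sketch of such a jobs section: the job id `update-report` and the `config` matrix layout are my own assumptions, not necessarily what the project file uses. It declares one macOS runner with R 4.0.2 through a strategy matrix, so more combinations could be added later without touching the steps.

```yaml
jobs:
  update-report:                # hypothetical job id
    # the runner OS is read from the matrix entry below
    runs-on: ${{ matrix.config.os }}
    strategy:
      matrix:
        config:
          # a single combination: macOS with the same R version as my laptop
          - { os: macOS-latest, r: '4.0.2' }
    steps:
      # the steps from the following sections go here
```
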
Note: a "strategy matrix" lets you define a grid of parameters (e.g. r-version: [3.5.1, 3.6.2, 4.0.2]), and GitHub Actions runs a separate copy of the job for every combination in that grid.
The first step you will see in most GHA YAML files is a call to the checkout action, written by the GitHub team itself. This action checks out your repository at the branch and commit that triggered the workflow:
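In YAML this step is usually a single line. The sketch below assumes the `v2` release of the checkout action that was current when this series was written; newer major versions should behave the same way for this simple use.

```yaml
# first entry under steps: clone the repository at the commit that triggered the run
- uses: actions/checkout@v2
```
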
Next, we need to set up an R installation on our GHA machine to run our scripts. We will also need pandoc (the document converter that R Markdown uses under the hood) to compile our Shiny RMarkdown report. We are going to use actions maintained by the r-lib team, the same people behind widely used R development tools such as devtools and testthat, so we can expect them to be reliable.
As you can see, the uses: field simply specifies a GitHub account (r-lib), a repository under that account (r-lib/actions), and a specific action and its version within that repository (r-lib/actions/setup-pandoc@v1). It is also worth pointing out the syntax used to pass in the r-version from the settings we specified earlier. With ${{ }} we can refer to values available in the workflow, in this particular instance the R version from the matrix we set. Had we specified several R versions (let's say 3), GHA would automatically launch 3 independent jobs and run the rest of the steps in parallel, each with its own R version. Pretty awesome if you ask me!
Install dependencies
Now that the basic setup is in place, we need to install the tools for our specific task, i.e. R packages. We do this the same way we would on our own computer:
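A sketch of this step, written in R and executed through the `Rscript {0}` custom shell that GitHub Actions supports. The package list is a hypothetical stand-in for whatever your scripts load, and the explicit CRAN mirror is my addition so the install does not depend on how the runner configures R's repos option:

```yaml
- name: Install dependencies
  run: |
    # hypothetical package list: replace with the packages your scripts use
    pkgs <- c("rmarkdown", "shiny", "rsconnect", "ggplot2")
    install.packages(pkgs, repos = "https://cloud.r-project.org")
  shell: Rscript {0}
```
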
Note that here we have written the run: section in R and told GitHub Actions to execute it with the Rscript shell. If you don't like that flavour, you could write it like so:
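The same step sketched with the default bash shell, calling `Rscript -e` (same hypothetical package list and mirror as before):

```yaml
- name: Install dependencies
  # no shell: key, so the default bash shell runs this line
  run: Rscript -e 'install.packages(c("rmarkdown", "shiny", "rsconnect", "ggplot2"), repos = "https://cloud.r-project.org")'
```
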
That's it! In my experience, this step took the longest, as it had to install a lot of packages. There is, however, a way to cache the dependencies, which I will cover later for "pro" users 🙂
Report steps – update data, run scripts, publish
Now we just need to run the R scripts we wrote previously. The first one downloads the latest data from Our World in Data, and the others make plots from that data. Having a reliable data source (one whose format does not change often) and robust code will save us from troubleshooting the GHA pipeline in the long run. As promised, we just need to run our R scripts:
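A sketch of these steps; the script paths are hypothetical placeholders for the download/cleaning script and the plotting scripts from the earlier parts of the series. Steps run from the repository root by default, so relative paths inside the scripts are resolved against the project root, and because the default bash shell stops on the first failing command, an error in any script fails the run.

```yaml
# hypothetical script paths: adapt them to your repository layout
- name: Update data
  run: Rscript R/get_data.R
- name: Make plots
  run: |
    Rscript R/plot_treemaps.R
    Rscript R/plot_lines.R
    Rscript R/plot_maps.R
```
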
Publishing and secret tokens
To publish our Shiny document (this works for a Shiny app too), the shinyapps.io server needs to verify that we are legitimate. A token and a secret are used to verify the identity of whoever is trying to upload a document to the server. In my previous post, I broke down the steps to generate these keys. I am going to create a new set of keys, but you could also reuse the ones you use in your local R or RStudio setup. After creating the keys, you should see something like this:
```r
rsconnect::setAccountInfo(name='your-username',
                          token='your-token',
                          secret='your-secret')
```
Of course, we would not want to simply paste this R code into our YAML file. That would expose our private keys and allow anyone who finds them to change the documents and apps hosted in our shinyapps.io account. Instead, we are going to use GitHub Secrets 🔒, which let users store keys and passwords securely for an individual repository or for an entire organization.

Next, our actions YAML section will look like this:
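A sketch of the authorisation step, assuming the two secrets were saved as `SHINYAPPS_TOKEN` and `SHINYAPPS_SECRET` (my naming) and that `your-username` is replaced with your shinyapps.io account name. The secrets are exposed to this step only, as environment variables, and read with `Sys.getenv()`, so their values never appear in the workflow file:

```yaml
- name: Authorise shinyapps.io account
  env:
    # map repository secrets to environment variables for this step only
    SHINYAPPS_TOKEN: ${{ secrets.SHINYAPPS_TOKEN }}
    SHINYAPPS_SECRET: ${{ secrets.SHINYAPPS_SECRET }}
  run: |
    rsconnect::setAccountInfo(name = "your-username",
                              token = Sys.getenv("SHINYAPPS_TOKEN"),
                              secret = Sys.getenv("SHINYAPPS_SECRET"))
  shell: Rscript {0}
```
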
This method of passing secrets to scripts is the one endorsed by GitHub [ref]. At first, I was worried that it might be a vulnerable way of passing private keys to our workflow: someone could print them with an echo command and reveal them in the logs. Luckily, GitHub automatically redacts secret values from the logs, replacing them with ***.

Finally, we need to run a single line of R code to deploy our updated report, just like we would from our computer:
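A sketch of that single line as a step. `covid-report.Rmd` is a hypothetical file name for the Shiny R Markdown document; `rsconnect::deployDoc()` is rsconnect's function for deploying a single document (for a Shiny app directory, `rsconnect::deployApp()` would be the equivalent):

```yaml
- name: Deploy report
  # covid-report.Rmd is a hypothetical file name
  run: rsconnect::deployDoc("covid-report.Rmd")
  shell: Rscript {0}
```
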
When to run the pipeline – Trigger
At event
You may not want to trigger an action every time you commit a change, or may not want to trigger actions if you are working in a branch. You can specify what triggers your actions in the on: section of your YAML file. A common on: section looks like this:
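A minimal sketch of such an on: section, restricted to the master branch:

```yaml
on:
  push:
    branches: [master]     # use main if that is your default branch
  pull_request:
    branches: [master]     # pull requests whose target branch is master
```
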
Note that if your repository was created recently, your default branch is probably called main rather than master. The section above triggers our action whenever we push new changes to master or open a pull request targeting master. You can read more about the events that can trigger an action in the GitHub docs.
On schedule – using cron
Being able to validate your workflows every time you submit changes to your master branch is a must-have for any codebase you are working on. Sometimes though, like in our case, you may have a workflow that you wish to run periodically even if your code has not changed. GitHub Actions supports scheduled runs defined with cron syntax, which do exactly that! Our updated on: section now looks like this:
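A sketch of the updated section, keeping the push and pull request triggers and adding a schedule. The cron string must be quoted because `*` has a special meaning in YAML, and GitHub evaluates schedules in UTC:

```yaml
on:
  push:
    branches: [master]
  pull_request:
    branches: [master]
  schedule:
    # minute hour day-of-month month day-of-week (UTC): 12:00 on every 3rd day-of-month
    - cron: '0 12 */3 * *'
```

Strictly speaking, `*/3` in the day-of-month field matches days 1, 4, 7, …, 31, so around month boundaries two runs can be only one day apart.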
The above section specifies to run our action "At 12:00 on every 3rd day-of-month.". To me, this is the most useful feature in the entire action file. Every 3 days, the updated data from Our World in Data will be pulled, the scripts we wrote will be re-run and our interactive report will be updated without us even noticing! – Freeing time through Automation 🕊 🙌
I would highly recommend using [crontab.guru](https://crontab.guru/#0_12_*/3_*_*) or cronmaker to define and understand your cron job schedules.
Making your pipeline even better – Extras
Cache dependencies
Certain steps of your workflows may take a long time to compute but may not change much over time. Package 📦 dependencies are a perfect example of this. Downloading and installing several packages over a network connection every single time we run our action is a waste of time and resources (remember that the cloud ☁️ still burns fuel 🔥). For this, GitHub Actions offers a caching tool (https://github.com/actions/cache). The cache lets us store the state of a given file or directory (in our case the R library directory) so that it can simply be restored the next time we need it.
To cache our dependencies we simply need to add the following blocks in our YAML:
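A sketch of the two blocks. It assumes that `setup-r` exposes the R user library through the `R_LIBS_USER` environment variable (as far as I know the r-lib action did at the time) and uses `cache` as the step id (my naming). On macOS runners `${{ runner.os }}` evaluates to `macOS`, which yields the key `macOS-shinydoc`:

```yaml
# restore the R library from a previous run stored under this key, if any
- name: Cache R packages
  id: cache
  uses: actions/cache@v2
  with:
    path: ${{ env.R_LIBS_USER }}
    key: ${{ runner.os }}-shinydoc
    restore-keys: |
      ${{ runner.os }}-
# install packages only when there was no exact cache hit
- name: Install dependencies
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    install.packages(c("rmarkdown", "shiny", "rsconnect", "ggplot2"),
                     repos = "https://cloud.r-project.org")
  shell: Rscript {0}
```

One design caveat: a cache entry is never overwritten once saved under a given key, so with a fixed key like this one, packages added later are only picked up after you change the key (for example by appending a version suffix).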
The cache block stores the state of the directory specified by path: under the key macOS-shinydoc. The restore-keys: argument is optional; it provides a list of key patterns to fall back on if no exact match is found. Additionally, we add an if: condition to our "Install dependencies" step so that dependencies are re-installed only when there is no cache hit.
Our cached files will be erased after one week of inactivity. You can read more about cache in the GH docs.
Committing back the latest changes to your repo
Finally, you may wish to commit your latest changes back to the repository after the action completes. I don't, because the only files that change are the data (csv and rds files) and the plots (png images), but I understand that other people might want this option. You can accomplish it with the push-action (https://github.com/ad-m/github-push-action); an example action file is provided in that repo's README.
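If you do want this, here is a sketch loosely following the pattern in the push-action README; the `data/` and `plots/` folders are hypothetical. Pushes made with the default `GITHUB_TOKEN` do not trigger new workflow runs, so this will not loop. Depending on your repository settings, the job may also need `permissions: contents: write` for the push to succeed.

```yaml
- name: Commit updated data and plots
  # skip on pull requests, where github.ref is not a branch we can push to
  if: github.event_name != 'pull_request'
  run: |
    git config --local user.name "github-actions[bot]"
    git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add data/ plots/
    # only commit when something is actually staged
    git diff --cached --quiet || git commit -m "Update data and plots"
- name: Push changes
  if: github.event_name != 'pull_request'
  uses: ad-m/github-push-action@master
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    branch: ${{ github.ref }}
```
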
As always, I hope you have learned something useful from this walkthrough 😄. I personally find it mind-boggling that such a powerful tool is available for free, and I also like the fact that the community can contribute by providing custom-made, high-quality actions! You can find the complete actions file I use in my project here.
If you are interested in learning more about GitHub Actions or have any problems with your workflows please feel free to post a comment or contact me directly via Twitter @lucha