Meta-analysis in R

Sub-group analysis and cumulative estimates

Published in Towards Data Science · 4 min read · Oct 13, 2021

This post is an extension of my previous introductory post on meta-analysis in R. Just to be clear from the start: sub-group analyses definitely have their rightful place when analyzing treatment effects, but they should never be abused. Too many examples exist showcasing the danger of p-hacking, and you should (as a reader) be very careful when a sub-group analysis was not included in the study protocol (meaning that the data sampled were never intended to be divided between groups). Nevertheless, should you have a solid (biological) reason to conduct a sub-group analysis, the endeavor is surprisingly easy in R.

For conducting and showcasing a meta-analysis, only a few packages are needed. Actually, for this example, I relied solely on the metafor package.

Let’s load in the data, splitting it by outcome of interest and outcome type.

A glimpse of the tibble I loaded in R. The dataset contains the summary results of each study and each treatment. Remember that for conducting a meta-analysis on continuous outcomes you need the mean, standard deviation, and number of observations per treatment.
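To make clear why those three quantities are needed, here is a minimal base-R sketch that turns per-arm means, standard deviations, and sample sizes into a mean difference and its sampling variance. The data frame and its column names are invented for illustration and are not the dataset used in this post. metafor's `escalc(measure = "MD", ...)` computes the same two quantities.

```r
# Hypothetical summary data: one row per study, treatment (1) vs control (2).
# These numbers are invented for illustration only.
dat <- data.frame(
  study = c("A", "B", "C"),
  m1 = c(10.2, 9.8, 11.0), sd1 = c(2.0, 1.5, 2.5), n1 = c(20, 15, 30),
  m2 = c(9.5, 9.9, 10.1),  sd2 = c(2.2, 1.4, 2.4), n2 = c(20, 16, 28)
)

# Effect size: raw mean difference between the two arms
dat$yi <- dat$m1 - dat$m2

# Sampling variance of the mean difference (independent groups)
dat$vi <- dat$sd1^2 / dat$n1 + dat$sd2^2 / dat$n2

# 95% confidence interval per study (normal approximation)
dat$ci_lb <- dat$yi - qnorm(0.975) * sqrt(dat$vi)
dat$ci_ub <- dat$yi + qnorm(0.975) * sqrt(dat$vi)

print(dat[, c("study", "yi", "vi", "ci_lb", "ci_ub")])
```

Without the standard deviation and the number of observations there is no variance, and without a variance a study cannot be weighted.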

Create separate data frames to make the analysis easier. Of course, you can also use the dplyr package or the built-in pipe to wrangle the data.

A subset containing a specific outcome and level: ADG and High.
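A small sketch of that splitting step in base R. The column names `outcome` and `level` are my assumptions about how such a table could be laid out, and the rows are invented.

```r
# Hypothetical long-format data; column names (outcome, level) are assumptions.
dat <- data.frame(
  study   = c("A", "A", "B", "B", "C", "C"),
  outcome = c("ADG", "Mortality", "ADG", "Mortality", "ADG", "ADG"),
  level   = c("High", "High", "Low", "High", "High", "Low"),
  stringsAsFactors = FALSE
)

# One data frame per outcome/level combination that occurs in the data
by_group <- split(dat, list(dat$outcome, dat$level), drop = TRUE)
print(names(by_group))

# Or pull out a single combination directly
adg_high <- subset(dat, outcome == "ADG" & level == "High")
print(adg_high)
```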

Most of the work lies in fitting the model itself. The code you need to use in the metafor package is very intuitive. Below, I requested a random-effects meta-analysis with an additional sub-group analysis based on the subADG&Environment variable. The hakn=TRUE statement requests the Hartung-Knapp adjustment. (Note that hakn is the argument name used by the functions of the meta package; in metafor, the same adjustment is requested with test="knha".)
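The original code is not reproduced here, so what follows is only a sketch of a comparable analysis in metafor, not the exact call behind the output discussed below. The data are invented, `env` is a hypothetical sub-group column, and REML is simply metafor's default estimator for the between-study variance.

```r
library(metafor)

# Invented summary data; `env` is a hypothetical sub-group variable.
dat <- data.frame(
  study = paste("Study", 1:6),
  env   = rep(c("Commercial", "Research"), each = 3),
  m1  = c(10.2, 9.8, 11.0, 12.1, 12.4, 11.9),
  sd1 = c(2.0, 1.5, 2.5, 2.1, 1.9, 2.3),
  n1  = c(20, 15, 30, 12, 18, 25),
  m2  = c(9.5, 9.9, 10.1, 10.0, 10.6, 10.2),
  sd2 = c(2.2, 1.4, 2.4, 2.0, 2.2, 2.1),
  n2  = c(20, 16, 28, 12, 17, 25)
)

# Mean difference (yi) and sampling variance (vi) per study
dat <- escalc(measure = "MD", m1i = m1, sd1i = sd1, n1i = n1,
              m2i = m2, sd2i = sd2, n2i = n2, data = dat)

# Overall random-effects model with the Knapp-Hartung adjustment
overall <- rma(yi, vi, data = dat, method = "REML", test = "knha")

# The same model fitted within each sub-group
commercial <- rma(yi, vi, data = dat, subset = env == "Commercial",
                  method = "REML", test = "knha")
research   <- rma(yi, vi, data = dat, subset = env == "Research",
                  method = "REML", test = "knha")

print(overall)
print(commercial)
print(research)
```

Fitting the sub-groups separately gives each its own tau²; a single model with `mods = ~ env` would instead assume one shared tau² across sub-groups.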

The plot below shows the mean difference and 95% confidence interval between treatments for each study. A total of nine studies were included, containing an accumulated 233 observations. The random-effects model does not show an effect. One could be led to believe that this is easily spotted by just looking at the p-value, but I want you to forget about those values. If there ever was a case for looking at confidence intervals instead of p-values, you will find it in this meta-analysis. And the confidence interval of the summary estimate contains zero, meaning that no effect can be statistically distinguished from zero.

Then, look at tau² and I². These are your metrics for heterogeneity. At the very least, any sub-group analysis should decrease those metrics, hinting at an increased signal-to-noise ratio (the purpose of a sub-group analysis). However, by including sub-groups you are also slicing your data, which means a decrease in the effective sample size and, most often, an increase in variance.

If heterogeneity decreases, this is a clear signal that splitting the data increased the signal-to-noise ratio.
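To show what a drop in heterogeneity looks like numerically, here is a base-R sketch that computes Cochran's Q, tau², and I² for all studies together and then within each sub-group. It uses the DerSimonian-Laird estimator for tau², which is my choice for the sketch and not necessarily the estimator behind the output in this post. The effect sizes are invented so that the two sub-groups have clearly different means.

```r
# Cochran's Q, DerSimonian-Laird tau^2 and I^2 (in %) for effect sizes yi
# with sampling variances vi
heterogeneity <- function(yi, vi) {
  wi <- 1 / vi                              # inverse-variance weights
  mu <- sum(wi * yi) / sum(wi)              # common-effect pooled estimate
  Q  <- sum(wi * (yi - mu)^2)               # Cochran's Q
  df <- length(yi) - 1
  C  <- sum(wi) - sum(wi^2) / sum(wi)
  c(Q = Q,
    tau2 = max(0, (Q - df) / C),
    I2 = max(0, (Q - df) / Q) * 100)
}

# Invented effect sizes: two sub-groups with different underlying means
dat <- data.frame(
  group = rep(c("g1", "g2"), each = 4),
  yi = c(0.10, 0.20, 0.15, 0.05, 0.90, 1.00, 0.95, 1.10),
  vi = rep(0.04, 8)
)

overall  <- heterogeneity(dat$yi, dat$vi)
by_group <- sapply(split(dat, dat$group),
                   function(d) heterogeneity(d$yi, d$vi))

print(round(overall, 2))
print(round(by_group, 2))
```

In this toy example the overall I² is high because the pooled analysis mixes two different means, while within each sub-group both I² and tau² fall to zero.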

Here, we indeed see this happening: from a global 40% heterogeneity we go down to 0% and 22%, respectively. Hence, the sub-group analysis seems to make sense here, but this is of course not always the case. Below that, there is another p-value, but once again, forget about it. Look at the confidence intervals instead.

Time to create those beautiful forest plots.

A lot of possibilities to customize your plots.
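Both metafor and meta ship a `forest()` function that does the heavy lifting. To show what such a plot is made of, here is a bare-bones version in base R graphics. All numbers, including the "Summary" row, are invented placeholders rather than computed results, and `draw_forest()` is a hypothetical helper, not a function from either package.

```r
# Invented study estimates with 95% confidence limits (placeholders only)
est <- data.frame(
  study = c("Study 1", "Study 2", "Study 3", "Summary"),
  yi    = c(0.70, -0.10, 0.90, 0.45),
  ci_lb = c(-0.60, -1.12, -0.36, -0.27),
  ci_ub = c(2.00, 0.92, 2.16, 1.17)
)

draw_forest <- function(est) {
  op <- par(mar = c(4, 7, 1, 1))           # widen left margin for labels
  on.exit(par(op))
  rows <- rev(seq_len(nrow(est)))          # first study at the top
  plot(est$yi, rows, xlim = range(est$ci_lb, est$ci_ub), pch = 15,
       yaxt = "n", ylab = "", xlab = "Mean difference (95% CI)")
  segments(est$ci_lb, rows, est$ci_ub, rows)   # confidence intervals
  abline(v = 0, lty = 2)                       # line of no effect
  axis(2, at = rows, labels = est$study, las = 1)
  invisible(rows)
}

draw_forest(est)
```

Any interval that crosses the dashed line at zero is compatible with no difference between treatments.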

The plot clearly shows the within and between variance, the level of heterogeneity within and between groups, and the validity of the overall summary estimate. No need for p-values: just eyeball it.

If you want to use the full power of a meta-analysis, you cannot walk past a cumulative meta-analysis. The process is easy: a new summary estimate is computed each time a study is ‘added’. This way you can track the change in estimates and see if perhaps too many studies have been conducted, how many more studies need to be conducted, or if conducting more studies will ever make a difference. Once again, don’t look at p-values; look at changes in treatment estimates and summary confidence intervals.

Adding more studies does not slim down the confidence intervals — you are adding just as much noise as you are adding signal.
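The mechanics of a cumulative meta-analysis can be sketched in a few lines of base R: re-pool the studies after each addition and record the estimate and interval. Note the simplification: this sketch uses common-effect (inverse-variance) weights, under which the interval can only narrow as studies are added. Under a random-effects model, as used in this post, a newly added study can raise tau² and widen the interval. The effect sizes are invented. In practice, metafor's `cumul()` and meta's `metacum()` do this for a fitted model.

```r
# Invented effect sizes, listed in the order the studies were published
yi <- c(0.80, 0.20, 0.55, 0.35, 0.45)
vi <- c(0.20, 0.15, 0.10, 0.12, 0.08)

# Re-pool after each added study (inverse-variance, common-effect weights)
cumulative <- do.call(rbind, lapply(seq_along(yi), function(k) {
  wi  <- 1 / vi[1:k]
  est <- sum(wi * yi[1:k]) / sum(wi)
  se  <- sqrt(1 / sum(wi))
  data.frame(k = k, estimate = est,
             ci_lb = est - qnorm(0.975) * se,
             ci_ub = est + qnorm(0.975) * se)
}))

# Interval width after each step
cumulative$width <- cumulative$ci_ub - cumulative$ci_lb
print(round(cumulative, 3))
```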

Below, you see the workflow from above, but now applied to event data, which translate into proportions. For this example, I will focus on mortality.

The outcome measure is no longer the mean difference, but the odds ratio. Here too, the sub-group analysis removes heterogeneity and clearly shows where the heterogeneity comes from.
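For event data, the per-study effect size is usually computed on the log scale. A minimal base-R sketch, using invented counts and hypothetical column names, of the log odds ratio, its sampling variance, and the back-transformed 95% interval follows. metafor's `escalc(measure = "OR", ...)` returns the same log odds ratio and variance.

```r
# Invented 2x2 counts: events and group sizes for treatment (t) and control (c)
dat <- data.frame(
  study = c("A", "B", "C"),
  ev_t = c(4, 10, 7),  n_t = c(100, 250, 180),
  ev_c = c(9, 14, 15), n_c = c(100, 240, 175)
)

# Cells of the 2x2 table: events and non-events per arm
a  <- dat$ev_t; b  <- dat$n_t - dat$ev_t
cc <- dat$ev_c; d  <- dat$n_c - dat$ev_c

# Log odds ratio and its sampling variance
dat$log_or <- log((a * d) / (b * cc))
dat$vi     <- 1 / a + 1 / b + 1 / cc + 1 / d

# Back-transform estimate and 95% CI to the odds-ratio scale
dat$or    <- exp(dat$log_or)
dat$ci_lb <- exp(dat$log_or - qnorm(0.975) * sqrt(dat$vi))
dat$ci_ub <- exp(dat$log_or + qnorm(0.975) * sqrt(dat$vi))

print(dat[, c("study", "or", "ci_lb", "ci_ub")])
```

On the odds-ratio scale the value of no effect is 1, not 0, so that is the number to look for inside the intervals.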

Below, you will see me comparing two different methods for analyzing event data in a meta-analysis.

I wanted to compare two meta-analysis packages — metafor and meta.

The results between the packages overlap. The only difference is due to the missing option for the HK adjustment.

And now that it seems that we can trust the arithmetic, let’s again build the forest plot.

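The forest plot clearly shows how each outcome is estimated. To explain, an odds ratio (OR) of 0.37 compares the odds of the event in the two groups: (6/610) / (16/600), which is 0.0098 / 0.0267 ≈ 0.37. Because the event is rare, the ratio of the raw proportions (strictly speaking the risk ratio), (6/616) / (16/616) = 6/16 = 0.375, is almost identical.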
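The arithmetic for that one study can be checked directly. The variable names are mine; the counts (6 and 16 events out of 616 per group) are the ones quoted above.

```r
# Counts quoted in the text: 6 and 16 events, 616 observations per group
events_1 <- 6;  n_1 <- 616
events_2 <- 16; n_2 <- 616

# Risk = events / total; odds = events / non-events
risk_ratio <- (events_1 / n_1) / (events_2 / n_2)
odds_ratio <- (events_1 / (n_1 - events_1)) /
              (events_2 / (n_2 - events_2))

print(round(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio), 3))
# risk_ratio 0.375, odds_ratio 0.369
```

With events this rare, the two measures agree to two decimals; with common events they drift apart, and the distinction starts to matter.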

And here are the cumulative meta-analysis results. In this example, adding more studies does add more signal, and that signal points to no effect.

As you can see, conducting sub-group and cumulative meta-analysis is quite straightforward. What will never be straightforward are the choices you need to make to include a study in a meta-analysis and the criteria to use when deciding on a sub-group analysis.

But then again, research never is straightforward. Nor is it easy. That is exactly what makes it worthwhile :-)!

Written by Dr. Marc Jacobs