Meta-analysis in R
Sub-group analysis and cumulative estimates
This post is an extension of my previous introductory post on meta-analysis in R. Just to be clear from the start: sub-group analyses definitely have their rightful place when analyzing treatment effects, but they should never be abused. Too many examples exist showcasing the dangers of p-hacking, and as a reader you should be very wary when a sub-group analysis was not included in the study protocol (meaning the sampled data was never intended to be divided into groups). Nevertheless, should you have a solid (biological) reason to conduct a sub-group analysis, the endeavor is surprisingly easy in R.
The biggest part of the workflow is actually conducting the analysis, and the code you need in the meta package is very intuitive. Below, I requested a random-effects meta-analysis with an additional sub-group analysis based on the subADG&Environment variable. The hakn=TRUE argument requests the Hartung-Knapp adjustment.
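As a sketch of what such a call can look like, here is a minimal version using the meta package's metacont() (which is where the hakn argument lives). The data frame `dat` and all its column names are hypothetical stand-ins for your own study-level data:

```r
library(meta)

# Hypothetical data frame: one row per study, with sample size, mean,
# and SD of the outcome in each arm, plus the subgroup label.
# dat <- data.frame(study, n.trt, mean.trt, sd.trt,
#                   n.ctl, mean.ctl, sd.ctl, environment)

m <- metacont(n.e = n.trt, mean.e = mean.trt, sd.e = sd.trt,
              n.c = n.ctl, mean.c = mean.ctl, sd.c = sd.ctl,
              studlab = study, data = dat,
              sm = "MD",             # mean difference
              method.tau = "REML",   # random-effects estimator
              hakn = TRUE,           # Hartung-Knapp adjustment
              subgroup = environment)

summary(m)  # pooled estimate plus per-subgroup estimates and heterogeneity
```

In older versions of meta the subgroup argument was called byvar; the rest of the call is unchanged.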
The plot below shows the mean difference and 95% confidence interval between treatments for each study. A total of nine studies were included, containing an accumulated 233 observations. The random-effects model does not show an effect. One could be led to believe that this is easily spotted by just looking at the p-value, but I want you to forget about those values. If there ever was a case for looking at confidence intervals instead of p-values, you will find it in this meta-analysis. And what these confidence intervals contain is zero, meaning there is no statistical effect.
Then, look at tau² and I². These are your metrics for heterogeneity. At the very least, any sub-group analysis should decrease those metrics, hinting at an increased signal-to-noise ratio (which is the purpose of a sub-group analysis in the first place). However, by including subgroups you are also slicing your data, which means a decrease in the effective sample size and, most often, an increase in variance. If heterogeneity decreases despite that, it is a clear signal that splitting the data improved the signal-to-noise ratio.
Here, we indeed see this happening: from a global 40% heterogeneity we go down to 0% and 22%, respectively. Hence, the sub-group analysis seems to make sense here, but this is of course not always the case. Below that, there is another p-value, but once again, forget about it. Look at the confidence intervals instead.
Time to create those beautiful forest plots.
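Assuming a fitted meta object `m` as sketched above, a single call produces the plot, subgroups included:

```r
# forest.meta draws one row per study, grouped by subgroup when the model
# was fitted with one, plus pooled diamonds and heterogeneity statistics.
forest(m)

# A few cosmetic options I find useful (all optional):
forest(m, print.tau2 = TRUE, print.I2 = TRUE,
       label.left = "Favours control", label.right = "Favours treatment")
```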
If you want to use the power of a meta-analysis, you cannot walk past a cumulative meta-analysis. The process is easy — a new summary estimate is provided each time a study is ‘added’. This way you can track the change in estimates and see if perhaps too many studies have been conducted, how many more studies need to be conducted, or if conducting more studies will ever make a difference. Once again, don’t look at p-values, look at changes in treatment estimates and summary confidence intervals.
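With meta, the cumulative version is one call on the fitted object. The sortvar argument takes whatever ordering variable you have; I am assuming a hypothetical `year` column here:

```r
# metacum() re-estimates the pooled effect each time one more study is
# added, in the order given by sortvar (here: publication year, assumed
# to be a column of the original data).
m.cum <- metacum(m, sortvar = year)

forest(m.cum)  # one summary row per added study
```

Reading the plot top to bottom shows how the summary estimate and its confidence interval stabilize (or fail to) as evidence accumulates.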
Below, you see the workflow from above, but now applied to event data, which translates into proportions. For this example, I will focus on mortality.
Below, you will see me comparing two different methods for analyzing event data in a meta-analysis.
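A sketch of what such a comparison can look like with metaprop(); the data frame and column names are again placeholders, and the two methods contrasted here, classical inverse-variance pooling of logit-transformed proportions versus a generalized linear mixed model, are my assumption of the kind of comparison meant:

```r
library(meta)

# Hypothetical mortality data: deaths and animals at risk per study.
# mort <- data.frame(study, deaths, n)

# Method 1: inverse-variance pooling of logit-transformed proportions
m.iv <- metaprop(event = deaths, n = n, studlab = study, data = mort,
                 sm = "PLOGIT", method = "Inverse", hakn = TRUE)

# Method 2: random-intercept logistic regression (GLMM)
m.glmm <- metaprop(event = deaths, n = n, studlab = study, data = mort,
                   sm = "PLOGIT", method = "GLMM")

# Compare the pooled proportions and their confidence intervals
summary(m.iv)
summary(m.glmm)
```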
And now that it seems we can trust the arithmetic, let's again build the forest plot.
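Again a single call, carrying on from the metaprop() sketch above; pscale = 100 turns proportions into percentages, which I find easier to read for mortality:

```r
# Forest plot of pooled mortality proportions, displayed as percentages.
forest(m.iv, pscale = 100, xlab = "Mortality (%)")
```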
As you can see, conducting sub-group and cumulative meta-analysis is quite straightforward. What will never be straightforward are the choices you need to make to include a study in a meta-analysis and the criteria to use when deciding on a sub-group analysis.
But then again, research never is straightforward. Nor is it easy. That is exactly what makes it worthwhile :-)!