
In this post, you’ll learn how to call R methods from Python using the library rpy2.
We’ll cover an example related to forecasting. We’ll define and run R functions that combine forecasts made by Python-based models.
Introduction
Even if Python is your preferred language, R can still be useful sometimes.
I don’t want to get into a Python vs R debate. Nowadays I mostly use Python. But many great methods are only available in R, and implementing them from scratch is too much of a nuisance.
The library rpy2 has us covered. It allows you to run R code within Python. R data structures such as matrix or data.frame can be converted to numpy or pandas objects. It’s also easy to integrate custom R functions into your Python workflow.
So, how does rpy2 work?
Example using Opera
We’ll focus on using the R package opera. You can use this package for combining forecasts.
Before diving into rpy2, let’s go over the problem we’re solving.
Primer on Forecasting Ensembles
Ensembles improve forecasting performance by combining many different models.
Most often, the combination is done using a simple average, so each model in the ensemble has equal importance in the final prediction. But a better way to combine forecasts is to use dynamic weights. This way, the weight of each model adapts to changes in the time series.
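To make the difference concrete, here’s a minimal sketch of the combination rule itself. The forecasts and the dynamic weights are made-up numbers for illustration, and combine is a hypothetical helper, not part of opera or rpy2. A simple average is just the special case where every model gets the same weight at every time step.

```python
def combine(forecasts, weights):
    """Weighted combination at each time step.

    forecasts: list of rows, one row per time step, one value per model.
    weights: list of rows with the same shape; each row should sum to 1.
    """
    return [
        sum(w * f for w, f in zip(w_row, f_row))
        for w_row, f_row in zip(weights, forecasts)
    ]


# hypothetical forecasts from three models over four time steps
forecasts = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, 15.0],
    [12.0, 12.0, 18.0],
    [13.0, 11.0, 15.0],
]
n_models = len(forecasts[0])

# simple average: every model gets weight 1/n_models at every step
equal_weights = [[1.0 / n_models] * n_models for _ in forecasts]
simple_average = combine(forecasts, equal_weights)

# dynamic weights: made-up values that shift toward the first model over time
dynamic_weights = [
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.25, 0.25],
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0],
]
dynamic_combination = combine(forecasts, dynamic_weights)
```

In practice, the dynamic weights are not set by hand. They are estimated from how well each model has been doing recently, which is the job of the combination method.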
Opera

There are many methods for dynamic forecast combinations. You can check a previous article for a list of different approaches.
What’s so special about opera?
Opera stands for Online Prediction by Expert Aggregation. Some of the best methods for forecast combination are only available in this R package. These methods come with interesting theoretical properties about how forecast combinations behave in worst-case scenarios. These properties can be valuable for developing robust forecasting models.
You can find a full example of how opera works here.
In the rest of this article, we’ll use opera to combine forecasts made by Python models.
Case Study
Like in the previous article, we’ll use the energy demand time series as a case study.
This example includes three steps:
- Building the ensemble;
- Creating the R functions we need to run;
- Using these functions for dynamic forecast combination.
Let’s dive into each of these steps in turn.
Building the Ensemble
First, we build an ensemble using Python’s scikit-learn methods.
Here’s how you can do this:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Lasso, Ridge, ElasticNetCV
from pmdarima.datasets import load_taylor
# src module available here: https://github.com/vcerqueira/blog
from src.tde import time_delay_embedding
series = load_taylor(as_series=True)
series.index = pd.date_range(end=pd.Timestamp(day=27, month=8, year=2000), periods=len(series), freq='30min')
series.name = 'Series'
series.index.name = 'Index'
# train test split
train, test = train_test_split(series, test_size=0.1, shuffle=False)
# ts for supervised learning
train_df = time_delay_embedding(train, n_lags=10, horizon=1).dropna()
test_df = time_delay_embedding(test, n_lags=10, horizon=1).dropna()
# creating the predictors and target variables
X_train, y_train = train_df.drop('Series(t+1)', axis=1), train_df['Series(t+1)']
X_test, y_test = test_df.drop('Series(t+1)', axis=1), test_df['Series(t+1)']
# defining the five models composing the ensemble
models = {
    'RF': RandomForestRegressor(),
    'KNN': KNeighborsRegressor(),
    'LASSO': Lasso(),
    'EN': ElasticNetCV(),
    'Ridge': Ridge(),
}
# training and getting predictions
test_forecasts = {}
for k in models:
    models[k].fit(X_train, y_train)
    test_forecasts[k] = models[k].predict(X_test)
# predictions as pandas dataframe
forecasts_df = pd.DataFrame(test_forecasts, index=y_test.index)
We created five models: a Random Forest, a K-Nearest Neighbor, and three linear models (Ridge, LASSO, and ElasticNet). These are trained in an auto-regressive way.
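The models are auto-regressive because each one predicts the next value of the series from its most recent lags. The helper time_delay_embedding comes from the repository linked in the code above and its code isn’t shown in this post. So, the function below is a hypothetical, simplified stand-in that only illustrates the idea. It works on plain lists, whereas the original returns a pandas DataFrame with a target column such as 'Series(t+1)'.

```python
def time_delay_embedding_sketch(values, n_lags):
    """Turn a sequence into (lags, target) pairs for one-step-ahead forecasting.

    Each row of X holds n_lags consecutive observations, and the matching
    entry of y is the observation that comes right after them.
    """
    X, y = [], []
    for t in range(n_lags, len(values)):
        X.append(list(values[t - n_lags:t]))  # the n_lags values before t
        y.append(values[t])                   # the value to predict
    return X, y


# toy series, not the energy demand data used in the post
X, y = time_delay_embedding_sketch([1, 2, 3, 4, 5, 6], n_lags=3)
# X == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# y == [4, 5, 6]
```

Any regression model can then be fit on X and y, which is what the scikit-learn models above do with ten lags.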
Here’s a sample of their forecasts:

Now, let’s use R’s opera to combine these forecasts using rpy2. We’ll cover two useful things about this library:
- how to define and use an R function in Python;
- how to convert data structures across the two languages.
Defining R Functions in Python
You can define an R function in a Python multi-line string:
import rpy2.robjects as ro
# polynomially weighted average
method = 'MLpol'
# defining the R function in a Python multi-line string
ro.r(
    """
    define_mixture_r <-
        function(model) {
            library(opera)
            opera_model <- mixture(model = model, loss.type = 'square')
            return(opera_model)
        }
    """
)
# retrieving the function from R's global environment
define_mixture_func = ro.globalenv['define_mixture_r']
# using the function
opera_model = define_mixture_func(method)
The string that contains the function is passed to ro.r, which evaluates it as R code. Then, looking the function up in globalenv (R’s global environment) makes it available to use in Python.
You can define any function you’d like. Note that R, and any required R packages, need to be installed in your system for this to work.
The function in the example above creates an opera object (called mixture). The required parameter is the method that is used for combining forecasts. We use MLpol, which is based on a polynomially weighted average.
Here are a few other useful alternatives:
- EWA: Exponentially weighted average;
- OGD: Online gradient descent;
- FTRL: Follow the regularized leader;
- Ridge: Online Ridge regression.
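To give a feel for how such methods turn past errors into weights, here’s a minimal sketch of an exponentially weighted average with squared loss. This is a simplified illustration and not opera’s implementation: the function ewa_weights, the fixed learning rate eta, and the data are all assumptions made for this example (opera can calibrate the learning rate by itself).

```python
import math


def ewa_weights(expert_forecasts, actuals, eta=0.1):
    """Exponentially weighted average weights with squared loss.

    Returns one weight vector per time step. The weights used at step t
    depend only on losses observed before t, so the first row is uniform.
    """
    n_experts = len(expert_forecasts[0])
    cumulative_loss = [0.0] * n_experts
    weights_over_time = []
    for preds, actual in zip(expert_forecasts, actuals):
        # weights proportional to exp(-eta * cumulative loss); subtracting the
        # smallest loss first keeps the exponentials numerically stable
        best = min(cumulative_loss)
        raw = [math.exp(-eta * (loss - best)) for loss in cumulative_loss]
        total = sum(raw)
        weights_over_time.append([r / total for r in raw])
        # update each expert's loss only after its weight has been used
        for k, pred in enumerate(preds):
            cumulative_loss[k] += (pred - actual) ** 2
    return weights_over_time


# hypothetical data: expert 0 is consistently closer to the truth
expert_forecasts = [[10.0, 14.0], [11.0, 15.0], [12.0, 16.0]]
actuals = [10.5, 11.5, 12.5]
weights = ewa_weights(expert_forecasts, actuals, eta=0.1)
```

The weight of the more accurate expert grows at each step, while the weights always sum to one. A larger eta makes the weights react faster to recent errors, at the cost of less stable combinations.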
Converting data from pandas to R, and vice-versa
Here’s another function we need:
from rpy2.robjects import pandas2ri
ro.r(
    """
    update_mixture_r <-
        function(opera_model, predictions, trues) {
            library(opera)
            for (i in seq_along(trues)) {
                opera_model <- predict(opera_model, newexperts = predictions[i, ], newY = trues[i])
            }
            return(opera_model)
        }
    """
)
update_mixture_func = ro.globalenv['update_mixture_r']
# activating automatic data conversions
pandas2ri.activate()
# using the function above
## predictions is a pandas DataFrame and trues is a pandas Series
## opera_model is a rpy2 object that represents a R data structure
new_opera_model = update_mixture_func(opera_model, predictions, trues)
# deactivating automatic data conversions
pandas2ri.deactivate()
The function definition is like before. But this function requires extra inputs besides opera_model (which we defined above). We need to pass an R data.frame (predictions) and a vector (trues) as input.
You can use pandas2ri to convert data structures between Python and R. This way, you pass a pd.DataFrame (predictions) and a pd.Series (trues). rpy2 converts them automatically. After the function is applied, rpy2 converts the results back to Python data structures.
Putting it all together
Finally, let’s go back to our case study.
I wrapped the functions above in a Python class called Opera. You can check its code on my GitHub.
Here’s how to use it:
# https://github.com/vcerqueira/blog/blob/main/src/ensembles/opera_r.py
from src.ensembles.opera_r import Opera
opera = Opera('MLpol')
opera.compute_weights(forecasts_df, y_test)
ensemble = (opera.weights.values * forecasts_df).sum(axis=1)
Here’s how the weights assigned to each model are distributed:

These weights change over time to cope with the time series dynamics:

Key Takeaways
This article touches on two topics:
- Using the rpy2 library to run R code within Python;
- Doing dynamic forecast combination using the opera R package.
We used rpy2 to define and run several R functions within Python. We focused on a specific package called opera, but you can define and run any function you’d like.
There’s a lot more to rpy2. You can find the details in its documentation [1].
The opera package is useful for dynamic forecast combinations. Its methods are efficient and provide valuable theoretical guarantees of forecasting performance.
Thanks for reading, and see you in the next story!
Further Readings
[1] rpy2 Documentation: https://rpy2.github.io/doc/v3.5.x/html/
[2] Opera Documentation: https://cran.r-project.org/web/packages/opera/vignettes/opera-vignette.html