Nowadays, data scientists have access to many accurate machine learning algorithms, but choosing the best model remains a complicated task, and ensemble learning has proved its efficiency in practice. In previous posts, "How to choose the best model?" and "How to deal with overlapping rules?", I presented the experts' aggregation theory, a theory that deserves to be used more often for ensemble learning instead of simple averaging. Here, I want to focus on the COBRA method, presented in [1]. This method takes a very different and original approach to the combination of estimators. For simplicity, I use the terms estimator, predictive model and expert interchangeably: in a regression setting, an estimator of the regression function can be used as a predictive model, or as an expert that makes a prediction for each new observation.
First, I recall the main setup of the experts' aggregation theory. We have a set of K experts that give us, at each time t, a prediction for the value of the target yₜ. The idea is to aggregate the predictions of the K experts to produce an aggregated prediction ŷₜ.
COBRA (COmBined Regression Alternative).
An intuitive explanation.
Usually, in the experts' aggregation theory, we use a convex combination of the experts' predictions to build ŷ. COBRA takes a very different approach, based on an idea similar to the k-nearest neighbours algorithm. At each time t we have a new observation xₜ, and we compute the K experts' predictions {p₁(xₜ), p₂(xₜ), …, pₖ(xₜ)}. Then, the idea is to average the realizations of y that were not used to generate the experts and whose expert predictions fall in the same neighbourhood (in the Euclidean sense) as {p₁(xₜ), p₂(xₜ), …, pₖ(xₜ)}. The step of searching for realizations in these neighbourhoods is called the consensus step. The following example illustrates the concept.
[Figure: illustration of the consensus step with one feature and two experts]
In this example, we have one feature x ∈ ℝ represented on the abscissa. The realizations of y are marked as black circles. We have two experts: the first expert gives the red predictions and the second the green predictions. For the new observation x = 11, we have the predictions p¹ₜ and p²ₜ. For each prediction a neighbourhood is formed, symbolized by the coloured dashed lines. Then, all realizations for which every expert's prediction falls in the corresponding neighbourhood, marked as blue circles, are averaged to compute ŷₜ (the blue rhombus).
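To make the consensus step concrete, below is a minimal NumPy sketch of the averaging rule, assuming each expert is a vectorized function of the feature; the names cobra_predict, experts and epsilon are illustrative, not taken from any library:

```python
import numpy as np

def cobra_predict(x_new, experts, X_m, y_m, epsilon):
    """Average the targets of the hold-out sample D_m whose expert
    predictions all fall within epsilon of the predictions at x_new."""
    # Expert predictions for the new observation, shape (K,)
    p_new = np.array([p(x_new) for p in experts])
    # Expert predictions on the hold-out sample, shape (m, K)
    p_m = np.column_stack([p(X_m) for p in experts])
    # Consensus: keep observations close to x_new for EVERY expert
    mask = np.all(np.abs(p_m - p_new) <= epsilon, axis=1)
    if not mask.any():
        return np.nan  # no observation reaches consensus
    return y_m[mask].mean()

# Two toy experts on a one-dimensional feature
experts = [lambda x: 2.0 * x, lambda x: 2.0 * x + np.sin(x)]
X_m = np.linspace(0.0, 20.0, 200)               # hold-out features
y_m = 2.0 * X_m + np.random.normal(size=200)    # hold-out targets
print(cobra_predict(11.0, experts, X_m, y_m, epsilon=1.0))
```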
Mathematical explanations.
Formally, the COBRA estimator is constructed as follows. Let Dₙ be a sample of n independent and identically distributed observations of the pair of random variables (X, Y). The sample is divided into two independent samples, Dₗ and Dₘ. Then, Dₗ is used to generate the set of experts {p₁, p₂, …, pₖ} and Dₘ is used for the calculation of ŷₜ, the combined predicted value for a new observation xₜ. We have the following formulas
$$\hat{y}_t \;=\; \sum_{i=1}^{m} W_i(x_t)\, y_i,$$
where the random weights Wᵢ take the form
$$W_i(x_t) \;=\; \frac{\mathbf{1}\left\{\, \lvert p_k(x_t) - p_k(X_i)\rvert \le \epsilon_m \ \text{for all } k = 1,\dots,K \,\right\}}{\sum_{j=1}^{m} \mathbf{1}\left\{\, \lvert p_k(x_t) - p_k(X_j)\rvert \le \epsilon_m \ \text{for all } k = 1,\dots,K \,\right\}},$$

where (Xᵢ, yᵢ) denote the observations of Dₘ, 𝟙{·} is the indicator function, and we use the convention 0/0 = 0.
Here ϵₘ is the smoothing parameter. The larger ϵₘ is, the more tolerant the consensus step, and the more observations enter the average. Conversely, if ϵₘ is too small, many observations are discarded and the average relies on very few points. Therefore, its calibration is a crucial step. To overcome this difficulty, the authors propose a data-dependent calibration in the third section of [1].
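The data-dependent calibration in [1] is more refined, but a crude stand-in is to grid-search ϵₘ on a validation split; the following hypothetical sketch reuses the cobra_predict function from the earlier example:

```python
import numpy as np

def calibrate_epsilon(experts, X_m, y_m, X_val, y_val, grid):
    """Return the epsilon in `grid` with the smallest squared error
    on a validation split (a naive proxy for the calibration in [1])."""
    errors = []
    for eps in grid:
        preds = np.array([cobra_predict(x, experts, X_m, y_m, eps)
                          for x in X_val])
        # Fall back to the global mean when no consensus is reached
        preds = np.where(np.isnan(preds), y_m.mean(), preds)
        errors.append(np.mean((preds - y_val) ** 2))
    return grid[int(np.argmin(errors))]

# Example: search over a logarithmic grid of candidate epsilons
# best_eps = calibrate_epsilon(experts, X_m, y_m, X_val, y_val,
#                              np.logspace(-2, 1, 20))
```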
These expressions show that one of the main differences between COBRA and the other common aggregation methods is that COBRA is nonlinear with respect to the experts {p₁, p₂, …, pₖ}. Moreover, from a theoretical point of view, COBRA satisfies an oracle bound, which shows that the risk of the aggregated predictor is upper-bounded by the smallest risk among the experts, up to a residual term that decays to zero.
Pycobra library
Pycobra is an open-source Python library introduced in [2]. This library is more than just an implementation of COBRA aggregation, even though the simple fact of developing an algorithm called COBRA in a language called Python would have been reason enough. The library also includes the EWA algorithm (Exponentially Weighted Aggregation) detailed in [3], and a version of COBRA for the classification setting, ClassifierCobra, inspired by [4]. The package also provides visualization tools to gauge the performance of the experts. Moreover, a Diagnostics class allows comparing different combinations of the constituent experts, data splits and other basic parameters, which makes parameter analysis easier. Finally, the library is available on GitHub.
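As a starting point, basic usage follows the scikit-learn-style interface described in [2]; the snippet below is a sketch based on that paper, and the exact parameter names may differ in the current version of the library:

```python
import numpy as np
from pycobra.cobra import Cobra

# Toy regression data
rng = np.random.RandomState(0)
X = rng.uniform(-1.0, 1.0, size=(400, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=400)

# Cobra splits the training data internally: one part trains its
# default experts, the other is used for the consensus step
cobra = Cobra(random_state=0, epsilon=0.5)
cobra.fit(X[:300], y[:300])
predictions = cobra.predict(X[300:])
```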
Conclusion
COBRA is an original nonlinear aggregation method for ensemble learning with theoretical guarantees. The authors of the original paper are still working on improving it, as shown by the recent paper introducing a kernel version [5]. What's more, the algorithm is available in an open-source Python library, so there's no excuse not to try it out in your next data science project or Kaggle challenge.
About Us
Advestis is a European Contract Research Organization (CRO) with a deep understanding and practice of statistics and interpretable machine learning techniques. The expertise of Advestis covers the modeling of complex systems and predictive analysis for temporal phenomena.
LinkedIn: https://www.linkedin.com/company/advestis/
References
[1] G. Biau, A. Fischer, B. Guedj and J. D. Malley. COBRA: A combined regression strategy. Journal of Multivariate Analysis 146 (2016): 18–28.
[2] B. Guedj and B. Srinivasa Desikan. Pycobra: A python toolbox for ensemble learning and visualisation. Journal of Machine Learning Research 18.190 (2018): 1–5.
[3] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting and sharp oracle inequalities. International Conference on Computational Learning Theory (2007): 97–111.
[4] M. Mojirsheibani. Combining classifiers via discretization. Journal of the American Statistical Association 94.446 (1999): 600–609.
[5] B. Guedj and B. S. Desikan. Kernel-Based Ensemble Learning in Python. Information 11, no. 2 (2020): 63.