
In their paper, _Tabular Data: Deep Learning is Not All You Need_, the authors argue that while deep learning methods have shown tremendous success in the image and text domains, traditional tree-based methods like XGBoost continue to shine on tabular data. The authors examined the TabNet, Neural Oblivious Decision Ensembles (NODE), DNF-Net, and 1D-CNN deep learning models and compared their performance against XGBoost on eleven datasets.

This is an important paper in that it reiterates that deep learning may not be the silver bullet for every machine learning problem. Tree-based algorithms, on the other hand, have been shown to match or even outperform neural networks on tabular data while remaining simple to use and understand.
And there is good news for people who like working with tree-based models: a few months back, Google open-sourced the TensorFlow Decision Forests (TF-DF) library. In this article, we’ll look at what TF-DF is and how it can be helpful for us.
Objective
Many great resources and code examples are already available as part of the documentation (refer to the References section below), so I will not reinvent the wheel. This article is not a getting-started guide but rather a quick overview of the library that showcases its main ideas and features. For a deeper dive, the article by Eryk Lewinson on using TensorFlow Decision Forests with the Pokemon datasets is recommended.
TensorFlow Decision Forests (TF-DF)
Decision Forests (DF) are a class of machine learning algorithms built from multiple decision trees. Random Forests and Gradient Boosted Decision Trees are the two most popular DF training algorithms. TensorFlow Decision Forests is a library for training, serving, running inference with, and interpreting these Decision Forest models. TF-DF is essentially a wrapper around the C++ Yggdrasil Decision Forests (YDF) library, making it available in TensorFlow.
TF-DF provides a unified Keras API for tree-based models and neural networks alike, which is incredibly convenient for users: both model families can be trained and used through the same interface.
Implementation overview
The TF-DF library can be easily installed with pip. However, it is not yet compatible with macOS or Windows; for non-Linux users, running it via Google Colab is a workaround.
Let’s look at a basic example of using TF-DF on the Palmer Penguins dataset. This dataset is a drop-in replacement for the Iris dataset, and the goal is to predict the penguin species from the given features.


As you can see, the dataset is a mix of numerical and categorical features and is a classic example of a classification machine learning problem. Training a decision forest in TensorFlow is very intuitive, as the example below shows; the code follows the official documentation.

A few things stand out. Notably, no preprocessing such as one-hot encoding or normalization is required. We’ll touch on this in the Highlights section below.
Applications
We have already seen a classification example. TF Decision Forests can also be used for tasks like regression and even ranking.


Highlights
TF Decision Forests stands out on several fronts. Let’s briefly discuss a few of them:

Ease of Use
- The same Keras API can be used for neural networks as well as tree-based algorithms. It is also possible to combine decision forests and neural networks to create new types of hybrid models.
- No need to specify input features. TensorFlow Decision Forests can automatically detect the input features from the dataset.

Minimal Preprocessing
- No preprocessing such as categorical encoding, normalization, or missing-value imputation is required.
- No validation dataset is required. If provided, the validation dataset will only be used for displaying metrics.
Easy deployment options with TensorFlow Serving
- After the model is trained, you can evaluate it on a test dataset using `model.evaluate()` or make predictions with `model.predict()`. Finally, you can save the model in the `SavedModel` format to be served just like any TensorFlow model using TensorFlow Serving.

Interpretability
- Sometimes, it becomes imperative to understand how a model works under the hood, especially for high-stakes decisions. TensorFlow Decision Forests has inbuilt plotting methods to plot and help understand the tree structure.
Here is the plot of the first tree of our Random Forest model:

```python
tfdf.model_plotter.plot_model_in_colab(model_1, tree_idx=0, max_depth=3)
```

Additionally, one can access the model structure and feature importances along with the training logs.
Scope for Improvement
Many useful features come packaged with TF Decision Forests, but there are also some areas for improvement.
- No direct support for Windows or macOS (to date).
- As of now, only three algorithms are available in the TF-DF module: Random Forests, Gradient Boosted Trees, and the CART model.
- Currently, there is also no support for running the models on GPU/TPU infrastructure.
Final word & Resources to get started
Overall, TF Decision Forests provides an excellent option for building tree-based models with TensorFlow and Keras. It is especially convenient for those who already have a TensorFlow pipeline in place. The library is under constant development, so many more features can be expected soon. If you want to look at code examples and apply the library to your own use cases, there are excellent resources for all levels:
- TensorFlow Decision Forests tutorials
- TensorFlow Decision Forests project on GitHub.
- Official blog post