
Improving The Inference Speed of TabNet


Tips and Tricks

A simple way to speed up TabNet’s inference by modifying a single line of code.

TabNet[1] is a deep neural network (DNN) based model for tabular data sets. The authors of TabNet claim that DNNs have been successful on image data and sequential data (e.g., text), but that on tabular data they perform worse than gradient boosting models such as LightGBM or XGBoost. The authors tackle this issue by developing a novel DNN-based model for tabular data sets, and they show that TabNet can perform significantly better than gradient boosting models. Further, TabNet demonstrated its strong performance in a Kaggle competition, Mechanisms of Action (MoA) Prediction[2].

Despite TabNet's performance, it has one weakness: slow inference speed. I will first explain why inference is slow (if you are not interested, you can jump ahead to the proposed fix). TabNet is slow because of its feature selection step. TabNet selects features using "sparsemax"[3], which is defined, for a vector $z$, as

$\text{sparsemax}_i(z) = [z_i - \tau(z)]_+$

where $[x]_+ = \max(x, 0)$ and $\tau(z)$ is a threshold chosen so that the outputs sum to one.

Sparsemax outputs a sparse probability distribution, which is useful for feature selection. For instance, suppose we have three features (A, B, and C), and only A is relevant for prediction. In this case, sparsemax assigns probability 1 to A and probability 0 to B and C, so the model works with the relevant feature only. Softmax does not show this tendency: it always assigns nonzero probability to every feature.
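To make this concrete, here is a small sketch comparing the two on the three-feature example. I use the open-source entmax package for a ready-made sparsemax; any sparsemax implementation would do:

```python
import torch
from entmax import sparsemax  # pip install entmax; any sparsemax implementation works

# Logits for features A, B, and C; only A is strongly relevant.
z = torch.tensor([[2.0, 0.0, 0.0]])

print(sparsemax(z, dim=-1))      # tensor([[1., 0., 0.]])  -- B and C get exactly zero
print(torch.softmax(z, dim=-1))  # tensor([[0.7870, 0.1065, 0.1065]]) -- all nonzero
```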

Now, let's talk about why sparsemax slows TabNet down. The slowness comes from $\tau(z)$ in the definition above. Computing $\tau(z)$ involves sorting $z$ and then searching for the largest index that remains in the support, whose time complexities are O(n·log(n)) and O(n), respectively, where $n$ is the length of $z$. Softmax, by contrast, needs only elementwise operations and a sum, all O(n).
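For reference, here is a minimal NumPy sketch of that computation, following the sorting-based algorithm from the sparsemax paper[3]; the comments mark where the costs come from:

```python
import numpy as np

def tau(z):
    # Sort z in decreasing order: O(n log n) -- this is the bottleneck.
    z_sorted = np.sort(z)[::-1]
    # Find the support size k(z) via cumulative sums: O(n).
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum
    k_z = k[support].max()                   # largest k still in the support
    return (cumsum[k_z - 1] - 1) / k_z

def sparsemax(z):
    # Project z onto the probability simplex: [z_i - tau(z)]_+
    return np.maximum(z - tau(z), 0.0)

print(sparsemax(np.array([2.0, 0.0, 0.0])))  # [1. 0. 0.]
```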

So how can we fix this? I suggest using softmax instead of sparsemax, but with a multiplier $m$:

$\text{softmax}_i(z, m) = \frac{\exp(m z_i)}{\sum_{j=1}^{n} \exp(m z_j)}$
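The multiplier $m$ acts as an inverse temperature: the larger it is, the sharper the output. A quick numerical sketch, where softmax_m is an illustrative helper rather than a library function:

```python
import numpy as np

def softmax_m(z, m):
    # Softmax with multiplier m; larger m gives a sharper (more sparse-like) output.
    e = np.exp(m * z - np.max(m * z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 0.0, 0.0])
for m in (1, 3, 5):
    print(m, softmax_m(z, m).round(4))
# 1 [0.787  0.1065 0.1065]
# 3 [0.9951 0.0025 0.0025]
# 5 [0.9999 0.     0.    ]   -- approaching sparsemax's [1, 0, 0]
```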
Figure: Sparsemax vs. softmax. From left to right, the multipliers are set to 3, 1, and 5, respectively; the length of $z$ is two.

The figures above compare sparsemax and softmax. As claimed by the authors of sparsemax[3], sparsemax produces sparser probabilities than softmax with multiplier 1. Yet, as we increase the multiplier, softmax approximates sparsemax reasonably well, especially when the multiplier equals 3. Furthermore, softmax is much faster to compute than sparsemax, so we can expect fast inference!
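In code, the change really is a single line. The sketch below assumes a PyTorch TabNet implementation such as pytorch-tabnet, where the attentive transformer applies a sparsemax selector to produce the feature mask; the ScaledSoftmax module is my own illustrative helper, and the exact attribute names may differ in your version:

```python
import torch

class ScaledSoftmax(torch.nn.Module):
    """Softmax with multiplier m -- a drop-in replacement for the sparsemax selector."""

    def __init__(self, m=3.0, dim=-1):
        super().__init__()
        self.m = m
        self.dim = dim

    def forward(self, x):
        return torch.softmax(self.m * x, dim=self.dim)

# In pytorch-tabnet's AttentiveTransformer (tab_network.py), the selector is
# built roughly like this:
#     self.selector = sparsemax.Sparsemax(dim=-1)
# The single-line modification is to swap in the scaled softmax instead:
#     self.selector = ScaledSoftmax(m=3.0, dim=-1)
```

Here $m$ is a hyperparameter; $m = 3$ worked well in the comparison above.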

Performance comparison: I applied this modified version of TabNet to the Kaggle competition Jane Street Market Prediction[4]. In that competition, one must decide whether to accept a financial trading opportunity within 16 milliseconds, and the given data are tabular. Although it is a financial trading decision problem, I do not think it can be formulated as a time series problem, since most of the data are anonymized: anonymized features, anonymized security types, anonymized markets, and so on. I scored 7990.420, while most other participants scored around 5000. Many competitors stopped using TabNet because of its slow inference, whereas I could keep it thanks to the fast softmax-based feature selection.

In conclusion, I suggest using softmax with a multiplier to improve the inference speed of TabNet. Although I have not applied it to many other data sets, switching from sparsemax to softmax with a multiplier does not seem to deteriorate performance; it sometimes even improves it.

Please do let me know if you find any errors, or if you know how to type math equations on Medium. Thank you for reading my article.

References

[1] Arik, S.O. and Pfister, T., 2019. TabNet: Attentive interpretable tabular learning. arXiv preprint arXiv:1908.07442.

[2] Kaggle. 2020. Mechanisms of Action (MoA) Prediction. https://www.kaggle.com/c/lish-moa

[3] Martins, A. and Astudillo, R., 2016, June. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning (pp. 1614–1623). PMLR.

[4] Kaggle. 2021. Jane Street Market Prediction. https://www.kaggle.com/c/jane-street-market-prediction

