Leveling up Training: NVTabular and PyTorch Lightning

Training a wide and deep recommender model on MovieLens 25M

Dylan Valerio
Towards Data Science

--

NVTabular is a feature engineering framework designed to work with NVIDIA Merlin. It can process the large datasets typical of production recommender setups. I tried to get NVIDIA Merlin working on free instances, but NVIDIA’s recommended setup seems to be the only way forward. Still, I wanted to use NVTabular on its own, since doing data engineering and data loading on the GPU is very attractive. In this post, I’m going to use NVTabular with PyTorch Lightning to train a wide and deep recommender model on MovieLens 25M.

We have a hybrid implementation here. Image by Laura Musikanski on Pexels.com.

For the code, you may check my Kaggle notebook. This implementation has quite a few components, listed below:

  • Large chunks of the code are lifted from the NVTabular tutorial.
  • For the model, I am using some parts of James Le’s work.
  • Optuna is used for hyperparameter tuning.
  • Training is done via PyTorch Lightning.
  • I am also leveraging CometML for metric tracking.
  • The dataset I used is MovieLens 25M. It has 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users [1].

Cool? Let’s start!

Data Processing with NVTabular

There are several advantages to using NVTabular. You can work with datasets that are larger than memory (it uses Dask under the hood), and all processing can be done on the GPU. The framework also expresses operations as DAGs (directed acyclic graphs), which are conceptually familiar to most engineers. Our operations will be defined using these DAGs.

We will first define our workflow. First, we’re going to use implicit ratings, where ratings of 4 and 5 become the positive label. Second, we’ll convert the genres column into a multi-hot categorical feature. Third, we’ll join the ratings and genres tables. Note that the >> operator is overloaded and behaves just like a pipe. If you run this cell, a DAG will appear.

Image by the author.
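The post defines these steps as NVTabular ops chained with >>. As a rough CPU analogue, here is a sketch of the same three transformations in pandas, with toy data and assuming ratings below 4 map to 0:

```python
import pandas as pd

# Toy stand-ins for the MovieLens ratings and movies tables
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [5.0, 3.0, 4.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20],
    "genres": ["Action|Sci-Fi", "Comedy"],
})

# 1) Implicit label: ratings of 4 and 5 become 1, everything else 0
ratings["rating"] = (ratings["rating"] >= 4).astype("int8")

# 2) Multi-hot genres: split the pipe-delimited string into a list column
movies["genres"] = movies["genres"].str.split("|")

# 3) Join the genres onto the ratings table
df = ratings.merge(movies, on="movieId", how="left")
print(df)
```

In NVTabular, the equivalent steps are expressed with ops such as LambdaOp, Categorify, and JoinExternal, chained with >> into a Workflow that runs on the GPU.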

The >> operations on lists may look strange and can take some getting used to. Note also that the actual datasets aren’t defined yet. We need to define a Dataset, which we then transform using the Workflow above. A Dataset is an abstraction that reads the underlying data in chunks. The Workflow then computes statistics and other information from the Dataset.

Running the workflow produces two output directories. The first, train, contains parquet files, the schema, and other metadata about your dataset. The second, workflow, contains the computed statistics, categorical mappings, and so on.

To use the datasets and workflows when training the model, you wrap them in NVTabular’s iterators and data loaders; the snippet is in the notebook.

Wide and Deep Networks with NVTabular, TorchFM, and PyTorch

The model we’re using is a wide and deep network, first used in Google Play [2]. The wide features are the user and item embeddings. For the deep features, we pass the user, item, and item-feature embeddings through successive fully connected layers. I’m modifying the genres variable to use multi-hot encodings, which, under the hood, sums the embeddings of the individual categorical values.

See the following image for a visual representation from the original authors.

Image by Heng-Tze Cheng et al., from Wide & Deep Learning for Recommender Systems [2].
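The multi-hot detail above (summing the embeddings of a movie’s genres) can be reproduced in plain PyTorch with nn.EmbeddingBag in sum mode. The vocabulary size and dimensions below are invented for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical vocabulary of 20 genre ids, embedded into 8 dimensions
genre_embedding = nn.EmbeddingBag(num_embeddings=20, embedding_dim=8, mode="sum")

# One row whose movie has genres [3, 7]; offsets marks where each row starts
genres = torch.tensor([3, 7])
offsets = torch.tensor([0])
multi_hot = genre_embedding(genres, offsets)

# Same result as summing the two genre embeddings by hand
manual = genre_embedding.weight[3] + genre_embedding.weight[7]
print(multi_hot.shape)  # torch.Size([1, 8])
```

So a variable-length genre list still produces one fixed-size vector per example, which is what the deep tower needs.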

The constructor in the notebook is truncated for brevity.

Training Loop

To train our model, we define a single training step. This is required by PyTorch Lightning.

First, the data loader from NVTabular outputs a dictionary containing the batch of inputs. In this example, we are handling only categorical values, but this transform step can handle continuous values as well. The output is a tuple of categoricals and continuous variables, plus the label.

Second, we define the training and evaluation steps that use the transform function above.
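The two steps above can be sketched in plain PyTorch (outside a LightningModule, and with made-up batch keys, since the real keys come from the workflow’s column names):

```python
import torch
import torch.nn.functional as F

def transform_batch(batch):
    """Unpack an NVTabular-style dict batch into (categoricals, continuous, label).

    The key names here are illustrative, not NVTabular's actual output keys.
    """
    x_cat = batch["categoricals"]     # (batch_size, n_categorical_features)
    x_cont = batch.get("continuous")  # None in this all-categorical example
    y = batch["label"].float()
    return x_cat, x_cont, y

def training_step(model, batch):
    # Unpack the dict, run the model, and score with binary cross-entropy
    x_cat, x_cont, y = transform_batch(batch)
    y_hat = model(x_cat, x_cont)      # probabilities from the sigmoid head
    return F.binary_cross_entropy(y_hat, y)

# Tiny stand-in model that just maps each row to a probability
model = lambda x_cat, x_cont: torch.sigmoid(x_cat.float().sum(dim=1))
batch = {
    "categoricals": torch.tensor([[1, 2], [3, 4]]),
    "label": torch.tensor([1.0, 0.0]),
}
loss = training_step(model, batch)
print(loss.item())
```

In the actual LightningModule, training_step and validation_step both call this transform before the forward pass.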

I’m omitting the forward step, since it is simply a matter of feeding the categorical and continuous variables to the correct layers, concatenating the wide and deep components of the model, and adding a sigmoid head. The output of the model is the probability that the user will consume the item.
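For a sense of what that looks like, here is a minimal sketch of such a forward pass; the layer and vocabulary sizes are invented and much smaller than the real model’s:

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    """Minimal wide & deep head: concatenate both components, then sigmoid."""

    def __init__(self, n_users=100, n_items=100, dim=8, hidden=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        # Deep tower: user + item embeddings through fully connected layers
        self.deep = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # Head sees the wide features (raw embeddings) plus the deep output
        self.head = nn.Linear(2 * dim + hidden, 1)

    def forward(self, users, items):
        u, i = self.user_emb(users), self.item_emb(items)
        wide = torch.cat([u, i], dim=1)
        deep = self.deep(wide)
        # Probability that the user consumes the item
        return torch.sigmoid(self.head(torch.cat([wide, deep], dim=1))).squeeze(1)

model = WideAndDeep()
probs = model(torch.tensor([0, 1]), torch.tensor([2, 3]))
print(probs)  # two values in (0, 1)
```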

Lastly, with everything defined properly, it’s time to stitch it all together. Each of the functions here (create_loaders, create_model, and create_trainer) is user-defined. As the names suggest, they create the objects needed for training. The create_loaders function creates the data loaders. The create_model function creates the model and the Optuna hyperparameter search space. The create_trainer function sets up the CometML logger and initializes the trainer.

Evaluation and Conclusions

I’ve launched only six trials for this one. The results are decent, but more trials could yield better ones.

As you can probably guess, a lot of components had to be stitched together. Building this can be a pain, especially when multiple projects require the same thing. I recommend creating an in-house framework so this template can be distributed sustainably.

The next steps could be:

  • Deploy an inference service through TorchServe.
  • Create a training-deployment pipeline using Kedro.
  • Extract top-n user recommendations and store them in a cache server.

Thanks for reading!

References

[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19.

[2] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, et al. 2016. Wide & Deep Learning for Recommender Systems. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. https://doi.org/10.1145/2988450.2988454.

Originally published at http://itstherealdyl.com on June 17, 2022.



I’m a Filipino machine learning engineer now based in Vancouver. Check my professional profile at: https://itstherealdyl.com/