
Named inputs and outputs are essentially dictionaries with string keys and tensor values.
Benefits
- Defence Against Feature Reordering
- Self-Sufficient Model Serving Signatures and Metadata
- Renaming and Absent Feature Protection
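At their simplest, named inputs and outputs look like this. A minimal sketch (the feature names and values are made up for illustration):

```python
import tensorflow as tf

# Named inputs: a dictionary mapping feature names to tensors,
# instead of a single anonymous feature matrix.
features = {
    "age": tf.constant([[42.0], [35.0]]),
    "income": tf.constant([[50_000.0], [72_000.0]]),
}

# Each feature is addressed by name, so column order no longer matters.
batch = tf.concat([features["age"], features["income"]], axis=1)
print(batch.shape)  # (2, 2)
```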
Most machine learning pipelines read data from a structured source (database, CSV files, Pandas DataFrames, TFRecords), perform feature selection, cleaning, and (possibly) preprocessing, then pass a raw multidimensional array (tensor) to the model, along with another tensor representing the correct prediction for each input sample.
Reorder or rename input features in production? → Useless results, or the client side breaks in production.
Absent features? Missing data? Bad output value interpretation? Mixed-up integer indices? → Useless results, or the client side breaks in production.
Want to know which feature columns were used during training, so you can provide the same ones at inference? → You can't: misinterpretation errors.
Want to know what the output values represent? → You can't: misinterpretation errors.
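To make the reordering failure mode concrete, here is a minimal sketch (the weights and feature values are made up):

```python
import numpy as np

# Hypothetical linear model trained on columns [age, income].
weights = np.array([0.5, 0.0001])

correct = np.array([42.0, 50_000.0])    # [age, income], as trained
reordered = np.array([50_000.0, 42.0])  # columns swapped by mistake

print(weights @ correct)    # the intended prediction
print(weights @ reordered)  # silently wrong — no error is raised
```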
Don't drop column names at the model's input layers. The tf.data.Dataset already lets you keep them by default, by treating the input as a dictionary.
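For example, building a Dataset from a dictionary keeps every feature addressable by name all the way through the pipeline (feature names below are illustrative):

```python
import tensorflow as tf

# tf.data.Dataset preserves feature names when built from a dictionary.
ds = tf.data.Dataset.from_tensor_slices({
    "age": [42.0, 35.0],
    "income": [50_000.0, 72_000.0],
})

for example in ds.take(1):
    # Each element is itself a dictionary of named tensors.
    print(example["age"].numpy())
```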
Over the years, the above problems have become easier to deal with. Here's a small overview of the available solutions in the TensorFlow 2.x ecosystem.
- TFRecords and tf.Example are hands down the best data format for deep learning projects at any scale. Every feature is named by default.
- TensorFlow Transform uses named inputs and produces named outputs, encouraging you to do the same for your model.
- Keras supports dictionaries of layers as inputs and outputs.
- TensorSpec and serving signature definitions support named IOs by default.
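The Keras point can be sketched with the functional API. A minimal example (layer and feature names are made up):

```python
import tensorflow as tf

# Keras functional model with named inputs and a named output.
inputs = {
    "age": tf.keras.Input(shape=(1,), name="age"),
    "income": tf.keras.Input(shape=(1,), name="income"),
}
x = tf.keras.layers.Concatenate()(list(inputs.values()))
x = tf.keras.layers.Dense(8, activation="relu")(x)
outputs = {
    "will_buy": tf.keras.layers.Dense(1, activation="sigmoid", name="will_buy")(x)
}

model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Callers pass features by name; the order of keys is irrelevant.
pred = model({
    "income": tf.constant([[50_000.0]]),
    "age": tf.constant([[42.0]]),
})
print(pred["will_buy"].shape)  # (1, 1)
```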
By using this serving_raw signature definition, you can call a TensorFlow Serving endpoint directly with a JSON payload, without serialising to tf.Example.
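A serving_raw-style signature can be defined with named TensorSpecs. A minimal sketch (the model, feature names, and paths are hypothetical):

```python
import tensorflow as tf

class Model(tf.Module):
    def __init__(self):
        # Toy weights standing in for a trained model.
        self.w = tf.Variable([[0.5], [0.0001]])

    # Named tensor inputs, so the model can be called from
    # TF Serving with a plain JSON payload.
    @tf.function(input_signature=[{
        "age": tf.TensorSpec([None, 1], tf.float32, name="age"),
        "income": tf.TensorSpec([None, 1], tf.float32, name="income"),
    }])
    def serving_raw(self, features):
        x = tf.concat([features["age"], features["income"]], axis=1)
        return {"prediction": tf.matmul(x, self.w)}

model = Model()
tf.saved_model.save(
    model, "/tmp/demo_model/1",
    signatures={"serving_raw": model.serving_raw},
)

# Example TF Serving REST call for this signature:
#   POST /v1/models/demo_model:predict
#   {"signature_name": "serving_raw",
#    "instances": [{"age": [42.0], "income": [50000.0]}]}
```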
Check out the metadata signature on TF Serving, with a sample bitcoin prediction model I am currently working on:
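The same signature metadata that TF Serving exposes over REST (at /v1/models/&lt;name&gt;/metadata) can also be inspected locally on a SavedModel. A sketch with a made-up one-feature model:

```python
import tensorflow as tf

# A tiny model with one named input and one named output.
class Model(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 1], tf.float32, name="price")])
    def predict(self, price):
        return {"next_price": price * 1.01}  # placeholder logic

tf.saved_model.save(
    Model(), "/tmp/btc_model/1",
    signatures={"serving_default": Model().predict},
)

# Loading the model back exposes its signature metadata:
loaded = tf.saved_model.load("/tmp/btc_model/1")
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)  # named inputs
print(sig.structured_outputs)          # named outputs
```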
Lastly, if you are using TFX or have a protocol buffer schema for the inputs, you should use it to send data for inference, as it is much more efficient and errors surface sooner, on the client side instead of the server side. Even in this case, keep using named inputs and outputs for your model.
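The protobuf route keeps names too: tf.Example stores every feature under a key, and parsing requires the same keys, which catches mismatches early. A minimal sketch (feature names are illustrative):

```python
import tensorflow as tf

# Building a tf.train.Example keeps every feature named
# at the serialization level as well.
example = tf.train.Example(features=tf.train.Features(feature={
    "age": tf.train.Feature(float_list=tf.train.FloatList(value=[42.0])),
    "income": tf.train.Feature(float_list=tf.train.FloatList(value=[50_000.0])),
}))
serialized = example.SerializeToString()

# Parsing back requires the same names, so typos and absent
# features fail loudly instead of silently shifting columns.
parsed = tf.io.parse_single_example(serialized, {
    "age": tf.io.FixedLenFeature([1], tf.float32),
    "income": tf.io.FixedLenFeature([1], tf.float32),
})
print(parsed["age"].numpy())  # [42.]
```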
Thanks for reading all the way to the end!
Want to also learn how to structure your next Machine Learning project properly?
- Check out my Structuring ML Pipeline Projects article.