
TensorFlow Best Practices: Named Inputs and Outputs

Quit depending on positional indices and input value ordering. Start relying on named inputs and outputs. Avoid data wiring errors.

Image by Daniel Dino-Slofer from Pixabay

Named inputs and outputs are essentially dictionaries with string keys and tensor values.
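
As a minimal sketch of what this can look like in Keras, assuming TensorFlow 2.x is installed: the feature names `open` and `volume` and the output name `price` are hypothetical and not taken from the author's model. The model receives a dictionary and returns a dictionary, so the order of the keys in the input does not matter.

```python
import numpy as np
import tensorflow as tf

# Hypothetical feature names, chosen only for illustration.
inputs = {
    "open": tf.keras.Input(shape=(1,), name="open"),
    "volume": tf.keras.Input(shape=(1,), name="volume"),
}

# The model decides internally how the named features are combined.
features = tf.keras.layers.Concatenate()([inputs["open"], inputs["volume"]])
hidden = tf.keras.layers.Dense(4, activation="relu")(features)

# A dictionary of outputs gives every prediction a name as well.
outputs = {"price": tf.keras.layers.Dense(1, name="price")(hidden)}
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# The caller passes features by name; key order is irrelevant.
batch = {
    "volume": np.array([[10.0], [20.0]], dtype="float32"),
    "open": np.array([[1.0], [2.0]], dtype="float32"),
}
prediction = model(batch)
reordered = model({"open": batch["open"], "volume": batch["volume"]})
```

Both calls produce the same `price` tensor, because each value is matched to its input layer by key rather than by position.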

Benefits

  1. Defence Against Feature Reordering
  2. Self-Sufficient Model Serving Signatures and Metadata
  3. Renaming and Absent Feature Protection

Most machine learning pipelines read data from a structured source (a database, CSV files, Pandas DataFrames, TFRecords), perform feature selection, cleaning, and possibly preprocessing, and then pass a raw multidimensional array (tensor) to a model, along with another tensor representing the correct prediction for each input sample.

Reorder or rename input features in production? Useless results, or the client side breaks in production.

Absent features? Missing data? Bad output value interpretation? Mixing up integer indices by mistake? Useless results, or the client side breaks in production.

Want to know which feature columns were used for training in order to provide the same ones for inference? You can't. Misinterpretation errors.

Want to know what the output values represent? You can't. Misinterpretation errors.
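
The failure modes above can be sketched in plain Python. This is an illustrative check only, not part of TensorFlow; the feature names and the helper `to_named_inputs` are hypothetical. With names attached, a reordered row still maps to the right features, while a missing or renamed feature fails loudly instead of silently producing useless results.

```python
# Hypothetical list of feature names the model was trained on.
EXPECTED_FEATURES = ("open", "high", "low", "volume")


def to_named_inputs(row, expected=EXPECTED_FEATURES):
    """Validate a row of named features and return it in a canonical form.

    Raises KeyError when a feature is missing or an unknown name appears,
    which covers both absent and renamed features.
    """
    missing = [name for name in expected if name not in row]
    unexpected = [name for name in row if name not in expected]
    if missing or unexpected:
        raise KeyError(
            f"missing features: {missing}; unexpected features: {unexpected}"
        )
    return {name: float(row[name]) for name in expected}


# Reordering the keys does not change the result.
row_a = {"open": 1.0, "high": 2.0, "low": 0.5, "volume": 100.0}
row_b = {"volume": 100.0, "low": 0.5, "high": 2.0, "open": 1.0}
same_result = to_named_inputs(row_a) == to_named_inputs(row_b)

# A renamed feature ("vol" instead of "volume") is rejected up front.
try:
    to_named_inputs({"open": 1.0, "high": 2.0, "low": 0.5, "vol": 100.0})
    renamed_feature_rejected = False
except KeyError:
    renamed_feature_rejected = True
```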


Don’t drop column names on the model input layers.

tf.data.Dataset already supports this out of the box: each input element can be a dictionary that maps feature names to tensors.
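
A small sketch of that dictionary handling, assuming TensorFlow 2.x is installed; the column names and values are made up for illustration. `tf.data.Dataset.from_tensor_slices` accepts a tuple of dictionaries, and every element it yields keeps the same keys.

```python
import tensorflow as tf

# Hypothetical columns; in practice these could come from a CSV file
# or a Pandas DataFrame converted to a dict of columns.
features = {"open": [1.0, 2.0, 3.0], "volume": [10.0, 20.0, 30.0]}
labels = {"price": [1.5, 2.5, 3.5]}

# Each element is a (features, labels) pair of dictionaries.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# The element spec keeps the column names alongside shapes and dtypes.
feature_spec, label_spec = dataset.element_spec

# Batches are dictionaries too, so columns are looked up by name.
first_features, first_labels = next(iter(dataset))
first_volumes = first_features["volume"].numpy().tolist()
```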

Over the years, the above problems have become easier to deal with. Here's a small overview of the available solutions in the TensorFlow 2.x ecosystem.

Check out the metadata signature on TF Serving, with a sample bitcoin prediction model I am currently working on:

Lastly, if you are using TFX or have a protocol buffer schema for the inputs, you should use that to send data for inference, as it is much more efficient and errors surface sooner, on the client side instead of the server side. Even in this case, keep using named inputs and outputs for your model.
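
For comparison, when no schema is at hand and the plain TF Serving REST API is used, named inputs can still be sent as one JSON object per instance. The following is a rough sketch under stated assumptions: the model name `bitcoin_model`, the host and port, and the feature names are hypothetical, and the helper `build_predict_request` is not part of any library. It only builds the request; it does not contact a server.

```python
import json


def build_predict_request(rows, signature_name="serving_default"):
    """Build a TF Serving REST predict body in the row ("instances") format.

    Each row is a dict keyed by input name, so the server matches values
    to model inputs by name rather than by position.
    """
    return json.dumps({"signature_name": signature_name, "instances": rows})


# Hypothetical model name and local endpoint, for illustration only.
MODEL_NAME = "bitcoin_model"
url = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"

# Hypothetical feature names; one dictionary per input sample.
body = build_predict_request([{"open": 1.0, "volume": 10.0}])
```

The resulting `body` could then be posted to `url` with any HTTP client.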


Thanks for reading all the way to the end!


Want to also learn how to structure your next Machine Learning project properly?

  • Check out my Structuring ML Pipeline Projects article.
