Logging TensorFlow (Keras) metrics to Azure ML Studio in real time

A real-time approach using a custom Keras callback.

Image by author

Training a TensorFlow/Keras model on Azure’s Machine Learning Studio can save a lot of time, especially if you don’t have your own GPU or your dataset is large. It seems like there should be an easy way to track your training metrics in Azure ML Studio’s dashboard. Well, there is! It just requires a short custom Keras callback.

If you are new to training TensorFlow models on Azure, take a look at my article "Train on Cloud GPUs with Azure Machine Learning SDK for Python." It starts from the beginning and implements an entire training workflow from scratch. This post, however, assumes you know the basics and focuses only on the tools needed to log your metrics to Azure.

A working code example that demonstrates the tools in this article is in the examples folder of the GitHub repository for this project. The callback itself is in the log_to_azure.py file.

benbogart/azure-tensorflow-callback

The azureml.core.Run object

Before you look at the callback, you need an azureml.core.Run object to tell the callback where to log the metrics. Getting the Run object from within your Azure training script is simple. The following code does the trick.

from azureml.core import Run
run = Run.get_context()

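This only connects to Azure ML when the script is running as a submitted run on Azure. If you run the script locally, Run.get_context() by default falls back to an offline run, so nothing is logged to Azure ML Studio; pass allow_offline=False if you would rather get an error instead.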
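To check locally what the callback would send before submitting a job, one option is to hand it a small stand-in object that records calls instead of an Azure run. This is a minimal sketch: RecordingRun is a hypothetical class, not part of azureml, and it only mimics the Run.log(name, value) call that the callback is assumed to use.

```python
class RecordingRun:
    """Hypothetical local stand-in for azureml.core.Run (not part of azureml).

    It mimics only Run.log(name, value), storing every value in a list per
    metric name so a local script can inspect what would have been sent.
    """

    def __init__(self):
        self.metrics = {}

    def log(self, name, value, description=""):
        # Append to a per-metric list, similar to how repeated run.log calls
        # with the same name build up a series in Azure ML Studio.
        # `description` is accepted and ignored, like an optional extra.
        self.metrics.setdefault(name, []).append(value)


run = RecordingRun()
run.log("loss", 0.9)
run.log("loss", 0.7)
run.log("accuracy", 0.6)
print(run.metrics)  # {'loss': [0.9, 0.7], 'accuracy': [0.6]}
```

In a local script you could create run = RecordingRun(), pass it as LogToAzure(run), and inspect run.metrics after model.fit returns.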

The Callback

All that is left is a simple Keras callback that logs your metrics to Azure ML Studio at the end of each training epoch. This works much like the TensorBoard callback, which writes log files for TensorBoard, with one exception: you need to pass in an azureml.core.Run object that tells the callback where to send the metrics. Fortunately, you have that from the section above!

I wrote the callback for you 😁. You can find it in the log_to_azure.py file in the GitHub repository mentioned above.
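The following is a minimal sketch of what such a callback can look like; it is not a copy of log_to_azure.py. It assumes the callback only needs to override on_epoch_end and forward each entry of the Keras logs dictionary to Run.log(name, value), which is a documented method of azureml.core.Run.

```python
from tensorflow import keras


class LogToAzure(keras.callbacks.Callback):
    """Sketch of a Keras callback that forwards epoch metrics to Azure ML.

    `run` is expected to be an azureml.core.Run (e.g. from Run.get_context()),
    or any object with a compatible log(name, value) method.
    """

    def __init__(self, run):
        super().__init__()
        self.run = run

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes a dict such as {"loss": ..., "accuracy": ...,
        # "val_loss": ..., "val_accuracy": ...} at the end of every epoch
        for name, value in (logs or {}).items():
            # float() turns NumPy or tensor scalars into a plain Python number
            self.run.log(name, float(value))
```

Because the class only ever calls run.log, one call per metric per epoch, each metric shows up in Azure ML Studio as a series with one point per epoch, and the same class can be exercised locally with any object that has a matching log method.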

Implementation

You implement the callback like any other Keras callback.

  1. First, download the log_to_azure.py file to your training_script directory.
  2. Import the LogToAzure callback class.
  3. Add the LogToAzure callback to your training script.

from azureml.core import Run
from log_to_azure import LogToAzure
...
# get the Run object for this training run
run = Run.get_context()
# add the LogToAzure custom callback
callbacks = [LogToAzure(run)]
# fit the model and store the training history
history = model.fit(train_generator,
                    validation_data=val_generator,
                    callbacks=callbacks,
                    epochs=10)

Viewing the metrics in Azure ML Studio

Now when you run your models, your metrics will be logged to your run in Azure ML Studio. The last step is to set up your charts in the Experiments Dashboard so you can visualize the metrics.

The metrics you are sending to Azure will not be available to use in a chart until they have been logged at least once, so you must wait until the first epoch has completed before you do the following.
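The metric names you will see in the chart editor are whatever keys Keras puts in its logs dictionary, assuming the callback logs every key (an assumption about log_to_azure.py). One quick local way to preview those names is to fit a tiny throwaway model for one epoch and inspect the History object; the random data and model below are placeholders, not part of the project.

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 64 samples, 4 features, binary labels
rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
# metrics=["accuracy"] is what produces the "accuracy" keys
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_split holds out 25% of the data, which adds the "val_" keys
history = model.fit(x, y, validation_split=0.25, epochs=1, verbose=0)
metric_names = sorted(history.history.keys())
print(metric_names)
```

With TensorFlow 2.x this typically lists accuracy, loss, val_accuracy and val_loss; the exact set depends on the metrics and callbacks you configure.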

  • Go to your experiment page. We’ve called our experiment recycling.
Image by author
  • You can start by editing the charts that are already on the Experiment Dashboard. Click the pencil icon in the top right of the chart.
  • You can, of course, modify the chart in any way that is useful to you. I want to see training and validation accuracy on the same chart, so I select those metrics for the y-axis, as seen below. You can leave iterations on the x-axis. If you want to, you can give the chart a clever name.
Image by author
  • Once you have edited the existing charts, you can add a new chart with the Add chart icon at the top of the page.
Image by author

Lastly, don’t forget to save the view. If you don’t save it, you will have to set up your charts again the next time you open the page. To save to the default view, click Save view.

Conclusion

That’s it. Now you can track and compare the training progress of your models on Azure in real time. And it looks something like this:

If you need help getting started training your models on Azure, check out my other article: "Train on Cloud GPUs with Azure Machine Learning SDK for Python."

Now go do some good.


Related Articles