How to build an MLOps pipeline for hyperparameter tuning in Vertex AI

Best practices to set up your model and orchestrator for hyperparameter tuning

Lak Lakshmanan
Towards Data Science


When you design a machine learning model, there are a number of hyperparameters (learning rate, batch size, number of layers/nodes in the neural network, number of buckets, number of embedding dimensions, etc.) that you essentially guess. There is usually a 2-10% improvement to be had over the initial guess by finding optimal values of these hyperparameters. (Of course, this depends on how bad your initial guess was, but I’ll assume you made somewhat educated guesses.)

In an earlier article, I suggested that you use Jupyter notebooks for experimentation, but move to a normal Python file once things stabilize. Doing so separates out the responsibilities between ML development and MLOps. So, let’s say you have done that and your model training file exists as model.py and your pipeline orchestrator as train_on_vertexai.py. You might find it helpful to use the code in the two links to follow along as you read the article.

1. Parameterize your model in model.py

The first step is to make the hyperparameters command-line parameters to your model. For example, in model.py, we might do:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--nembeds',
    help='Embedding dimension for categorical variables',
    type=int,
    default=3
)

Note that the initial guess for the variable is the default value. This allows your training script to continue working as it did before. Then, you set the variable from the command-line parameters for use by the training script:

args = vars(parser.parse_args())  # dict of command-line parameters
...
NEMBEDS = args['nembeds']

Do this for all the hyperparameters you might ever want to tune, now or in the future. A good practice is to have no hardcoded values in model.py; everything there should be an input parameter.
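For example, the other hyperparameters that get tuned later in this article can be exposed the same way. This is only a sketch; the defaults are whatever your initial educated guesses happen to be:

# Illustrative initial guesses; the flag names match the parameter_spec in step 4b.
parser.add_argument('--train_batch_size', help='Batch size for training', type=int, default=64)
parser.add_argument('--nbuckets', help='Number of buckets', type=int, default=5)
parser.add_argument('--dnn_hidden_units', help='Comma-separated sizes of the DNN hidden layers', default='64,16')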

2. Implement a shorter training run

Typically, your training run will involve training on the full dataset and then evaluating on the full test dataset. Doing a complete training run for hyperparameter tuning is expensive, wasteful, and wrong. Why?

Expensive: The point of hyperparameter tuning is to obtain the best set of parameters, not to obtain the best possible model. Once you find the best set of parameters, you will then train a model with those parameters to completion. Therefore, there is no need to carry out a trial to completion; you just need to train each trial long enough to know which one is likely to end up better.

Doing a complete training run for hyperparameter tuning is expensive, wasteful, and wrong. Diagram by author.

Wasteful: Under the assumption that your training curves are well-behaved, a better set of parameters will be better throughout the training process, and you can stop the training well before it starts to converge. Use your training budget to do more trials, not to run those trials longer.

Wrong: You don’t want to evaluate hyperparameter tuning trials on the test dataset; you want to compare their performance on the validation dataset. Just make sure the validation dataset is large enough to compare trial models meaningfully.

The way I do these modifications is to add two options to my model.py: one to train for a shorter time and another to skip the full evaluation:

NUM_EXAMPLES = args['num_examples']
SKIP_FULL_EVAL = args['skip_full_eval']
...
steps_per_epoch = NUM_EXAMPLES // train_batch_size
epochs = NUM_EPOCHS
eval_dataset = read_dataset(eval_data_pattern, eval_batch_size,
                            tf.estimator.ModeKeys.EVAL, num_eval_examples)
model.fit(train_dataset,
          validation_data=eval_dataset,
          epochs=NUM_EPOCHS,
          steps_per_epoch=steps_per_epoch,
          callbacks=[cp_callback, HpCallback()])
...
if not SKIP_FULL_EVAL:
    # full evaluation on the held-out test files (only in the final training run);
    # tf.estimator.ModeKeys has no TEST value, so EVAL mode is used to read the test data
    test_dataset = read_dataset(test_data_pattern, eval_batch_size,
                                tf.estimator.ModeKeys.EVAL, None)
    final_metrics = model.evaluate(test_dataset)
    ...
else:
    logging.info("Skipping evaluation on full test dataset")

What’s the deal with steps_per_epoch and NUM_EXAMPLES? Note the x-axis in the graph above: it’s not epochs, it’s the number of examples. While it would be wasteful to train each trial on the full dataset, it is helpful for every trial to report the same number of intermediate metrics that a full training run would (I’ll explain why in the next step). Because you will also be tuning the batch size, the best way to achieve this is to use virtual epochs (see the Checkpoints pattern in the Machine Learning Design Patterns book for details); steps_per_epoch is how we get virtual epochs on large datasets.
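As a concrete sketch of the arithmetic, using the (illustrative) flag values that step 4a below passes to model.py:

NUM_EXAMPLES = 500000       # --num_examples: examples per virtual epoch
NUM_EPOCHS = 10             # --num_epochs: number of virtual epochs = number of metric reports
train_batch_size = 64       # one of the hyperparameters being tuned

steps_per_epoch = NUM_EXAMPLES // train_batch_size   # 7812 batches per virtual epoch at this batch size

# Each virtual epoch processes steps_per_epoch * train_batch_size (roughly NUM_EXAMPLES) examples,
# so every trial sees about NUM_EXAMPLES * NUM_EPOCHS = 5,000,000 examples and reports its
# validation metric NUM_EPOCHS times, no matter which batch size that trial was assigned.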

3. Write out metrics during training

Write out metrics during the training process; don’t just wait until the end. If you do this, Vertex AI will also help you save costs by cutting short unproductive trials.

In Keras, to write out metrics during training you can use a callback. This is what that looks like:

METRIC = 'val_rmse'
hpt = hypertune.HyperTune()

class HpCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if logs and METRIC in logs:
            logging.info("Epoch {}: {} = {}".format(epoch, METRIC, logs[METRIC]))
            hpt.report_hyperparameter_tuning_metric(
                hyperparameter_metric_tag=METRIC,
                metric_value=logs[METRIC],
                global_step=epoch)
...
history = model.fit(train_dataset,
                    ...
                    callbacks=[cp_callback, HpCallback()])

I’m using the cloudml-hypertune package to simplify the writing of metrics in a form that the TensorFlow ecosystem (TensorBoard, Vizier, etc.) can understand.
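For completeness, here is the package in isolation. This is a minimal sketch (the metric value and step are made up); in model.py the reporting call happens inside the Keras callback shown above:

# pip install cloudml-hypertune
import hypertune

hpt = hypertune.HyperTune()
# Report one measurement of the tuning metric for the current trial. Vertex AI uses these
# measurements to rank trials and to stop unpromising ones early.
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='val_rmse',   # must match metric_spec in the tuning job (step 4b)
    metric_value=0.123,                     # made-up value, just to show the call
    global_step=3)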

4. Implement a hyperparameter tuning pipeline

Now that you have modified model.py to make it easy to do hyperparameter tuning, the MLOps people can tune your model anytime they notice it drifting.

There are two steps in the orchestration code (in train_on_vertexai.py):

(4a) Create a Vertex AI CustomJob to call your model.py with the right parameters:

tf_version = '2-' + tf.__version__.split('.')[1]   # e.g. '2-6' for TensorFlow 2.6.x
train_image = "us-docker.pkg.dev/vertex-ai/training/tf-gpu.{}:latest".format(tf_version)
model_display_name = '{}-{}'.format(ENDPOINT_NAME, timestamp)
trial_job = aiplatform.CustomJob.from_local_script(
    display_name='train-{}'.format(model_display_name),
    script_path="model.py",
    container_uri=train_image,
    args=[
        '--bucket', BUCKET,
        '--skip_full_eval',         # no need to evaluate on test data during tuning
        '--num_epochs', '10',
        '--num_examples', '500000'  # 1/10 actual size
    ],
    requirements=['cloudml-hypertune'],  # so each trial can report the tuning metric
    replica_count=1,
    machine_type='n1-standard-4',
    # See https://cloud.google.com/vertex-ai/docs/general/locations#accelerators
    accelerator_type=aip.AcceleratorType.NVIDIA_TESLA_T4.name,
    accelerator_count=1,
)
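Before handing this job to the tuning service, you can optionally run it once on its own as a smoke test. This is just a sketch of that option, not something the pipeline requires:

# Optional smoke test: run one training job with the default hyperparameter values.
# In the actual pipeline, trial_job is handed to the HyperparameterTuningJob in step 4b instead.
trial_job.run()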

(4b) Create and run a hyperparameter tuning job that will use the above job as an individual trial:

# hpt here is the parameter-spec module:
#   from google.cloud.aiplatform import hyperparameter_tuning as hpt
hparam_job = aiplatform.HyperparameterTuningJob(
    # See https://googleapis.dev/python/aiplatform/latest/aiplatform.html#
    display_name='hparam-{}'.format(model_display_name),
    custom_job=trial_job,
    metric_spec={'val_rmse': 'minimize'},
    parameter_spec={
        "train_batch_size": hpt.IntegerParameterSpec(min=16, max=256, scale='log'),
        "nbuckets": hpt.IntegerParameterSpec(min=5, max=10, scale='linear'),
        "dnn_hidden_units": hpt.CategoricalParameterSpec(
            values=["64,16", "64,16,4", "64,64,64,8", "256,64,16"])
    },
    max_trial_count=4 if develop_mode else 10,
    parallel_trial_count=2,
    search_algorithm=None,  # None means the default Bayesian optimization (Vizier)
)

Note that I am specifying the metric here to match the METRIC in my model.py and that I’m specifying ranges for the parameters.
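Creating the HyperparameterTuningJob object does not start it; you still have to launch it. A minimal sketch (run() submits the trials and, by default, waits for them to finish):

# Launch the tuning job; this runs the trials on Vertex AI.
hparam_job.run()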

By default, the hyperparameter tuning service in Vertex AI (called Vizier) will use Bayesian Optimization, but you can change the algorithm to GridSearch if you want.
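Here is a sketch of what that change looks like; the constructor call is otherwise the same as above (parameter_spec stands for the dict of parameter specs shown earlier). Note that grid search is intended for discrete and categorical parameters, and 'random' is also accepted:

hparam_job = aiplatform.HyperparameterTuningJob(
    display_name='hparam-{}'.format(model_display_name),
    custom_job=trial_job,
    metric_spec={'val_rmse': 'minimize'},
    parameter_spec=parameter_spec,   # same parameter specs as in step 4b
    max_trial_count=10,
    parallel_trial_count=2,
    search_algorithm='grid',         # or 'random'; None gives the default Bayesian optimization
)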

5. Monitor the GCP web console

Once you launch the hyperparameter tuning job, you can look at the Vertex AI section of the GCP console to watch the trials, their parameter values, and the resulting metrics come in.

6. Run the best trial to completion

Once you have determined the best set of parameters, run the training job to completion with those parameters. That will give you the model to deploy.

We can automate this as well, of course:

# the metric (val_rmse) is minimized, so the best trial is the one with the lowest value
best = sorted(hparam_job.trials,
              key=lambda x: x.final_measurement.metrics[0].value)[0]
logging.info('Best trial: {}'.format(best))
best_params = []
for param in best.parameters:
    best_params.append('--{}'.format(param.parameter_id))
    best_params.append(param.value)
# run the best trial to completion
model = train_custom_model(data_set, timestamp, develop_mode, extra_args=best_params)
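Note that extra_args is just another list of command-line flags for model.py. Conceptually, the full run drops the step-2 shortcuts and receives something like the following. This is only a sketch; the exact assembly happens inside train_custom_model, which isn’t shown here:

# Sketch: args for the full training run (values are illustrative).
full_run_args = [
    '--bucket', BUCKET,
    '--num_epochs', '10',
    '--num_examples', '5000000',        # the full dataset this time, and no --skip_full_eval
] + [str(p) for p in best_params]       # winning flag names and values from the best trial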

Again, Vertex AI makes it easier to maintain the separation of responsibilities between the data scientist (who decides which parameters can be tuned) and the MLOps engineer (who decides when to retune the model).

Enjoy!

More Reading on Vertex AI:

  1. Giving Vertex AI, the New Unified ML Platform on Google Cloud, a Spin: Why do we need it, how good is the code-free ML training, really, and what does all this mean for data science jobs?
  2. How to Deploy a TensorFlow Model to Vertex AI: Working with saved models and endpoints in Vertex AI
  3. Developing and Deploying a Machine Learning Model on Vertex AI using Python: Write training pipelines that will make your MLOps team happy
  4. How to build an MLOps pipeline for hyperparameter tuning in Vertex AI: Best practices to set up your model and orchestrator for hyperparameter tuning
