A step-by-step guide to setting up MLflow with a Postgres database for storing metadata and a systemd unit to keep it running.

This is the first article in my MLflow tutorial series:
- Setup MLflow in Production (you are here!)
- MLflow: Basic logging functions
- MLflow logging for TensorFlow
- MLflow Projects
- Retrieving the best model using Python API for MLflow
- Serving a model using MLflow
MLflow is an open-source platform for Machine Learning lifecycle management. Recently, I set up MLflow in production with a Postgres database behind the Tracking Server and SFTP for transferring artifacts over the network. It took me about two weeks to get all the components right, but this post should help you set up MLflow in a production environment in about 10 minutes.
Requirements
Tracking Server Setup (Remote server)
The Tracking Server stores the metadata that you see in the MLflow UI. First, let’s create a new Conda environment:
conda create -n mlflow_env
conda activate mlflow_env
Install Python, then the MLflow and PySFTP libraries:
conda install python
pip install mlflow
pip install pysftp
Our Tracking Server uses a Postgres database as a backend for storing the metadata. So let’s install PostgreSQL:
sudo apt-get install postgresql postgresql-contrib postgresql-server-dev-all
Next, we will create a database and an admin user for the Tracking Server. Start by opening the psql console as the postgres user:
sudo -u postgres psql
In the psql console:
CREATE DATABASE mlflow_db;
CREATE USER mlflow_user WITH ENCRYPTED PASSWORD 'mlflow';
GRANT ALL PRIVILEGES ON DATABASE mlflow_db TO mlflow_user;
As we’ll need to interact with Postgres from Python, we need to install the psycopg2 library. To make sure the installation succeeds, install the GCC package first:
sudo apt install gcc
pip install psycopg2-binary
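Once psycopg2 is installed, a quick connection check from Python can confirm that the database and user created above actually work. The following is a minimal sketch, not part of MLflow itself: the helper names build_backend_uri and check_connection are hypothetical, and the credentials are the ones used in this article. It relies on psycopg2 accepting a libpq connection URI as its DSN.

```python
from urllib.parse import quote


def build_backend_uri(user, password, host, database, port=None):
    """Assemble a postgresql:// URI, percent-encoding the credentials."""
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        netloc += f":{port}"
    return f"postgresql://{netloc}/{database}"


def check_connection(uri):
    """Open a connection, run SELECT 1, and return the fetched value."""
    import psycopg2  # imported lazily so build_backend_uri works without it

    conn = psycopg2.connect(uri)  # libpq accepts connection URIs as the DSN
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

Calling check_connection(build_backend_uri("mlflow_user", "mlflow", "localhost", "mlflow_db")) on the server should return 1 if the database is reachable. The same URI is what the Tracking Server later receives as its backend store, and percent-encoding matters as soon as the password contains characters such as @ or /.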
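If you would like to connect to the PostgreSQL server remotely, or give other users access to it, go to the PostgreSQL data directory. The path below is the default on Red Hat-style systems; on Debian or Ubuntu the configuration files usually live under /etc/postgresql/<version>/main instead: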
cd /var/lib/pgsql/data
Then add the following line at the end of the postgresql.conf file.
listen_addresses = '*'
You can then specify a remote IP from which to allow connections to the PostgreSQL server by adding the following line at the end of the pg_hba.conf file:
host all all 10.10.10.187/32 trust
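where 10.10.10.187/32 is the remote IP. To allow connections from any IP, use 0.0.0.0/0 instead. Keep in mind that the trust method lets matching clients connect without a password, so consider md5 or scram-sha-256 on networks you do not fully control. Then restart the PostgreSQL server to apply the changes: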
service postgresql restart
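If the CIDR notation in the pg_hba.conf entry is unfamiliar, the following small sketch shows which client addresses a given entry would match. It only uses Python's standard ipaddress module; the function name client_allowed is hypothetical, and it models the address match only, not PostgreSQL's full rule evaluation.

```python
import ipaddress


def client_allowed(client_ip, cidr):
    """Return True if client_ip falls inside the CIDR block of a pg_hba.conf entry."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


print(client_allowed("10.10.10.187", "10.10.10.187/32"))  # the single listed host
print(client_allowed("10.10.10.188", "10.10.10.187/32"))  # a neighbouring host
print(client_allowed("192.168.1.5", "0.0.0.0/0"))         # any IPv4 address
```

This prints True, False, True: a /32 suffix matches exactly one address, while 0.0.0.0/0 matches every IPv4 client.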
The next step is to create a directory where our Tracking Server will store the Machine Learning models and other artifacts. Remember that the Postgres database is only used for storing metadata about those models. This directory is called the artifact URI:
mkdir -p ~/mlflow/mlruns
Create a logging directory:
mkdir -p ~/mlflow/mllogs
You can run the Tracking Server with the following command, but as soon as you press Ctrl-C or exit the terminal, the server stops:
mlflow server --backend-store-uri postgresql://mlflow_user:mlflow@localhost/mlflow_db --default-artifact-root sftp://mlflow_user@<hostname_of_server>:~/mlflow/mlruns -h 0.0.0.0 -p 8000
If you want the Tracking Server to come back up after restarts and be resilient to failures, run it as a systemd service.
Go to the /etc/systemd/system directory and create a new file called mlflow-tracking.service with the following content:
[Unit]
Description=MLflow Tracking Server
After=network.target
[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:/path_to_your_logging_folder/stdout.log
StandardError=file:/path_to_your_logging_folder/stderr.log
User=root
ExecStart=/bin/bash -c 'PATH=/path_to_your_conda_installation/envs/mlflow_env/bin/:$PATH exec mlflow server --backend-store-uri postgresql://mlflow_user:mlflow@localhost/mlflow_db --default-artifact-root sftp://mlflow_user@<hostname_of_server>:~/mlflow/mlruns -h 0.0.0.0 -p 8000'
[Install]
WantedBy=multi-user.target
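The unit file above contains several placeholders (the logging folder, the Conda installation path, and the server hostname) that are easy to mistype by hand. As a rough sketch, the text can be rendered from Python with the values filled in. The function render_unit and the example values (/opt/miniconda3, /root/mlflow/mllogs, tracking.example.com) are hypothetical; substitute the ones from your own server.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=MLflow Tracking Server
After=network.target
[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:{log_dir}/stdout.log
StandardError=file:{log_dir}/stderr.log
User=root
ExecStart=/bin/bash -c 'PATH={conda_root}/envs/mlflow_env/bin/:$PATH exec mlflow server --backend-store-uri {backend_uri} --default-artifact-root {artifact_root} -h 0.0.0.0 -p 8000'
[Install]
WantedBy=multi-user.target
"""


def render_unit(conda_root, log_dir, backend_uri, artifact_root):
    """Fill the placeholder paths and URIs into the unit file text."""
    return UNIT_TEMPLATE.format(
        conda_root=conda_root.rstrip("/"),
        log_dir=log_dir.rstrip("/"),
        backend_uri=backend_uri,
        artifact_root=artifact_root,
    )


# Hypothetical values; substitute the paths and hostname of your own server.
unit_text = render_unit(
    conda_root="/opt/miniconda3",
    log_dir="/root/mlflow/mllogs",
    backend_uri="postgresql://mlflow_user:mlflow@localhost/mlflow_db",
    artifact_root="sftp://mlflow_user@tracking.example.com:~/mlflow/mlruns",
)
print(unit_text)
```

The printed text can then be saved as /etc/systemd/system/mlflow-tracking.service. Note that systemd expects absolute paths for the log files, so a ~ in the logging folder will not work there.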
Activate and enable the above service with the following commands:
sudo systemctl daemon-reload
sudo systemctl enable mlflow-tracking
sudo systemctl start mlflow-tracking
Check that everything worked as expected with the following command:
sudo systemctl status mlflow-tracking
The output should show the service as active (running).

Create a user on the server named mlflow_user and make the mlflow directory the working directory for this user. Then create an SSH key pair in the .ssh directory of mlflow_user (/mlflow/.ssh in our case). Put the public key in the authorized_keys file and share the private key with the users.
Additionally, for the MLflow UI to be able to read the artifacts, copy the private key to /root/.ssh/ as well.
Next, we need to add the server's host key to the known_hosts file manually using these commands:
cd /root/.ssh
ssh-keyscan -H <hostname_of_server> >> known_hosts
You can now restart the machine and the MLflow Tracking Server will be up and running after this restart.
On the client machines (local)
In order to start tracking everything under the production Tracking Server, it is necessary to set the following environment variable in your .bashrc.
export MLFLOW_TRACKING_URI='http://<hostname_of_server>:8000'
Do not forget to source your .bashrc file!
. ~/.bashrc
Make sure you install pip packages for mlflow and pysftp in your environment (pysftp is required to facilitate the transfer of artifacts to the production server).
pip install mlflow
pip install pysftp
To be able to authenticate the pysftp transfers, put the private key generated on the production server in the .ssh directory of your local machine. Then run:
ssh <hostname_of_server>
When prompted to save as a known host, answer yes.
You can access the MLflow UI at http://<hostname_of_server>:8000
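To check from a script that the Tracking Server is reachable from a client machine, something like the following sketch may help. It assumes the server exposes a /health endpoint, which recent MLflow releases do; if your version does not, requesting the base URL instead is a reasonable substitute. The function names health_url and tracking_server_is_up are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the health-check URL from the server's base URL."""
    return base_url.rstrip("/") + "/health"


def tracking_server_is_up(base_url, timeout=5.0):
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False
```

For example, tracking_server_is_up("http://<hostname_of_server>:8000") with your real hostname returns False when the service is down or port 8000 is blocked by a firewall.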

Run a sample machine learning model from the internet to check whether MLflow can track the runs.
mlflow run git@github.com:databricks/mlflow-example.git -P alpha=0.5
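If you prefer to verify the setup from Python rather than from the command line, a tiny smoke test can log one parameter, one metric, and one artifact. This is only a sketch under the assumptions of this article: it expects MLFLOW_TRACKING_URI to be set as described above, the experiment name smoke-test and the helper names are hypothetical, and the logged values are placeholders rather than real results.

```python
import os
import tempfile


def resolve_tracking_uri(env=os.environ):
    """Read MLFLOW_TRACKING_URI from the environment, failing loudly if unset."""
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        raise RuntimeError("MLFLOW_TRACKING_URI is not set")
    return uri.rstrip("/")


def log_smoke_test_run(experiment_name="smoke-test"):
    """Log one parameter, one metric, and one small artifact; return the run ID."""
    tracking_uri = resolve_tracking_uri()
    import mlflow  # imported lazily; needs `pip install mlflow pysftp`

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as run:
        mlflow.log_param("alpha", 0.5)
        mlflow.log_metric("smoke_metric", 1.0)  # placeholder value
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "hello.txt")
            with open(path, "w") as handle:
                handle.write("artifact upload works\n")
            mlflow.log_artifact(path)  # uploaded to the server's artifact root
        return run.info.run_id
```

Calling log_smoke_test_run() exercises both halves of the setup: the parameter and metric go to the Postgres-backed Tracking Server, and the text file travels over SFTP, so a failure in either the database or the SSH key configuration should surface here.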
In the next post, I’ll cover basic MLflow logging functions.