
Python Poetry – The Best Data Science Dependency Management Tool?

Poetry makes deploying machine learning applications a breeze – learn how!

Photo by Prachi Gautam on Unsplash

If I had a dollar for every time I faced a missing Python dependency or a version mismatch, well, I wouldn’t be a millionaire, but you get the point.

Dependency management is a common problem in Data Science with many potential solutions. A virtual environment is always recommended, but that’s only the beginning. It’s usually followed by keeping track of installed packages. But what about their dependencies? And dependencies of their dependencies? It’s a recursive nightmare.

Poetry might be the solution you’re looking for. It aims to be a one-stop shop for everything related to dependency management, and it can even be used to publish Python packages.

Today, you’ll build a simple Machine Learning application locally and then push it to a remote compute instance. If Poetry lives up to its promise, the remote setup should be as simple as running a single shell command.


How to Get Started with Poetry

A small inconvenience with Poetry is that you can’t install it with a plain pip install command – it needs an additional command-line tool.

Install Poetry

And that command-line tool is called pipx. The installation instructions for Mac are pretty straightforward (follow the above link for other operating systems):

brew install pipx
pipx ensurepath
sudo pipx ensurepath --global

Once pipx is installed, install Poetry with the following command:

pipx install poetry

Just a slight inconvenience, but it’s over now.
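
To verify the installation worked, print Poetry’s version – the exact output will depend on the release you got:

poetry --version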

Create a new Poetry Project

The poetry new <name> command initializes a new Python project and creates an appropriately named folder:

poetry new ml-demo
Image 1 – New Poetry project output (image by author)

These are the files that’ll get created:

Image 2 – Directory structure (image by author)

The file to focus on now is pyproject.toml. It contains details of your app, external dependencies, and the minimum required Python version.

If you don’t want to publish the package – which you don’t in this case – add the following line to the first block:

[tool.poetry]
package-mode = false
Image 3 – The pyproject.toml file (image by author)
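
For reference, here’s roughly what the whole file looks like at this point – the exact fields and values depend on your Poetry version and the metadata generated for your project:

[tool.poetry]
name = "ml-demo"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
readme = "README.md"
package-mode = false

[tool.poetry.dependencies]
python = "^3.12"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"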

You can also remove the tests and ml_demo folders if you wish.

We’ll return to this file after installing a couple of dependencies.

Python Poetry in Action – A Sample Machine Learning Application

Your machine learning app will be a simple decision tree model trained on the Iris dataset and made available through a multi-worker FastAPI service.

Dependencies are not installed with pip install, but rather with the poetry add command:

poetry add numpy pandas scikit-learn fastapi gunicorn
Image 4 – Installing Python packages (image by author)

This creates a virtual environment if one doesn’t exist, installs the packages, and creates or updates the poetry.lock file:

Image 5 – The poetry.lock file (image by author)

In plain English, the lock file handles the recursive nightmare of dependency management.
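
If you’re curious, each entry in poetry.lock pins one package – direct or transitive – to an exact version, followed by a list of artifact hashes. A single illustrative entry looks roughly like this (your versions will differ):

[[package]]
name = "numpy"
version = "2.0.1"
description = "Fundamental package for array computing in Python"
optional = false
python-versions = ">=3.9"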

The pyproject.toml file was also updated, but it shows only the dependencies you’ve explicitly installed:

Image 6 – The pyproject.toml file (2) (image by author)
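
In case the screenshot is hard to read, the dependency block looks something like this – the caret constraints will reflect whatever versions were current when you ran poetry add, so treat these as placeholders:

[tool.poetry.dependencies]
python = "^3.12"
numpy = "^2.0"
pandas = "^2.2"
scikit-learn = "^1.5"
fastapi = "^0.111"
gunicorn = "^22.0"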

So far, so good!

Visual Studio Code Setup

You’ll face an annoying issue when you start writing Python code in an editor such as VSCode – the editor doesn’t know where to find the newly created virtual environment!

And neither do you.

The poetry show -v command lists the path to the virtual environment and the dependencies installed in it:

poetry show -v
Image 7 – Listing virtual environment details (image by author)

If you want only the environment’s absolute path, opt for this command instead:

poetry env info -p
Image 8 – Printing absolute path to the Python environment (image by author)

Now in Visual Studio Code (or any other editor), simply change the path to your Python interpreter. Copy the path returned by the above command, and add /bin/python3.<version> to the end. I’m using Python 3.12, so the part to append becomes /bin/python3.12:

/<path-to-environment>/bin/python3.<version>
Image 9 – Setting up VSCode interpreter (image by author)
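
If you’d rather build the full interpreter path in one go, you can combine the two steps – this assumes Python 3.12, as used throughout this article:

echo "$(poetry env info -p)/bin/python3.12"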

The import errors will now disappear.

Machine Learning Application

Onto the app now. Create a utils/ml.py file and copy the following code:

import os
import pickle
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

MODEL_PATH = "ml_models/iris.model"

def train_model() -> DecisionTreeClassifier:
    iris = load_iris()
    X = iris.data
    y = iris.target
    model = DecisionTreeClassifier()
    model.fit(X, y)
    return model

def save_model(model: DecisionTreeClassifier) -> bool:
    os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
    try:
        with open(MODEL_PATH, "wb") as f:
            pickle.dump(model, f)
        return True
    except Exception as e:
        print(f"An error occurred while saving the model: {e}")
        return False

def load_model() -> DecisionTreeClassifier:
    try:
        with open(MODEL_PATH, 'rb') as f:
            model = pickle.load(f)
        return model
    except Exception as e:
        print(f"An error occurred while loading the model: {e}")
        raise e

def predict(
        model: DecisionTreeClassifier, 
        sepal_length: float, 
        sepal_width: float, 
        petal_length: float, 
        petal_width: float
    ) -> dict:
    # predict() returns an array with a single class index,
    # so unwrap it to a plain int before matching
    prediction = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])
    prediction_str = ""
    match int(prediction[0]):
        case 0:
            prediction_str = "setosa"
        case 1:
            prediction_str = "versicolor"
        case 2:
            prediction_str = "virginica"

    prediction_prob = model.predict_proba([[sepal_length, sepal_width, petal_length, petal_width]])
    return {
        "prediction": prediction_str,
        "prediction_probabilities": prediction_prob[0].tolist()
    }

It’s a simple file with a couple of functions to train the model, make a single prediction, and read/write the model file to/from disk.

Also, create a main.py file to test the functionality:

from utils.ml import train_model, save_model, load_model, predict

if __name__ == "__main__":
    model = train_model()
    did_save = save_model(model=model)
    if did_save:
        print("Model saved")
    loaded_model = load_model()
    pred = predict(
        model=loaded_model,
        sepal_length=4.2,
        sepal_width=3.1,
        petal_length=3.4,
        petal_width=4.1
    )
    print(pred)

To use the virtual environment created by Poetry, you must prefix Python run commands with poetry run:

poetry run python main.py
Image 10 – Model prediction result (image by author)

You get the prediction class and probabilities back, so it’s safe to assume everything works as advertised.

FastAPI Application

Finally, replace the contents of main.py with the following code snippet:

import os
from fastapi import FastAPI
from pydantic import BaseModel
from utils.ml import train_model, save_model, load_model, predict

app = FastAPI()
model = None

# Request body for prediction
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

# Ensure the model is loaded on startup
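# (Note: newer FastAPI versions prefer lifespan handlers over on_event, but this still works)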
@app.on_event("startup")
def startup_event():
    if not os.path.exists("ml_models/iris.model"):
        _model = train_model()
        did_save_model = save_model(model=_model)
        if did_save_model:
            print("Model trained and saved successfully.")
        else:
            print("Model training and saving failed.")
    global model
    model = load_model()

# The prediction endpoint
@app.post("/predict")
def make_prediction(iris: IrisFeatures):
    return predict(
        model=model,
        sepal_length=iris.sepal_length,
        sepal_width=iris.sepal_width,
        petal_length=iris.petal_length,
        petal_width=iris.petal_width
    )

It will create or load the model when the application starts (depending on whether the model file already exists), and expose the prediction functionality on the /predict endpoint.

For the final bit of testing, run the FastAPI application through Gunicorn and optionally increase the number of workers. It won’t make much difference locally, but increasing this parameter allows more users to access your API at the same time:

poetry run gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8500
Image 11 – FastAPI startup (image by author)
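
If you’re unsure how many workers to use, a common heuristic (not a hard rule) is 2 × CPU cores + 1. On Linux, you could compute that inline:

poetry run gunicorn main:app --workers $(( 2 * $(nproc) + 1 )) --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8500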

Seems like the app is running, so let’s test it:

Image 12 – Model prediction result (2) (image by author)
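
If you prefer testing from code instead of an API client, here’s a minimal request sketch using only the standard library – it assumes the app is listening on localhost:8500, as configured above:

import json
from urllib.request import Request, urlopen

# Arbitrary flower measurements - the same values used in main.py earlier
payload = {
    "sepal_length": 4.2,
    "sepal_width": 3.1,
    "petal_length": 3.4,
    "petal_width": 4.1,
}

request = Request(
    "http://localhost:8500/predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Print the parsed JSON response from the /predict endpoint
with urlopen(request) as response:
    print(json.loads(response.read()))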

The correct output is returned! Let’s now transfer the app to a new environment to see what happens.

Test on a New Environment – EC2 Instance

I’ve provisioned a free tier EC2 instance (Ubuntu) on AWS for this section of the article.

Assuming you’ve done the same, and assuming yours also comes with Python 3.12 (specified in pyproject.toml) or later, run the following set of commands to update the system and install Poetry:

sudo apt update && sudo apt upgrade -y
sudo apt install gunicorn -y

sudo apt install pipx -y
pipx ensurepath
sudo pipx ensurepath --global

pipx install poetry
Image 13 – Linux machine setup (image by author)

Visual Studio Code has a neat Remote SSH plugin that allows you to connect to remote instances. But really, any SCP tool will do the trick.

I’ve copied our ml-demo folder with Python code and Poetry environment details:

Image 14 – Directory structure (image by author)

Poetry promises to make dependency management in new environments a breeze. All you have to do is navigate to the application folder and run the poetry install command:

cd ml-demo
poetry install
Image 15 – Package installation (image by author)

It seems like a new virtual environment was created and dependencies were installed.
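
One side note: if you don’t need development dependencies on the server, recent Poetry versions (1.2+) let you restrict the installation to the main dependency group:

poetry install --only main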

The ultimate test will be running the same FastAPI startup command you ran moments ago in a local environment. If no errors are raised, Poetry has delivered on its promise:

poetry run gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8500
Image 16 – FastAPI startup on a remote instance (image by author)

The app looks to be running, which is great news! Now, assuming you’ve allowed traffic to port 8500, you will be able to make requests to the /predict endpoint:

Image 17 – Model prediction result (3) (image by author)

You should get the same response back as with the local testing. If the request hangs for a while and then dies, your instance likely doesn’t allow traffic on the port the app is running on.
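
A quick way to check from your local machine is curl with a timeout, so a blocked port fails fast instead of hanging – replace the placeholder with your instance’s public IP:

curl --max-time 5 -X POST "http://<instance-public-ip>:8500/predict" -H "Content-Type: application/json" -d '{"sepal_length": 4.2, "sepal_width": 3.1, "petal_length": 3.4, "petal_width": 4.1}'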

Anyhow, the deployment process sure was a breeze.


Summing up Python Poetry

Tools like Poetry promise to reduce the number of migraines you get during model deployment. They also allow you to focus on what’s important, such as improving model quality and enriching its response, and not waste time on errors that shouldn’t happen in the first place.

In the short time I’ve been using Poetry at my day job, I’ve yet to find an error for which Poetry was to blame. Don’t get me wrong – I’ve seen errors after running the poetry install command, but they’ve all been OS-related. Like forgetting to install Gunicorn or something even dumber.

Have you used Poetry professionally? Did you encounter any issues and/or limitations? Make sure to let me know in the comment section below.

Read next:

Python Concurrency – A Brain-Friendly Guide for Data Professionals

