Step-by-Step tutorial to build a minimal CI/CD pipeline for your Python project using Travis-CI

Automatically build, test, and publish your Python package with Travis-CI, Codecov, and PyPI.

Youness Mansar
Towards Data Science


Photo by Philipp Wüthrich on Unsplash

Building a CI/CD pipeline is a great way to save time on testing your Python code in multiple environments before publishing/deploying your packages automatically. It is also a way to catch bugs early and to bring some consistency and reproducibility to your development process.

I recently worked on a project that implements a relatively recent approach to applying a deep learning model to structured data; details of the approach can be found here: Training Better Deep Learning Models for Structured Data using Semi-supervised Learning. I wanted to set up a CI/CD pipeline to do the following:

  • Automatic testing of the code at every merge request.
  • Computing and displaying the test coverage for the master branch.
  • Automatic deployment of the Python package/wheel to PyPI if a build on the staging branch passes the tests.

To do that I used GitHub, Travis-CI, Codecov, and PyPI, all of which are free for open-source projects.

Steps:

1) Logins

The first step is to log in to Travis-CI with your GitHub account, then go to settings and activate the repository that you want to work on:

And then do the same with Codecov:

And finally to PyPI, where you need to generate an access token by going to account settings:

2) Adding the PyPI token to Travis-CI:

To automate publishing the package, you need to add the PyPI token to Travis-CI as an environment variable. In settings:
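If you prefer the command line over the web UI, the same environment variable can be added with the Travis CLI (the `travis` Ruby gem) — a sketch, assuming the gem is installed and you are logged in; the variable name must match the one referenced in the pipeline file (`$TEST_PYPI_TOKEN`):

```shell
# Install and authenticate the Travis CLI first:
#   gem install travis
#   travis login
# Then store the PyPI token as a private (hidden) environment variable:
travis env set TEST_PYPI_TOKEN "pypi-..." --private
```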

3) Code

The code needs to have a setup.py file as well as a requirements.txt (if needed). For example, my code relies on multiple libraries like TensorFlow and Pandas, so I need a requirements file like this:

pandas==1.0.4
numpy==1.17.3
scipy==1.4.1
matplotlib==3.1.1
tensorflow_gpu==2.0.1
tqdm==4.36.1
scikit_learn==0.23.2
tensorflow==2.3.0
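The setup.py itself is not shown in this post; a minimal sketch of what it could look like is below — the field values here are illustrative assumptions, not copied from the actual repository:

```python
# Minimal setup.py sketch -- version and metadata are assumptions,
# only the package name comes from the repository.
from setuptools import setup, find_packages

setup(
    name="deeptabular",
    version="0.1.0",
    packages=find_packages(),
    # Reuse the pinned requirements file as install dependencies:
    install_requires=open("requirements.txt").read().splitlines(),
)
```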

You’ll also need to implement some tests and put them in a tests/ folder. An example of a test in my code runs a small training on a synthetic training set, then checks that the network actually learned by evaluating it on a test set:

from deeptabular.deeptabular import (
    DeepTabularClassifier,
)
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score


def test_build_classifier():
    classifier = DeepTabularClassifier(
        cat_cols=["C1", "C2"], num_cols=["N1", "N2"], n_targets=1, num_layers=1
    )
    df = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    df["target"] = df.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )

    test = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    test["target"] = test.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )

    classifier.fit(df, target_col="target", epochs=100, save_path=None)

    pred = classifier.predict(test)

    acc = accuracy_score(test["target"], pred)

    assert isinstance(classifier.model, tf.keras.models.Model)
    assert acc > 0.9
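The synthetic target in this test follows a simple deterministic rule (1 when C1 == 4 and N1 < 0.5), so roughly 0.1 × 0.75 ≈ 7.5% of rows should be positive. A quick, training-free sanity check of that label construction — a sketch independent of deeptabular itself:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        # C1 is uniform over {0, ..., 9}, so P(C1 == 4) = 0.1
        "C1": rng.integers(0, 10, size=5000),
        # N1 is uniform over [-1, 1), so P(N1 < 0.5) = 0.75
        "N1": rng.uniform(-1, 1, size=5000),
    }
)
# Same labeling rule as in the test, vectorized instead of df.apply:
df["target"] = ((df["C1"] == 4) & (df["N1"] < 0.5)).astype(int)
rate = df["target"].mean()
print(f"positive rate: {rate:.3f}")  # expected close to 0.1 * 0.75 = 0.075
```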

4) The pipeline

The pipeline used in Travis-CI is written as a YAML file. For example, the one used in the deeptabular repository is:

language: python
python:
  - "3.6"
  - "3.7"
install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .
script:
  - pytest --cov-report=xml --cov=deeptabular tests/

after_success:
  - codecov

deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging

First, choose the Python versions to run against:

python:
  - "3.6"
  - "3.7"

Then install the library's requirements plus the library itself, along with pytest-cov and Codecov used for testing and coverage:

install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .

Run the tests and write the test coverage results as an XML:

script:
  - pytest --cov-report=xml --cov=deeptabular tests/

Push the coverage report to Codecov:

after_success:
  - codecov

Finally, publish the package as a source distribution and wheel to PyPI:

deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging
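Under the hood, the pypi deploy provider roughly amounts to building the distributions listed and uploading them with twine. An equivalent manual sketch, assuming setuptools, wheel, and twine are installed locally:

```shell
# Build the distributions named in "distributions":
python setup.py sdist bdist_wheel
# Upload, skipping files already on PyPI (mirrors skip_existing: true);
# twine reads the token-based credentials from these environment variables.
TWINE_USERNAME=__token__ TWINE_PASSWORD="pypi-..." twine upload --skip-existing dist/*
```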

The package is then pushed to PyPI:

And the test coverage results are available in Codecov:

Conclusion:

That's it: this pipeline runs the tests each time code is pushed to any branch, and publishes the package to PyPI whenever the staging branch changes.

References:

  1. https://dev.to/oscarmcm/distributing-pypi-packages-using-api-tokens-in-travisci-1n9i
  2. https://docs.travis-ci.com/user/languages/python/

Code:

https://github.com/CVxTz/DeepTabular
