Step-by-Step tutorial to build a minimal CI/CD pipeline for your Python project using Travis-CI

Automatically build, test, and publish your Python package with Travis-CI, Codecov, and PyPI.

Youness Mansar
Towards Data Science


Photo by Philipp Wüthrich on Unsplash

Building a CI/CD pipeline is a great way to save time on testing your Python code in multiple environments before publishing/deploying your packages automatically. It is also a way to catch bugs early and to bring some consistency and reproducibility to your development process.

I recently worked on a project that implements a relatively recent approach to applying a deep learning model to structured data; details of the approach can be found here: Training Better Deep Learning Models for Structured Data using Semi-supervised Learning. I wanted to set up a CI/CD pipeline to do the following:

  • Automatic testing of the code at every merge request.
  • Computing and displaying the test coverage for the master branch.
  • Automatic deployment of the Python package/wheel to PyPI if a build on the staging branch passes the tests.

To do that I used GitHub, Travis-CI, Codecov, and PyPI, all of which are free for open-source projects.

Steps:

1) Logins

The first step is to log in to Travis-CI with your GitHub account, then go to settings and activate the repository that you want to work on:

And then do the same with Codecov:

And finally to PyPI, where you need to generate an access token by going to account settings:

2) Adding the PyPI token to Travis-CI:

To automate publishing the package, you need to add the PyPI token to Travis-CI as an environment variable. In settings:
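If you prefer the command line over the web UI, the same environment variable can be added with the Travis CLI (the `travis` Ruby gem) — a sketch, assuming the gem is installed and you are logged in; the variable name must match the one referenced in the pipeline file (`$TEST_PYPI_TOKEN`):

```shell
# Install and authenticate the Travis CLI first:
#   gem install travis
#   travis login
# Then store the PyPI token as a private (hidden) environment variable:
travis env set TEST_PYPI_TOKEN "pypi-..." --private
```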

3) Code

The code needs to have a setup.py file as well as a requirements.txt (if needed). For example, my code relies on multiple libraries like TensorFlow and Pandas, so I need a requirements file like this:

pandas==1.0.4
numpy==1.17.3
scipy==1.4.1
matplotlib==3.1.1
tensorflow_gpu==2.0.1
tqdm==4.36.1
scikit_learn==0.23.2
tensorflow==2.3.0
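The setup.py itself is not shown in this post; a minimal sketch of what it could look like is below — the field values here are illustrative assumptions, not copied from the actual repository:

```python
# Minimal setup.py sketch -- version and metadata are assumptions,
# only the package name comes from the repository.
from setuptools import setup, find_packages

setup(
    name="deeptabular",
    version="0.1.0",
    packages=find_packages(),
    # Reuse the pinned requirements file as install dependencies:
    install_requires=open("requirements.txt").read().splitlines(),
)
```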

You’ll also need to implement some tests and put them in a tests/ folder. An example of a test in my code runs a small training on a synthetic training set, then checks that the network actually learned by evaluating it on a test set:

from deeptabular.deeptabular import (
    DeepTabularClassifier,
)
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score


def test_build_classifier():
    classifier = DeepTabularClassifier(
        cat_cols=["C1", "C2"], num_cols=["N1", "N2"], n_targets=1, num_layers=1
    )
    df = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    df["target"] = df.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )

    test = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    test["target"] = test.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )

    classifier.fit(df, target_col="target", epochs=100, save_path=None)

    pred = classifier.predict(test)

    acc = accuracy_score(test["target"], pred)

    assert isinstance(classifier.model, tf.keras.models.Model)
    assert acc > 0.9
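The synthetic target in this test follows a simple deterministic rule (1 when C1 == 4 and N1 < 0.5), so roughly 0.1 × 0.75 ≈ 7.5% of rows should be positive. A quick, training-free sanity check of that label construction — a sketch independent of deeptabular itself:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        # C1 is uniform over {0, ..., 9}, so P(C1 == 4) = 0.1
        "C1": rng.integers(0, 10, size=5000),
        # N1 is uniform over [-1, 1), so P(N1 < 0.5) = 0.75
        "N1": rng.uniform(-1, 1, size=5000),
    }
)
# Same labeling rule as in the test, vectorized instead of df.apply:
df["target"] = ((df["C1"] == 4) & (df["N1"] < 0.5)).astype(int)
rate = df["target"].mean()
print(f"positive rate: {rate:.3f}")  # expected close to 0.1 * 0.75 = 0.075
```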

4) The pipeline

The pipeline used in Travis-CI is written as a YAML file. For example, the one used in the deeptabular repository is:

language: python
python:
  - "3.6"
  - "3.7"
install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .
script:
  - pytest --cov-report=xml --cov=deeptabular tests/

after_success:
  - codecov

deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging

First, choose the Python versions to run against:

python:
  - "3.6"
  - "3.7"

Then install the library's requirements plus the library itself, along with pytest-cov and Codecov used for testing and coverage:

install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .

Run the tests and write the test coverage results as an XML:

script:
  - pytest --cov-report=xml --cov=deeptabular tests/

Push the coverage report to Codecov:

after_success:
  - codecov

Finally, publish the package as a source distribution and wheel to PyPI:

deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging
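Under the hood, the pypi deploy provider roughly amounts to building the distributions listed and uploading them with twine. An equivalent manual sketch, assuming setuptools, wheel, and twine are installed locally:

```shell
# Build the distributions named in "distributions":
python setup.py sdist bdist_wheel
# Upload, skipping files already on PyPI (mirrors skip_existing: true);
# twine reads the token-based credentials from these environment variables.
TWINE_USERNAME=__token__ TWINE_PASSWORD="pypi-..." twine upload --skip-existing dist/*
```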

The package is then pushed to PyPI:

And the test coverage results are available in Codecov:

Conclusion:

That's it: this pipeline runs the tests each time code is pushed to any branch, and publishes the package to PyPI whenever the staging branch changes.

References:

  1. https://dev.to/oscarmcm/distributing-pypi-packages-using-api-tokens-in-travisci-1n9i
  2. https://docs.travis-ci.com/user/languages/python/

Code:

https://github.com/CVxTz/DeepTabular
