Step-by-Step tutorial to build a minimal CI/CD pipeline for your Python project using Travis-CI
Automatically build, test, and publish your Python package with Travis-CI, Codecov, and PyPI.
Building a CI/CD pipeline is a great way to save time on testing your Python code in multiple environments before publishing or deploying your packages automatically. It is also a way to catch bugs early and to bring consistency and reproducibility to your development process.
I recently worked on a project that implements a relatively recent approach to applying deep learning models to structured data; the details of the approach can be found here: Training Better Deep Learning Models for Structured Data using Semi-supervised Learning. I wanted to set up a CI/CD pipeline to do the following:
- Automatic testing of the code at every merge request.
- Computing and displaying the test coverage for the master branch.
- Automatic deployment of the Python package/wheel to PyPI if a build on the staging branch passes the tests.
To do that I used GitHub, Travis-CI, and Codecov, all of which are free for open-source projects.
Steps:
1) Logins
The first step is to log in to Travis-CI with your GitHub account, then go to the settings and activate the repository that you want to work on:
And then do the same with Codecov:
And finally log in to PyPI, where you need to generate an access token by going to the account settings:
2) Adding the PyPI token to Travis-CI:
To automate publishing the package, you need to add the PyPI token to Travis-CI as an environment variable. In the repository settings:
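If you prefer the command line over the web UI, the same variable can be set with the Travis CLI. This is a sketch, assuming the `travis` gem is installed and you are logged in; the variable name matches the `$TEST_PYPI_TOKEN` used in the pipeline below, and the token value is a placeholder:

```shell
# Store the PyPI API token as a private (hidden) environment variable
# on the repository; the pipeline later reads it as $TEST_PYPI_TOKEN.
travis env set TEST_PYPI_TOKEN pypi-XXXXXXXX --private
```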
3) Code
The code needs to have a setup.py file as well as a requirements.txt (if needed). For example, my code relies on multiple libraries like TensorFlow and Pandas, so I need a requirements file like this:
pandas==1.0.4
numpy==1.17.3
scipy==1.4.1
matplotlib==3.1.1
tensorflow_gpu==2.0.1
tqdm==4.36.1
scikit_learn==0.23.2
tensorflow==2.3.0
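The setup.py is what lets Travis-CI install the package with pip install . and build the distributions for PyPI. Here is a minimal sketch; the version number and metadata fields are illustrative, not the ones from my repository:

```python
# Minimal setup.py sketch (version and metadata are illustrative).
from setuptools import setup, find_packages

setup(
    name="deeptabular",
    version="0.1.0",
    packages=find_packages(),
    # Reuse requirements.txt so pip installs the same pinned dependencies.
    install_requires=open("requirements.txt").read().splitlines(),
)
```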
You’ll also need to implement some tests and put them in a tests/ folder. One example of a test in my code runs a small training job on a synthetic training set and checks that the network learned, by running an evaluation on a test set:
from deeptabular.deeptabular import (
DeepTabularClassifier,
)
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score
def test_build_classifier():
    classifier = DeepTabularClassifier(
        cat_cols=["C1", "C2"], num_cols=["N1", "N2"], n_targets=1, num_layers=1
    )
    df = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    df["target"] = df.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )
    test = pd.DataFrame(
        {
            "C1": np.random.randint(0, 10, size=5000),
            "C2": np.random.randint(0, 10, size=5000),
            "N1": np.random.uniform(-1, 1, size=5000),
            "N2": np.random.uniform(-1, 1, size=5000),
            "target": np.random.uniform(-1, 1, size=5000),
        }
    )
    test["target"] = test.apply(
        lambda x: 1 if (x["C1"] == 4 and x["N1"] < 0.5) else 0, axis=1
    )
    classifier.fit(df, target_col="target", epochs=100, save_path=None)
    pred = classifier.predict(test)
    acc = accuracy_score(test["target"], pred)
    assert isinstance(classifier.model, tf.keras.models.Model)
    assert acc > 0.9
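Before wiring anything into Travis-CI, it is worth checking that the tests pass locally. A sketch, assuming pytest is installed:

```shell
# Install the dependencies and the package itself, then run the tests.
pip install -r requirements.txt
pip install .
pytest tests/
```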
4) The pipeline
The pipeline used in Travis-CI is written as a YAML file. For example, the one used in the deeptabular repository is:
language: python
python:
  - "3.6"
  - "3.7"
install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .
script:
  - pytest --cov-report=xml --cov=deeptabular tests/
after_success:
  - codecov
deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging
First, choose the Python versions to run the build on:
python:
  - "3.6"
  - "3.7"
Then install the requirements of the library, Codecov and pytest-cov (used for testing), and the library itself:
install:
  - pip install -r requirements.txt
  - pip install codecov
  - pip install pytest-cov
  - pip install .
Run the tests and write the test coverage results to an XML file:
script:
- pytest --cov-report=xml --cov=deeptabular tests/
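The XML report that pytest-cov writes follows the Cobertura format, whose root element carries the overall line coverage as a line-rate attribute. A small sketch of reading that number, using a trimmed-down hard-coded sample instead of a real coverage.xml:

```python
import xml.etree.ElementTree as ET

# A trimmed-down stand-in for the coverage.xml that pytest-cov writes;
# the real file also contains per-package and per-line details.
sample = '<coverage line-rate="0.87" branch-rate="0.0" version="5.0"></coverage>'

root = ET.fromstring(sample)
line_coverage = round(float(root.get("line-rate")) * 100)
print(f"line coverage: {line_coverage}%")  # -> line coverage: 87%
```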
Push the coverage report to Codecov:
after_success:
- codecov
Finally, publish the package as a source distribution and a wheel to PyPI:
deploy:
  provider: pypi
  user: __token__
  password: $TEST_PYPI_TOKEN
  distributions: "sdist bdist_wheel"
  skip_existing: true
  on:
    branch: staging
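This deploy step is roughly equivalent to building and uploading the distributions by hand. A sketch using twine, which reads the credentials from environment variables, mirroring the user/password pair in the Travis-CI config:

```shell
# Build the source distribution and the wheel, then upload both to PyPI.
python setup.py sdist bdist_wheel
TWINE_USERNAME=__token__ TWINE_PASSWORD="$TEST_PYPI_TOKEN" \
    twine upload --skip-existing dist/*
```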
The package is then pushed to PyPI:
And the test coverage results are available in Codecov:
Conclusion:
That's it: this pipeline runs the tests each time code is pushed to any branch, and publishes the package to PyPI whenever the staging branch changes.
References:
- https://dev.to/oscarmcm/distributing-pypi-packages-using-api-tokens-in-travisci-1n9i
- https://docs.travis-ci.com/user/languages/python/