Version Control Your ML Model Deployment With Git using Modelbit

Develop, deploy, and track!

Avi Chawla
Towards Data Science


Photo by Yancy Min on Unsplash

Introduction

Version control is critical to all development processes, allowing developers to track software changes (code, configurations, data, etc.) over time.

Moreover, it facilitates collaboration between team members, enabling them to work together on the same codebase without interfering with each other’s work.

In the context of data teams, version control can be especially crucial when deploying models.

It enables them to identify precisely what changed, when it changed, and who changed it — crucial information when trying to diagnose and fix issues that arise during the deployment process or if models start underperforming post-deployment.

Model version control (Image by author)

In such cases, git-based functionality can offer quick rollback to previous versions.

Therefore, in this article, I will show how you can power your model deployment with Git functionality.

More specifically, we’ll use Modelbit’s git functionality for deployment and sync Modelbit with GitHub for collaboration.

Let’s begin 🚀!

Importance of Git for data teams

Before diving into the how-to stuff, let’s build more motivation for git-based version control and why it’s essential.

#1) Collaboration

Effective collaboration becomes increasingly important as data science projects get bigger and bigger.

With version control, teams can work on the same codebase/data and improve the same models without interfering with each other’s work.

Branching models (Image by author)

Moreover, one can easily track changes, review each other’s work, and resolve conflicts (if any).

#2) Reproducibility

Reproducibility is one of the critical aspects of building reliable machine learning systems. Something that works on one system but not on another reflects poor reproducibility practices.

Why is it important, you may wonder?

It ensures that results can be replicated and validated by others, improving the overall credibility of your work.

Reproducibility using version control (Image by author)

Version control enables you to track the exact code version and configurations used to produce a particular result, making it easier to reproduce results in the future.

This can be especially useful for open-source projects that many folks can use.

#3) Continuous Integration and Deployment (CI/CD)

CI/CD enables teams to build, test, and deploy code quickly and efficiently.

In machine learning, Continuous Integration (CI) may involve automatically building and testing changes to ML models as soon as they are committed to a code repository.

In Continuous Deployment (CD), the objective is to release changes to the model once they have passed testing. The pipeline should roll those changes out to production seamlessly, making the latest version of the model available to end users.
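As a rough illustration of the "testing" step, a CI job could run a small check like the one below on every commit and block deployment if it fails. This is only a sketch: the function name `passes_quality_gate`, the shape of `cases`, and the default tolerance are hypothetical choices made for this example, not part of Modelbit or any CI tool.

```python
def passes_quality_gate(predict, cases, tolerance=0.5):
    """Return True if every prediction is within `tolerance` of its expected value.

    predict: any callable that maps one input to one numeric prediction.
    cases: iterable of (input, expected_output) pairs (hypothetical format).
    """
    for input_x, expected in cases:
        prediction = predict(input_x)
        # A missing prediction or a large error fails the gate.
        if prediction is None or abs(prediction - expected) > tolerance:
            return False
    return True
```

A CI script would call this with the candidate model’s prediction function and a handful of known input/output pairs, and exit with a non-zero status when it returns False.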

Now that we know why version control is important from both a development and a deployment perspective, let’s look at how you can leverage git-based functionality in the deployment phase with Modelbit.

Local Repository and Modelbit integration

Modelbit is entirely driven by git. Thus, whenever you push a model for deployment, it internally maintains the deployment as a git repository.

Git-based deployment (Image by author)

Because it is backed by git, Modelbit natively provides all the advantages of version control for your deployments, models, and datasets.

What’s more, you can clone the remote git repository to your local machine and run the usual git commands, such as git pull and git push, create branches, and so on.

Connect to Modelbit git repo

To access the Modelbit git repository, you should add an SSH key that will connect your local machine to Modelbit.

Open the terminal and run the following command:

ssh-keygen -t rsa -b 4096 -C "My SSH key"

This will create an SSH key pair. If you accepted the default file location, you can view the public key with the following command:

cat ~/.ssh/id_rsa.pub

The above commands were taken from the Official GitHub Docs.

Now, copy the entire output of the cat command and head over to Git settings in the Modelbit dashboard. Click on “Add Key” and paste the output obtained above. This is demonstrated below:

Add SSH Key (Image by author)

And you are done!

Now we are connected to Modelbit’s remote git repository.

Deploy Model

Let’s push a model for deployment from a Jupyter Notebook. I won’t go into detail as I have already covered this in one of my previous blogs.

In short, you should train a model, define a prediction function, and push this function object for deployment, as shown below:

## Train Model
from sklearn.linear_model import LinearRegression
## x (2D feature array) and y (targets) are assumed to be defined already
model = LinearRegression().fit(x, y)

## Define Prediction function
def Linear_Model(input_x):

    if isinstance(input_x, (int, float)): ## check input type
        return model.predict([[input_x]])[0] ## prediction

    else:
        return None

## Deploy it
import modelbit
mb = modelbit.login() ## authenticate the notebook here.
mb.deploy(Linear_Model)
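One detail worth knowing about the type check in the prediction function: in Python, bool is a subclass of int, so True and False pass an isinstance(..., (int, float)) check, and so do NaN and infinite floats. If that matters for your use case, a stricter check could look like the sketch below. The helper name `is_valid_input` is hypothetical and is not part of the original deployment.

```python
import math


def is_valid_input(value):
    """Accept ints and finite floats; reject bools, NaN, infinity, and other types."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # rules out NaN and +/- infinity
    return False
```

The prediction function could then call is_valid_input(input_x) in place of the isinstance check and keep returning None for rejected inputs.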

Once we deploy a model, we see the following in our Modelbit dashboard:

Deployment Dashboard (Image by author)

Clone Modelbit repo

Let’s clone this repository to see its contents. Run the following command in the terminal.

modelbit clone my_linear_model

This will clone Modelbit’s git repository and create a folder my_linear_model.

Cloning deployment repo (Image by author)

Once you run the command, copy the link obtained to authenticate.

As demonstrated above, cloning creates a new local repository containing the datasets, deployments, and endpoints from the main branch of Modelbit’s remote git repository.

The current repository structure is as follows:

my_linear_models
├── bin
├── datasets
├── endpoints
└── deployments
    └── Linear_Model
        ├── source.py ## source code
        └── data
            └── model.pkl ## model pickle
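If you prefer to inspect the cloned folder from Python, the following sketch lists every file in it while skipping git’s internal .git directory. The function name `list_repo_files` is hypothetical; it only uses the standard library and makes no assumptions about Modelbit itself.

```python
from pathlib import Path


def list_repo_files(root):
    """Return sorted relative paths of all files under `root`, skipping .git."""
    root = Path(root)
    files = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:  # ignore git's internal bookkeeping
            continue
        if path.is_file():
            files.append(relative.as_posix())
    return sorted(files)
```

Calling list_repo_files on the cloned folder should return entries such as deployments/Linear_Model/source.py, matching the structure shown above.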

Git push to Modelbit

Now that we have cloned the remote repository, we can make any changes locally and push them.

Let’s add a dummy CSV file to the Linear_Model folder, commit it to the local repo and push it to the remote repo.

my_linear_models
├── bin
├── datasets
├── endpoints
└── deployments
    └── Linear_Model
        ├── source.py ## source code
        ├── dummy_data.csv ## added locally
        └── data
            └── model.pkl ## model pickle
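If you don’t have a CSV at hand, a throwaway file can be generated with Python’s csv module. This is a minimal sketch: the function name `write_dummy_csv`, the column names x and y, and the sample rows are all made up for illustration.

```python
import csv
from pathlib import Path


def write_dummy_csv(repo_root, rows=((1, 2.0), (2, 4.1), (3, 5.9))):
    """Write dummy_data.csv into deployments/Linear_Model under `repo_root`."""
    target_dir = Path(repo_root) / "deployments" / "Linear_Model"
    target_dir.mkdir(parents=True, exist_ok=True)  # no-op if the folder exists
    target = target_dir / "dummy_data.csv"
    with target.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])  # hypothetical column names
        writer.writerows(rows)
    return target
```

Pass the path of the cloned repository as repo_root; the function returns the path of the file it created.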

Let’s add the CSV file to the staging area:

git add deployments/Linear_Model/dummy_data.csv

Next, let’s commit it to the local repo:

git commit -m "Add dummy data csv"

Finally, let’s push it:

git push

With this, the dummy CSV file has been committed to the remote Modelbit git repo.

Note: There’s a reason we added the CSV to the Linear_Model folder rather than the datasets folder. The datasets folder only supports datasets defined via SQL queries. The results of those queries are then available at runtime for running deployments. Any other custom dataset isn’t supported yet.
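The add, commit, and push steps above can also be scripted, for example from a notebook or an automation job. The sketch below builds the same three git commands and runs them with subprocess. The function names `build_git_commands` and `run_git_commands` are hypothetical, and the script assumes git is installed and the repository is already authenticated.

```python
import subprocess


def build_git_commands(file_path, message):
    """Return the add, commit, and push commands as argument lists."""
    return [
        ["git", "add", file_path],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_git_commands(commands, repo_dir):
    """Run each command inside `repo_dir`, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError if git exits with an error
        subprocess.run(command, cwd=repo_dir, check=True)
```

For instance, run_git_commands(build_git_commands("deployments/Linear_Model/dummy_data.csv", "Add dummy data csv"), repo_dir) would reproduce the terminal steps, where repo_dir is the path of the cloned repository.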

Branching

If you wish to create and work in a separate branch in the remote Modelbit repo, that is also possible.

Create a new branch from the dashboard as follows:

Branching remote repo (Image by author)

Next, say we want to improve our model locally in this branch. In your notebook, you can switch to this new branch as follows:

## notebook.ipynb

mb.switch_branch("another_branch")

Now, all new deployments (and other commits, if any) made from the notebook will be pushed to the another_branch branch of the remote Modelbit git repo.

Syncing GitHub

The remote Modelbit repo can be automatically synced with your personal GitHub repository.

This is particularly useful for performing GitHub-based code review, CI/CD, and Pull Request workflows on your Modelbit deployments.

#1) Create a new GitHub repo

Below, I have created an empty repo on GitHub.

New GitHub repo (Image by author)

Next, we should give Modelbit write access to this repository.

#2) Copy the SSH URL of the GitHub repo

Under Code → SSH, copy the URL.

Repo SSH URL (Image by author)

#3) Add Git remote in Modelbit

In the dashboard, go to Git Settings, click Add Git Remote, paste the copied repo URL, and click Connect Remote.

Add Git Remote to Modelbit (Image by author)

#4) Grant Write access to Modelbit

From the above sync panel, copy the Deploy Key:

Deploy key (Image by author)

Now go to the Settings of your GitHub repo, open Deploy keys, and click Add deploy key. Paste the key, give it a title, grant write access, and click Add key.

Add deploy key in GitHub repo (Image by author)

And done! The GitHub repository has been automatically updated:

Deployment code in GitHub (Image by author)

Now the remote Modelbit git repository is synced with your GitHub repository, and you can use it for all sorts of collaborative work.

Conclusion

With this, we come to the end of this blog.

In this post, we learned why Git functionality matters for data teams and how model deployment can be backed by git using Modelbit.

Next, we looked at how to connect a remote git repository created internally by Modelbit to a personal GitHub repo.

Having said that, Modelbit is in the early stages of development, and it might not yet be a complete alternative to services like Heroku.

However, based on my experience with both Modelbit and Heroku, I believe that deployment with Modelbit is streamlined and less intimidating, for both experienced users and beginners.

I’m eager to see how they continue!

Thanks for reading!

