Ever since the publication of GPT-3, coding assistants like GitHub Copilot, powered by OpenAI's Codex API, have been on the machine learning community's radar. Recently, I came across a tool called Cogram, which is something like an evolution of autocompletion, specialized for data science and machine learning, that runs directly in Jupyter Notebooks.
In this article, I will show you how this tool works and share a little of my experience using it to generate machine learning code in Jupyter Notebooks.
Getting Started with Cogram
First things first: to get set up with Cogram, head over to their website, sign up for a free account, and get an API token. After that, all you have to do is install Cogram with:
pip install -U jupyter-cogram
Enable it as a Jupyter Notebook extension:
jupyter nbextension enable jupyter-cogram/main
Finally, you set up your API token with:
python -m jupyter_cogram --token YOUR_API_TOKEN
Now that you are all set up, you can start getting completions directly in your Jupyter Notebook.
In the most recent version, Cogram is enabled by default. You can toggle it on and off via its icon in the notebook menu, and you can also customize Cogram to select how many suggestions it shows and how creative it is.
There are two ways to use Cogram: the autosuggest mode, where Cogram makes suggestions whenever you stop typing or go to a new line, and the manual completion option, triggered with the Tab key. You can switch between these options via the Autosuggest tick-box in the Cogram menu.
Autocompletions on Jupyter Notebook
I generated a set of completions with Cogram to test its usability. Here are some of my favorites.
From writing this:
# plot sin(x) from 0 to pi
It generated this:
# plot sin(x) from 0 to pi
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()

I was happy that the very first example I tried generated code that ran without any editing or debugging! Now for another simple example. From this:
# plot a histogram of points from a poisson distribution
It generated this:
# plot a histogram of points from a poisson distribution
import numpy as np
import matplotlib.pyplot as plt
x = np.random.poisson(5, 1000)
plt.hist(x)
plt.show()

Again, it ran without issue! Very cool so far. Let’s step it up a notch and create a fake dataset to fit a simple linear regression model:
# create a fake dataset and run a simple linear regression model
The output:
# create a fake dataset and run a simple linear regression model
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)
plt.scatter(x, y)
plt.show()

OK, almost. The code actually ran, and it makes sense in the context of what I was trying to achieve, but the model did not generate the code for the actual linear regression.
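In case you want to fill that gap by hand, here is a minimal way to do it (my addition, not Cogram's output), rebuilding the same fake data and fitting the line with np.polyfit:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)
coeffs = np.polyfit(x, y, deg=1)  # least-squares fit: [slope, intercept]
plt.scatter(x, y)
plt.plot(x, np.polyval(coeffs, x), color="red")  # overlay the fitted line
plt.show()

Let's see if we can change that by adding a little bit more description in our initial comment: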
# write a linear regression model with sklearn
The output:
# write a linear regression model with sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)
model = LinearRegression()
model.fit(x.reshape(-1, 1), y.reshape(-1, 1))
plt.scatter(x, y)
plt.plot(x, model.predict(x.reshape(-1, 1)))
plt.show()

Freaking cool! It actually created everything, including the imports!
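As a quick sanity check (again my addition, not Cogram's), the fitted parameters should land close to the slope of 2 and intercept of 0 used to generate the data. Running this right after the generated cell:
print(model.coef_, model.intercept_)  # expect roughly [[2.]] and something near [0.]

How about plotting from a DataFrame?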
I wrote:
# load a csv file and plot the numerical data using the date column as the x axis
The output:
# load data.csv and plot the "data" column with the "date" column as index
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data.csv", index_col="date", parse_dates=True)
data.plot()
plt.show()

It did not plot the "data" column specifically, but it did set the "date" column as the index, which is really cool.
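Assuming the CSV really does have a "data" column, selecting it explicitly is a one-line tweak (mine, not Cogram's):
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data.csv", index_col="date", parse_dates=True)
data["data"].plot()  # plot just the "data" column (assumes that column exists)
plt.show()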
Now, let's try a slightly more advanced example: writing a simple perceptron from scratch with the NumPy package:
# write a perceptron from scratch in numpy
The output:
# write a perceptron from scratch in numpy
import numpy as np
class Perceptron(object):
    """Perceptron classifier.
    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    errors_ : list
        Number of misclassifications in every epoch.
    """
    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter
    def fit(self, X, y):
        """Fit training data.
        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.
        Returns
        -------
        self : object
        """
        self.w_ = np.zeros(1 + X.shape[1])
        self.errors_ = []
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self
    def predict(self, X):
        """Return class label after unit step"""
        return np.where(X >= 0.0, 1, -1)
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])
model = Perceptron(eta=0.1, n_iter=10)
model.fit(X,y)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-31-38e27395dd1c> in <module>
60 y = np.array([0, 1, 1, 0])
61 model = Perceptron(eta=0.1, n_iter=10)
---> 62 model.fit(X,y)
<ipython-input-31-38e27395dd1c> in fit(self, X, y)
48 update = self.eta * (target - self.predict(xi))
49 self.w_[1:] += update * xi
---> 50 self.w_[0] += update
51 errors += int(update != 0.0)
52 self.errors_.append(errors)
ValueError: setting an array element with a sequence.
Now, there is a lot to unpack here. Although the code came with a couple of bugs and did not run right out of the box, it is extremely compelling code that would be ready to run after a few edits. The traceback points at the real problem: predict applies the unit step to the raw input instead of to the net input (the dot product of the weights with the features, plus the bias), so update inside fit becomes an array rather than a scalar, and the weight update on w_[0] fails.
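For reference, here is a minimal corrected version (my sketch, not Cogram's output): predict now thresholds the net input, and the labels are mapped to {-1, 1} so the update rule behaves as intended:
import numpy as np
class FixedPerceptron(object):
    def __init__(self, eta=0.01, n_iter=10):
        self.eta = eta
        self.n_iter = n_iter
    def net_input(self, X):
        # Weighted sum of the inputs plus the bias term.
        return np.dot(X, self.w_[1:]) + self.w_[0]
    def fit(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])
        self.errors_ = []
        y = np.where(y <= 0, -1, 1)  # match the {-1, 1} output of predict
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self
    def predict(self, X):
        # Unit step on the net input, not on the raw features.
        return np.where(self.net_input(X) >= 0.0, 1, -1)
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
y = np.array([0, 1, 1, 0])
model = FixedPerceptron(eta=0.1, n_iter=10)
model.fit(X, y)
print(model.errors_)  # misclassifications per epoch; reaches 0 on this separable data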
One of the coolest things I noticed is that the model also writes the docstrings for the functions, which is interesting given the contextual complexity that writing documentation presents. Besides that, Cogram is also context-aware (like GitHub Copilot in VS Code), so if you write a function, variable, or class, it can remember it.
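To make that concrete, here is a contrived sketch of what context-awareness looks like in practice (the helper name and the suggested completion are my own invention, not a captured Cogram session):
# Cell 1: define a helper that the assistant can later "see"
import numpy as np
def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)
# Cell 2: a comment prompt that mentions the helper by name...
# normalize each row of the matrix X
X = np.random.randn(5, 3)
X_unit = np.array([normalize(row) for row in X])  # ...can yield a completion like this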
Concluding thoughts on Coding Assistants for Data Science and Machine Learning
The point I would like to make is this: ever since the discussion around Software 2.0 started (probably even before that), and with the advancement of extremely powerful language models like GPT-3, which has now evolved into the Codex engine, this style of writing software has become more and more ubiquitous, and for good reason. What we ultimately care about is writing solutions to problems, not writing each line of code ourselves.
That does not mean we should blindly trust language models and go wild on autocompletions. But it seems clear to me that a smart, well-thought-out symbiosis between human and machine may be emerging in the context of code writing, across platforms and programming languages, and it is worth reflecting on how you can integrate it into your own workflow.
If you liked this post, join Medium, follow me, and subscribe to my newsletter. Also, connect with me on Twitter, LinkedIn, and Instagram! Thanks, and see you next time! 🙂
Disclaimer: This article was not sponsored, nor did I receive any compensation for writing it.