Human versus AI

GitHub Copilot is the latest hype in the programming world, and many articles and YouTube videos showcase its impressive capabilities. I recently obtained access to the GitHub Copilot technical preview, and I’m curious how Copilot can help data scientists in data science projects.
What is GitHub Copilot?
GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. GitHub Copilot draws context from comments and code and suggests individual lines and whole functions instantly. [3]
Copilot has been trained on huge amounts of publicly available code from GitHub repositories and other sites [2]. So, in a sense, it has seen far more code than you and I ever will.
How to use Copilot?
Currently, the Copilot technical preview is available as a VS Code extension and is limited to a small group of testers. You can try to sign up here. Once you are approved, you will see a screen like the one below:

Copilot works much like Google’s search autocomplete. Code suggestions appear as dimmed text, and a small pop-up appears when you hover over the suggested code.

Copilot offers many code suggestions: press Alt + ] and Alt + [ to cycle through them, and press Tab to accept one. You can also preview multiple suggestions at once by pressing Ctrl + Enter, which opens a view like the one below:

Copilot can also read your comments and generate a function from them, so you can instruct it in plain English, as in the example below:

P.S.: When you are bored, you can even chat with Copilot by writing a Q&A-style dialogue using the prefixes ME: and AI:.

That ends our short intro to Copilot, so let’s proceed to our main question:
Can Copilot help data scientists make better data science projects?
Let’s try it!

The Experiment:
In this experiment, we will use the famous Iris dataset; our goal is to use Copilot to perform exploratory data analysis and train a k-NN model using only suggested code. We set the following rules:
- Use only the suggested code; fix only typos and data-specific issues.
- Every action must be prompted by a clear comment/command to Copilot.
- Only the top 3 code suggestions will be considered.
At the time of writing, Python notebooks in VS Code are relatively unstable with Copilot, so I will use Streamlit as my platform. Streamlit turns a Python script into a web app that updates in real time as you edit the code, much like a Jupyter notebook, which makes it handy for exploring a data science project. For more information on Streamlit, you can read my article here.
Import the library packages:
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
💡 Data Loading
# load iris.csv into dataframe
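Copilot’s suggestion for this prompt is not reproduced here, so below is a minimal pandas sketch of the same step. It assumes iris.csv uses the Kaggle-style column names that appear later in this article (SepalLengthCm, PetalLengthCm, and so on, plus Species). The helper name load_iris_df is hypothetical, and the fallback to scikit-learn’s bundled copy is an addition so the snippet runs even without the CSV; that fallback has no Id column.

```python
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_iris

# Kaggle-style feature column names (assumed layout of iris.csv)
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def load_iris_df(path: str = "iris.csv") -> pd.DataFrame:
    """Load iris.csv if it exists; otherwise rebuild a similar table from scikit-learn."""
    if Path(path).exists():
        return pd.read_csv(path)
    bunch = load_iris()
    df = pd.DataFrame(bunch.data, columns=FEATURES)
    # Map integer targets to Kaggle-style labels such as "Iris-setosa"
    df["Species"] = ["Iris-" + bunch.target_names[t] for t in bunch.target]
    return df


df = load_iris_df()
```

In the Streamlit app, st.write(df.head()) would then render the first rows in the page.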

💡 EDA with Copilot
# print the dataframe column names and shape
I was impressed that Copilot understood Streamlit’s output mechanism, which uses st.write() instead of print(), even though Streamlit is a relatively new Python package.
Next, I try:
# create a scatter plot of the petal length and petal width using plotly express
And this is what I get; it looks like Copilot is not clever enough to understand the column names inside the data frame 😂:

Next, I try using the exact column names, and get exactly the graph I wanted:
# create a scatter plot of the petalLengthCm with SepalLengthCm using plotly express

💡 Modeling with scikit-learn:
Next, to create the training and test datasets, I write this:
# splitting the data into training and testing sets (80:20)
and these are the suggestions I get back:
Impressive! Copilot even knows which column is my target class and writes the full code for me; all I need to do is select the suggestion!
The full suggested code is as below:
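The suggested code itself is not reproduced here. As a sketch of what an 80:20 split usually looks like in scikit-learn, the snippet below uses the library’s bundled iris data, so the features and target are assumptions rather than the columns of iris.csv. The random_state and stratify arguments are additions: they make the split reproducible and keep the three classes equally represented in both sets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X = iris.data    # four measurement columns
y = iris.target  # species encoded as 0, 1, 2

# 80:20 split, stratified so each species keeps its share in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```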

Next, I try my luck with this command:
# check for optimal K value using testing set
And, beyond my expectations, Copilot returns this code:
That’s a lot of coding time saved; Copilot even plots a chart at the end. The chart didn’t work, though, so I had to modify the code a bit using the list it created. But it still saves me a lot of time I would otherwise spend searching Stack Overflow for code.
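Since Copilot’s code is only shown as a screenshot, here is a hedged sketch of the same idea. It fits a k-NN classifier for each K and records its accuracy on the test set, then keeps the best K. The range 1 to 25 and the variable names are assumptions. Note that choosing K on the test set mirrors the prompt but leaks test data into model selection; cross-validation on the training set (for example with cross_val_score) would avoid that.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

k_values = list(range(1, 26))  # assumed search range
scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))  # accuracy on the test set

# list.index returns the first maximum, i.e. the smallest K with the top accuracy
best_k = k_values[scores.index(max(scores))]
```

To chart the curve, the two lists can go straight into plotly express, for example px.line(x=k_values, y=scores), and then into st.plotly_chart.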
Out of curiosity, I asked Copilot, "What is the optimal K value?"

Copilot returns the answer without my needing to plot the graph 😲😲
This inspired my next command:
# create a classifier using the optimal K value
Then I just press Enter, accept the suggested comments and code, and proceed. Here is the resulting code:
Note that I typed only one command; the rest was suggested by Copilot.

Out of the 5 suggested code snippets, 3 work perfectly, while 2 of them, metrics.f1_score and metrics.precision_score, don’t work.
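The article does not show the error, but one likely reason these two metrics fail on the Iris data is a documented scikit-learn behavior. On a three-class target, f1_score and precision_score default to average="binary" and raise a ValueError unless another averaging mode is passed. The sketch below rebuilds the final step under that assumption: it picks K as in the previous step, fits the classifier, and computes the metrics with average="macro". The choice of "macro" over "micro" or "weighted" is an assumption.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Pick K by test accuracy over K = 1..25 (max keeps the first, i.e. smallest, best K)
scores = {
    k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    for k in range(1, 26)
}
best_k = max(scores, key=scores.get)

clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = metrics.accuracy_score(y_test, y_pred)
# The default average="binary" raises ValueError for three classes;
# "macro" averages the per-class scores with equal weight
f1 = metrics.f1_score(y_test, y_pred, average="macro")
precision = metrics.precision_score(y_test, y_pred, average="macro")
recall = metrics.recall_score(y_test, y_pred, average="macro")
```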
That’s the end of my simple code test with Copilot. I have published the suggested code on GitHub; feel free to take a look.
Final thought:
In this article, I demonstrated how Copilot can help in the data science process. Copilot made a few mistakes, but the advantages of using it outweigh them. One caveat is that I used the small Iris dataset, so Copilot might work less effectively on a bigger dataset.
A new programming paradigm is coming: instead of searching Q&A websites such as Stack Overflow, you can let Copilot save most of your googling time by giving you multiple solutions directly. I expect it will reduce our reliance on Stack Overflow and Google for programming-related questions in the future.
Well, at the current stage, our jobs are still secure, as Copilot still needs someone with basic knowledge to guide it: setting the direction of the project and telling Copilot what to do.
In my opinion, Copilot will definitely make you a better data scientist today. But will it take over your job as a data scientist in the future? With user feedback and the vast amount of data fed into Copilot, it will surely become a better AI, and who knows, maybe it will take over programmer and data scientist jobs? 🤔🤔
Lastly, I’m still in #TeamHumanity: with our creativity, we will stay in control of AI and use it for our betterment. Thank you very much for reading my article.
Side notes:
Here are some of my other articles; I hope you like them too: