
GitHub Copilot – A New Generation of AI Programmers

GitHub, Microsoft, and OpenAI have reached a new milestone.


Photo by Nicole Wolf on Unsplash

When OpenAI released GPT-3 last year, people were surprised by its ability to generate code from natural language prompts. Sharif Shameem and others excitedly shared their discoveries, and soon the hype – and the worry – went through the roof. But GPT-3 was nowhere near being a great programmer. Understanding an English prompt and transforming it into a chunk of code is a notable feat, but its results were mediocre.

OpenAI and Microsoft (now backing OpenAI's projects financially) saw a very promising commercial product in GPT-3's coding abilities and soon started to develop another language model: a programmer AI. Relative to GPT-3, the new model would trade general language skills for coding ability. This "descendant of GPT-3," as Greg Brockman calls it, is OpenAI Codex, and it's the AI behind the latest breakthrough in the field: GitHub Copilot.

Two days ago, OpenAI, GitHub, and its parent company Microsoft presented GitHub Copilot, an AI tool – powered by OpenAI Codex – that functions as a pair programmer, helping human developers write code. It's the first of a new generation of AI programmers that will become ubiquitous in the coming years, and the most important milestone in the field since GPT-3. It will open the door to radical changes in the software industry, but the degree to which those changes will be beneficial or detrimental to the workforce is still unknown.


GitHub Copilot – An AI pair programmer

What it is

GitHub Copilot (from now on, Copilot) is an AI tool – currently in technical preview – that makes recommendations to programmers as they write code. Microsoft has already implemented it as an extension for Visual Studio Code – and will integrate it into the commercial Visual Studio product. For now, it's only accessible to a few developers (you can try your luck here).

Copilot isn't the first of its kind; Tabnine and Kite are two popular tools that cover the same functionality. However, Copilot stands out because it's powered by Codex, a descendant of GPT-3 that provides a deeper understanding of context than other assistants. Codex is similar to GPT-3, with an important distinction: it has been trained on huge amounts of publicly available code, from GitHub repositories and other sites.

Codex, which OpenAI will integrate into their API this summer, is a programming language model. Its abilities in this area greatly exceed those of GPT-3 – or those of GPT-J. As I argued in a previous article, GPT-like models seem to have a latent power within them that they release when specialized for a task. GPT-3, the jack of all trades, is amazing at many tasks. Codex is, in contrast, a master at coding.

But neither Codex nor Copilot is perfect. The tool works fine, judging by the testimonies of those who tested the preview. But, as with all code written by a human, Copilot's code should be "tested, reviewed, and vetted." GitHub's developers have created safety mechanisms to provide the best experience to the user, but the system "may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs." Copilot's code may not even compile, as it doesn't test the code it outputs. In short, it's an early version of a new category of tools; it has flaws, but its promise greatly outweighs them.

What it does

On the GitHub page, there are examples of Copilot's amazing performance. It can complete lines of code or write whole functions, transform descriptive comments into code, autofill repetitive code, or create unit tests for your methods. It can take a few hundred lines above the current line as input, but there's one important limitation: it can't use code from other files as context.
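To make the comment-to-code workflow concrete, here is a minimal sketch in Python of the kind of prompt a programmer might write and the kind of completion Copilot is meant to suggest. The function name and body are illustrative, written by hand for this article, not taken from Copilot's actual output.

```python
# The prompt: a descriptive comment plus a function signature.
# Everything below the signature is the kind of body Copilot
# would be expected to suggest (written here by hand as an example).

def average_runtime(timings_ms: list[float]) -> float:
    """Return the mean of a list of timings in milliseconds, or 0.0 if empty."""
    if not timings_ms:
        return 0.0
    return sum(timings_ms) / len(timings_ms)
```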

It works best with Python, JavaScript, TypeScript, Ruby, and Go, but it "understands dozens of languages." However, because it sometimes fails, it may be most useful when exploring a new library or writing in an unfamiliar programming language. Otherwise, it could take more time to review Copilot's code than to write it yourself.

Copilot is based on a language model similar to GPT-3, so its default behavior reflects the code it was trained on. But as you use it, it starts "understanding" your style and adapts to you. And if you don't like the first suggestion, you can ask it to show its top-k alternative suggestions.


Important implications of Copilot

The power of language models

GPT-2, GPT-3, the Switch Transformer, LaMDA, MUM, Wu Dao 2.0… Pre-trained language models are the cake in AI right now, and every major player in the field is fighting to get the biggest slice. The reason is that these models work extremely well. GPT-3 is the most popular, and rightly so. When OpenAI released the GPT-3 API, it let the world take a peek at the model's power, and the effect was immediate. GPT-3 was so powerful and performed so well that people even dared to call it AGI – which it isn't.

However, because GPT-3 is a general-purpose language model, it's fair to assume it could improve its performance on specific tasks if prepared adequately to do so. Baseline GPT-3 could be portrayed as a jack of all trades, whereas its specialized versions would be masters of their tasks. DALL·E was the first hint of this possibility: OpenAI trained the system on text-image pairs, which allowed DALL·E to generate images from captions – and much more – excelling at visual creativity. GPT-J, a GPT-3-like system 30x smaller, could generate better code because it was trained heavily on GitHub and StackExchange data. PALMS, a method to reduce biases in GPT-3, further reinforced this hypothesis: researchers improved GPT-3's behavior significantly by fine-tuning it on a small curated dataset.

Now, Codex – and Copilot built on top of it – has definitively proved this idea: language models can be specialized to become masters of their trade. Codex is a programming master; what other areas could be mastered by a super-powerful specialized language model? Could they master other formal languages such as logic or math? Could they master each and every capability found in GPT-3? Could a language model write fiction at the level of Shakespeare, Cervantes, or Tolstoy? Could it compose the best hits of the summer? What's possible isn't clear. What's clear is that we haven't yet found their limits.

The future of coding

When GPT-3 was released, many people saw yet another stepping stone toward the end of coding; no-code would take over sooner than we thought. Now, Copilot has raised that worry to unprecedented levels. It's designed – as its name suggests – as a cooperative partner. GitHub says "you're the pilot," but how can we be sure this won't change in the future?

Its creators expect Copilot to "enable existing engineers to be more productive, reducing manual tasks and helping them focus on interesting work." However, how much time will pass until that "interesting work" can also be done by an AI faster, cheaper, and more accurately? GitHub's CEO, Nat Friedman, said that "the problems we spend our days solving may change. But there will always be problems for humans to solve," but perhaps not the same people will be prepared to solve the new ones.

The future is unpredictable, and two scenarios may arise from today: either we create a symbiotic relationship with AI and find a place for everyone, or AI will take over many jobs. Even if it never reaches perfection, Copilot or its successors could remove the need for most programmers, leaving only a few to "test, review, and vet" the AI's code. We can only hope that if a displacement – or replacement – of jobs occurred, policy-makers would generate adequate responses and provide safety nets to those affected.

Unreliable usefulness?

If Copilot isn't 100% reliable, is it useful? To what degree is it useful, and in which situations is it better not to use it? Could an inexperienced programmer take advantage of it, or is it better suited to expert programmers, who might take more time reviewing the code than writing it themselves? Copilot's unreliability raises many questions about its utility.

There's a quote from Joel Spolsky that comes in handy here: "It's harder to read code than to write it." An inexperienced programmer may not be aware of issues with Copilot's code, whereas a veteran may prefer to write the code instead of reading what Copilot has generated. There are very few instances in which it's clearly worth using Copilot: an experienced programmer who wants to try a new library, language, or framework, or who wants to write unit tests for hand-written methods (although those would also need to be reviewed; see the sketch below). The other case is an inexperienced programmer who's just starting to learn.
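As an illustration of that unit-test case, here is a minimal, hypothetical sketch in Python: a hand-written function and the kind of pytest-style test an assistant like Copilot might propose for it. Both the function and the test are invented for this article, and any generated test would still need the same review as the rest of the code.

```python
# A hand-written method the programmer already trusts.
def slugify(title: str) -> str:
    """Lowercase a title and replace runs of whitespace with single hyphens."""
    return "-".join(title.lower().split())


# The kind of test an assistant might suggest for it (illustrative only);
# it still has to be tested, reviewed, and vetted like any generated code.
def test_slugify_collapses_spaces_and_lowercases():
    assert slugify("GitHub  Copilot Review") == "github-copilot-review"
    assert slugify("") == ""
```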

However, there are bigger problems than Copilot writing low-quality code. From GitHub’s FAQ: "There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns." A question worth considering is whether Copilot would reduce or increase the amount of these problems.

Legal issues

I’m not a lawyer and definitely not an expert in US law, but important questions about intellectual property, licenses, and copyright infringement were raised in a Hacker News thread opened by Nat Friedman. GitHub said in Copilot FAQs that "about 0.1% of the time" Copilot suggestions are "verbatim from the training set." This implies there’s a chance that a Copilot user may take a suggestion that contains code that’s copyrighted under a license that conflicts with the purposes of the project. This raises an important question: Who would be liable in this situation, the user, GitHub/Microsoft, or the company that owns the project?

They also mention that "you are responsible for the content you create with the assistance of GitHub Copilot. We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself." Does this imply that if we use a Copilot suggestion taken from StackOverflow that happens to be under a GPL license, we’re responsible for the infringement? How can we know whether Copilot has copied a chunk of code we can’t use? Would companies allow Copilot for their employees, or would they prohibit it for the same reason copying code from StackOverflow is often prohibited?

Another user pointed out that if GitHub didn't limit Copilot's training to "a sensible whitelist of licenses (MIT, BSD, Apache, and similar)," it would be a risk to use the tool on an "important/revenue-generating" project. There's no information on whether such a whitelist exists, so the issues stated above remain plausible. Nat Friedman said that a debate regarding intellectual property and AI will take place over the next few years, but for now, using Copilot could create more problems than it solves.


Conclusion

GitHub Copilot is poised to change the day-to-day work of many programmers all over the world. Despite the problems I've highlighted, the technology is an important milestone and will lead future efforts towards no-code, which Microsoft has been pursuing for some years now.

Copilot feels like an inflection point, and no one knows how events will unfold from here. Will programmers start losing their jobs a few years from now? Will we manage to find a beneficial pilot-Copilot synergy? Will it provide such an edge that companies will have to adapt to it or die? Will programmers have to trade off professional survival for privacy?

Many questions have been arising since Tuesday and many more will keep coming. For now, answers will have to wait. GitHub Copilot is up. Let’s keep our eyes open to see where it’s driving us.


_Travel to the future with me for more content on AI, philosophy, and the cognitive sciences! Also, feel free to ask in the comments or reach out on LinkedIn or Twitter! 🙂_


Recommended reading

Understanding GPT-3 In 5 Minutes

Can’t Access GPT-3? Here’s GPT-J – Its Open-Source Cousin

