Can We Use Deep Learning to Create New Programming Languages?

TransCoder from Facebook AI: Translation between high-level programming languages

Photo by Karl Pawlowicz on Unsplash

I recently read a paper by Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, and Guillaume Lample of Facebook AI Research, published on 5 June 2020. It is about a transcompiler built with neural networks. A transcompiler is a system that translates source code from one high-level programming language to another (e.g. from C++ to Python).

Transcompilers are generally used to move a code base written in a deprecated language to a more recent one. The transcompilers in use today are based on hand-crafted rules, which means lots of manual work and plenty of room for mistakes. Furthermore, building them requires expertise in both the source and target programming languages, and their output still needs manual modifications afterwards. Thus, the entire process becomes a tedious, time-consuming, and expensive task. According to an example given in the paper, Commonwealth Bank of Australia spent around $750 million and 5 years to convert its platform from COBOL to Java. It is just too much! Another programming language that fits your needs better might be released within those 5 years.

This inconvenient and laborious process motivated researchers at Facebook AI to create a transcompiler using neural networks. They were inspired by the advances neural networks have made in natural language translation. One obstacle was the lack of parallel training data, that is, the same code written in more than one language. They worked around this issue by training on source code from GitHub repositories available on Google BigQuery. To evaluate the model, they extracted a set of parallel functions in C++, Python, and Java from the GeeksforGeeks website. They also created a test set composed of 852 parallel functions. Their model, TransCoder, outperformed rule-based transcompilers by a significant margin. It translates code at the function level.
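To make the idea of parallel functions concrete, here is a minimal sketch of how a translated function could be checked against a reference. This is not the paper's evaluation code. The C++ function in the comment, both Python functions, the helper `agrees_on`, and the test inputs are all my own assumptions for illustration; the "translated" function is written by hand and only stands in for what a model might produce.

```python
# Original C++ function (shown only as a comment for context):
#   int sum_of_squares(int n) {
#       int total = 0;
#       for (int i = 1; i <= n; i++) total += i * i;
#       return total;
#   }


def sum_of_squares_reference(n):
    """Hand-written Python counterpart of the C++ function above."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


def sum_of_squares_translated(n):
    """Stand-in for a model's translation (hypothetical, not real model output)."""
    return sum(i * i for i in range(1, n + 1))


def agrees_on(reference, candidate, inputs):
    """Return True if both functions return the same value on every input.

    A candidate that raises an exception counts as a failed translation.
    """
    for value in inputs:
        try:
            if reference(value) != candidate(value):
                return False
        except Exception:
            return False
    return True


result = agrees_on(sum_of_squares_reference, sum_of_squares_translated, [0, 1, 2, 10])
print(result)
```

Comparing behavior on shared inputs, rather than comparing the text of the two functions, lets very different-looking code still count as a correct translation.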

This paper made me think about a programming language designed by neural networks. If neural networks can translate code between high-level programming languages, they should be able to create a new one. Neural networks generate images, videos, and news articles. Why not a programming language? I’m not an expert in software design or architecture, but I think this paper sheds light on what can be done in the future.

Deep learning models are data-hungry. Even if we build a highly complex, well-structured model, its performance is only as good as the data we feed it. The amount of data is a key factor in determining the robustness and accuracy of deep learning models. The researchers also mentioned the lack of available data in this area. If we somehow manage to obtain lots of high-quality data, it does not seem impossible to create a new programming language with neural networks.

Photo by Artur Kraft on Unsplash

Crowdsourcing might be an option for data collection. For instance, CAPTCHAs have been used to digitize books through crowdsourcing. The CAPTCHA was created as a challenge-response test to determine whether a user is human. We come across CAPTCHAs almost every day: we are asked to type some letters that we see on the screen. The initial purpose was to provide security. Then a brilliant idea came to its creators. They saw the potential in millions of people typing the words they see on a screen, so they started to show people words from scanned books (the reCAPTCHA project). Eventually, lots of books were digitized word by word by millions of people. This idea was taken one step further to create Duolingo, a free website for learning a new language. The entire story has been told by its creator, Luis von Ahn.


Let’s come back to our own topic. There are many websites that people use to learn programming languages or improve their software skills. These websites provide lots of coding challenges for practice. With the advances in technology and computing, more and more people want to learn programming languages. Thus, there seems to be great potential for crowdsourcing. The solutions people submit to coding challenges could be structured so that they can serve as training data for a neural network. The neural network could then improve iteratively to the point where it can be used to generate a new programming language.
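As a rough illustration of what "structured" could mean here, the sketch below groups solutions by challenge and pairs up the ones written in different languages. Everything in it is an assumption of mine: the record layout, the sample submissions, and the helper `build_parallel_pairs` are hypothetical and do not come from any real website or from the paper. It only covers the data-preparation step, not the much harder step of generating a language.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical submission records: (challenge_id, language, source_code).
submissions = [
    ("add", "python", "def add(a, b):\n    return a + b"),
    ("add", "java", "int add(int a, int b) { return a + b; }"),
    ("add", "cpp", "int add(int a, int b) { return a + b; }"),
    ("is_even", "python", "def is_even(n):\n    return n % 2 == 0"),
]


def build_parallel_pairs(records):
    """Pair up solutions to the same challenge written in different languages."""
    by_challenge = defaultdict(dict)
    for challenge_id, language, code in records:
        # Keep only the first solution seen for each (challenge, language).
        by_challenge[challenge_id].setdefault(language, code)

    pairs = []
    for challenge_id, solutions in by_challenge.items():
        # Sorting the language names makes the pair order deterministic.
        for lang_a, lang_b in combinations(sorted(solutions), 2):
            pairs.append(
                {
                    "challenge": challenge_id,
                    "source_lang": lang_a,
                    "source_code": solutions[lang_a],
                    "target_lang": lang_b,
                    "target_code": solutions[lang_b],
                }
            )
    return pairs


pairs = build_parallel_pairs(submissions)
for pair in pairs:
    print(pair["challenge"], pair["source_lang"], "->", pair["target_lang"])
```

A challenge solved in only one language, like `is_even` above, produces no pairs, which hints at why the number of people solving the same problems in different languages matters.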

I may be too optimistic, but such a language could even be superior to all existing programming languages in some sense. Since generating a language would eventually take little effort, we could use our model to create many specialized programming languages, each tailored to certain tasks. There is currently a wide selection of programming languages, and new ones are continuously being added. They all have their pros and cons. Some of them outperform all others in certain tasks. Some are preferred for their simple syntax and easy-to-learn structure. However, there is not one that is superior to all others for all tasks. Thus, being able to create a language focused on doing one particular task best would be a great advantage.

Most of us, including myself, could not foresee the potential of deep learning until a few years ago. I have changed my way of thinking to "why not?". So, can neural networks create new programming languages? Why not? I could name it Deep++.

I would love to hear your opinion or feedback on this topic. Please feel free to leave a comment.

Thank you for reading.

