
Data scientists aren’t going away anytime soon

A counterargument to the claim that programmers will become obsolete with the advent of OpenAI’s GPT models

I spent the last week reading about GPT-x models and how revolutionary they are going to be. I spent hours in front of the screen looking at the extraordinary results people have achieved in poetry, programming, bot conversation, fiction, and whatnot. Despite being excited, I took with a pinch of salt the articles and news items claiming that GPT models will end the era of coders and programmers; one article even coined a term, ‘cargo-cult programmers’, and prophesied the sad demise of everyone in that cohort.

A story in frames! The image was produced using a context-free grammar, the kind of rule system that governs many phenomena in our universe, such as the growth of trees. The fact that nature can take something simple and produce something as complicated as humans and animals indicates how far we still are from teaching machines to do the same (Image by Author)
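To give a flavour of how little it takes, here is a minimal sketch of the idea (not the actual code behind the image above): a single symbol and one context-free production rule, applied repeatedly in L-system fashion, quickly blow up into an intricate, tree-like structure. The axiom and rule below are illustrative assumptions.

```python
# A minimal sketch of context-free rewriting (an L-system-style rule),
# showing how a tiny grammar can generate intricate, tree-like structure.
# The axiom and production rule are illustrative, not the grammar behind
# the image above.

RULES = {"F": "F[+F]F[-F]F"}  # one production rule: replace F with a branching pattern
AXIOM = "F"                   # the starting symbol

def expand(symbols: str, iterations: int) -> str:
    """Apply the context-free rule to every symbol, `iterations` times."""
    for _ in range(iterations):
        symbols = "".join(RULES.get(s, s) for s in symbols)
    return symbols

if __name__ == "__main__":
    for depth in range(4):
        result = expand(AXIOM, depth)
        print(f"depth {depth}: {len(result):4d} characters")
        # 1 character at depth 0, several hundred by depth 3:
        # one simple rule, applied blindly, already produces something complicated.
```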

With great power comes great responsibility; if you are a Data Scientist, it is imperative that you employ the empirical scientific method to acquire knowledge and reach conclusions. Otherwise, what’s even the point! The articles had already piqued my interest, so I decided to train a model on my humble machine. I am still on the waiting list for GPT-3, so I ended up using GPT-2. You can read the humorous results of the Philosopher that I trained in this article.

It was a lot of fun training the AI-Philosopher (some people clearly need to revisit their definition of fun and get a life). I don’t think I did a great job of training it, but the entire exercise provided a window into the world of such powerful models and what they are capable of.

Yes, I admire the sheer brute force behind them and the results they are producing, but it is premature to call them the messengers of death for programmers (not in the literal sense, of course).

These are my opinions based on my experience:

Scalability Issues?

GPT-3 is massive (175 billion parameters), and GPT-2 (1.5 billion) isn’t small either, so there are two factors to consider regarding real-world usage of these models:

  1. How accessible are they to programmers?
  2. How easy are they to train, and how well do they perform when deployed in production?
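On the first point, a quick back-of-the-envelope calculation makes the accessibility problem concrete. The bytes-per-parameter figures below are assumptions (4 bytes for float32, 2 for float16) and ignore optimizer state and activations:

```python
# Rough memory footprint of the raw weights alone (no optimizer state,
# no activations). Bytes per parameter are assumptions: 4 for float32, 2 for float16.
models = {"GPT-2": 1.5e9, "GPT-3": 175e9}  # parameter counts

for name, params in models.items():
    for bytes_per_param, dtype in [(4, "float32"), (2, "float16")]:
        gb = params * bytes_per_param / 1e9
        print(f"{name}: ~{gb:,.0f} GB of weights in {dtype}")

# GPT-2: ~6 GB (float32) / ~3 GB (float16)   -> feasible on a single machine
# GPT-3: ~700 GB (float32) / ~350 GB (float16) -> far beyond a hobbyist setup
```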

The second factor was the reason I was quite headstrong about training the model on my local machine rather than using Colab or Paperspace. My machine is a humble 16-inch MacBook Pro without GPU support (thanks for CUDA, Nvidia! Not!!). Although it was quite straightforward to tweak the hyperparameters, the training time grew quickly with the number of steps; roughly, it took 25 minutes for a 200-step training run.
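For reference, a fine-tuning run of the kind described above looks roughly like the sketch below. It uses the gpt-2-simple wrapper around OpenAI’s released checkpoints; the dataset file name and the 124M model size are placeholders rather than my exact setup.

```python
import gpt_2_simple as gpt2

# Download one of OpenAI's released GPT-2 checkpoints (124M is the smallest;
# larger checkpoints make CPU-only training even slower).
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Fine-tune on a plain-text corpus; a 200-step run of roughly this size
# took about 25 minutes on my CPU-only laptop.
gpt2.finetune(
    sess,
    dataset="philosophy_corpus.txt",  # placeholder file name
    model_name="124M",
    steps=200,
    print_every=10,
    sample_every=100,
)

# Sample from the freshly fine-tuned "AI-Philosopher".
gpt2.generate(sess, length=100, temperature=0.9, prefix="The meaning of life")
```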

OpenAI has acknowledged this scalability issue, along with the related concern that larger companies will dominate and benefit disproportionately from such technology.

What do Data Scientists do exactly?

It would be reductionist propaganda to say

Data Scientist = Programmer + some bit of Statistician + some bit of Data Modeler

Well, yes, these are the typical roles and responsibilities you find in job descriptions, but what about the problem-solving and business aspects?

Understanding the problem statement, crystallising it, and formulating a strategy to solve it are much harder problems that can’t be swept under the carpet.

You can ask a sophisticated model to design a blue box, but deciding why it should be a blue box (or any coloured box, for that matter) and at what stage of the solution cycle it belongs are the important steps.

Do we trust machines yet?

Ask yourself: are you ready for a computer to handle your day-to-day life? I attended a workshop for semioticians, and the most common complaint customers had was ‘I want to speak to a real person’. The North Face ended up switching its customer reps from chatbots back to humans and saw an increase in customer satisfaction.

This tells us something about human proclivities, and the same preference applies, indirectly, to programmers as well.

Even though humans are not infallible, there is an unrealistic expectation that machines should be, and that is why a lot of work needs to be done before humans and machines can live harmoniously.

Isn’t automation already here?

Most of the examples I came across in which GPT was producing code looked like an automated process: you tell the machine what you need to accomplish, and it does it for you. So far so good!

Automation has been knocking on the door for a while now, trying, so far unsuccessfully, to carve out a space for itself. Yes, there are organisations that boast of automating the modelling process, but more often than not even linear regression is carried out by humans in the early stages; only once a stable state is achieved does automation kick in.

Conclusion

  1. Although GPT models are incredible, and it is great that humans are able to build machines with such a degree of sophistication, saying that these models will make data scientists and programmers obsolete is quite premature. Yes, maybe in the long run that could happen, but in the next 5 years it is quite unlikely.
  2. Heck! We are still struggling with cat-and-dog classification, and the models that achieve 97% accuracy are quite heavy. The same is true of GPT models: using them is resource-intensive.
  3. I agree that these models have their place and can definitely replace a lot of human effort in many areas of problem-solving, but we need to remember that complicated problems involve many more components, such as thinking, clarifying, and structuring, before solutions are generated. In those scenarios, human intelligence still has the upper hand, for now 🙂
