Natural Language Processing

What GPT-4 Brings to the AI Table

A language model and more

Edozie Onyearugbulem
Towards Data Science
7 min read · Apr 14, 2023



The long-awaited release of the latest Generative Pre-trained Transformer (GPT) model has finally come. The fourth release of OpenAI's GPT model brings improvements over its previous versions, in addition to some extended features. GPT-4, like its predecessors, was trained and fine-tuned on a corpus of text using semi-supervised training. The semi-supervised training used in GPT models is a two-step process: unsupervised generative pre-training followed by supervised discriminative fine-tuning. These training steps helped to circumvent the language understanding barriers that other language models faced due to poorly annotated data.
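
To make the two-step recipe concrete, here is a minimal, illustrative sketch of the idea in PyTorch. It is not OpenAI's code: the tiny GRU backbone, the random data and the single training steps are stand-ins for a real Transformer, a real corpus and full training loops.

    import torch
    import torch.nn as nn

    VOCAB, DIM, CLASSES = 100, 32, 2

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            self.backbone = nn.GRU(DIM, DIM, batch_first=True)  # stand-in for a Transformer decoder
            self.lm_head = nn.Linear(DIM, VOCAB)     # used in generative pre-training
            self.cls_head = nn.Linear(DIM, CLASSES)  # used in discriminative fine-tuning

        def forward(self, tokens):
            hidden, _ = self.backbone(self.embed(tokens))
            return hidden

    model = TinyLM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Step 1: unsupervised generative pre-training -- predict the next token on unlabeled text.
    tokens = torch.randint(0, VOCAB, (8, 16))  # toy "unlabeled corpus" batch
    hidden = model(tokens[:, :-1])
    lm_logits = model.lm_head(hidden)
    lm_loss = loss_fn(lm_logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    lm_loss.backward(); opt.step(); opt.zero_grad()

    # Step 2: supervised discriminative fine-tuning -- a labeled task reuses the pre-trained backbone.
    labels = torch.randint(0, CLASSES, (8,))    # toy task labels (e.g. entailment vs. not)
    hidden = model(tokens)
    cls_logits = model.cls_head(hidden[:, -1])  # classify from the final position's hidden state
    cls_loss = loss_fn(cls_logits, labels)
    cls_loss.backward(); opt.step(); opt.zero_grad()

The point of the sketch is simply that the same backbone is trained twice: first on raw text with a language-modeling head, then on a small labeled dataset with a task-specific head.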

How GPT-4 got this far

OpenAI released GPT-4 on 14th March, 2023, nearly five years after the initial launch of GPT-1. There have been improvements in the speed, understanding and reasoning of these models with each new release. Much of the improvement can be attributed to the amount of data used in training, the robustness of the models and advances in computing hardware. GPT-1 had access to barely 4.5GB of text from BookCorpus during training. The GPT-1 model had a parameter size of 117 million, which was large compared to many language models available at the time of its release. GPT-1 outperformed other language models on the tasks it was fine-tuned for: natural language inference, question answering, semantic similarity and text classification.

Those who were still uncertain whether any model could surpass GPT-1 were blown away by the numbers GPT-2 posted on its release. The parameter count and the amount of training text were both roughly ten times what GPT-1 used. Size wasn't the only new addition: in contrast to GPT-1, OpenAI removed the need for an additional fine-tuning step for specific tasks. GPT-2 relied on zero-shot task transfer, drawing the meaning and context of words from pre-training alone rather than from task-specific examples or fine-tuning.

Just like GPT-2, GPT-3 and the subsequent language models do not require additional fine-tuning on specific tasks. The 175-billion-parameter GPT-3 model was trained on 570GB of text from Common Crawl, WebText, English Wikipedia and some book corpora. The language understanding and reasoning of GPT-3 were impressive, and further improvements led to the development of ChatGPT, an interactive dialogue interface. OpenAI built ChatGPT as a web-based dialogue environment where users could get first-hand experience of the capabilities of the extended GPT-3, with the language model conversing and responding based on the user's input. A user can ask a question or request detailed information about almost any topic within the training scope of the model. OpenAI also regulated the extent of information its models could provide: extra care was taken with answers to prompts involving crime, weapons, adult content and the like.
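
For readers who want to try this kind of dialogue interaction programmatically, the sketch below uses the openai Python package's chat endpoint as it looked around the time of writing (the v0.27-style ChatCompletion interface); the API key and the prompt are placeholders, not anything from OpenAI's own examples.

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder -- use your own key

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # the model family behind ChatGPT
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain what a language model is in two sentences."},
        ],
    )

    # The assistant's reply is nested inside the first choice of the response.
    print(response["choices"][0]["message"]["content"])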

Exciting features of GPT-4

Each new release of GPT comes with a set of features that would have seemed impossible in the past. ChatGPT impressed users with its level of reasoning and comprehension. Users could get accurate responses to their queries on any topic, as long as the subject matter was part of the text ChatGPT was trained on. There have been cases where ChatGPT struggled to respond to queries about events that occurred after the model was trained. This difficulty with novel topics should be expected, since NLP models reproduce patterns from their training text and map entities to the contexts in which they appeared. Only topics present in the training dataset can therefore be recalled, and it would be quite ambitious to expect the model to generalize to entirely new topics.

Not only was the reasoning of the GPT-3 model relatively limited, but the model was also unimodal: it could only process sequences of text. The latest release improves on both points. With its higher level of reasoning, GPT-4 can make better estimates of sentence context and build a more general understanding from that context. Based on the glimpse we have of this new model's capabilities, the other new features are as follows:

  • An increase in its word limit, to roughly 25,000 words compared to the 3,000-word limit on ChatGPT. GPT-4 also has a larger context window, with sizes of 8,192 and 32,768 tokens compared to 4,096 tokens on ChatGPT and 2,049 tokens on GPT-3 (a token-counting sketch follows this list).
  • Improvements in reasoning and understanding. Texts are understood more thoroughly, and better reasoning is performed on them.
  • GPT-4 is multi-modal. It accepts text inputs as well as images. GPT-4 recognizes and understands an image’s contents and can make logical deductions from the image with human-level accuracy.
  • Texts generated by GPT-4 are more difficult to flag as machine-generated. The texts read as more human-like and use features such as emojis to feel more personal and convey a bit of emotion.
  • Lastly, I would like to single out the new dynamic logo that comes with GPT-4. The logo reflects how versatile this model is and the dynamism of its potential use cases. I think it has to be one of the best identities given to a model.
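
As a rough illustration of what those context-window figures mean in practice, the sketch below counts tokens with the tiktoken tokenizer and checks a prompt against the limits quoted above. The limits dictionary simply restates those numbers; it is not pulled from any API.

    import tiktoken

    # Context-window sizes as quoted above (tokens, not words).
    CONTEXT_LIMITS = {"gpt-4": 8192, "gpt-4-32k": 32768, "gpt-3.5-turbo": 4096}

    enc = tiktoken.encoding_for_model("gpt-4")
    prompt = "GPT-4 accepts much longer prompts than its predecessors. " * 500
    n_tokens = len(enc.encode(prompt))

    for model, limit in CONTEXT_LIMITS.items():
        verdict = "fits" if n_tokens <= limit else "does not fit"
        print(f"{model}: a {n_tokens}-token prompt {verdict} in a {limit}-token window")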

Truths and myths

Visual representation of the size of GPT-4

At some point during the wait for the release of GPT-4, this picture circulated on Twitter. The image is a visual representation of the rumoured size of GPT-4, showing a considerable increase in the parameter count of the new model compared to the parameters used in ChatGPT. While the claim this image communicates might sound groundbreaking, it might not be true at all: even OpenAI's CEO has debunked the rumours about the size of the model. The official documentation of the architecture and the parameter count of the multi-modal language model has not been released, so we can't really tell whether the model was created by scaling up the past models or by some new approach. Some AI experts argue that scaling alone wouldn't provide the much-needed general intelligence the AI world is striving towards.

OpenAI presented the big strengths of GPT-4 in text generation, but have we bothered to ask how good the generated texts are when measured against standard exams? GPT-4, while performing quite well in some exams, faltered in those that required a higher level of reasoning. The technical report released by OpenAI shows that GPT-4 scored around the 54th percentile on the Graduate Record Examination (GRE) Writing section for both versions of GPT-4 that were released¹. This exam is one of many that test the reasoning and writing abilities of a graduate. It can be said that the text generation of GPT-4 is barely as good as a university graduate's, which isn't bad for a "computer". We can also say that this language model doesn't like math, or rather, it doesn't do well in calculus. It scored in the 43rd to 59th percentile on the AP Calculus BC exam, which is quite low compared to the high percentile scores it achieved on the Biology, History, English, Chemistry, Psychology and Statistics exams from the same board. The model falters with increasing levels of difficulty. Humans are still at the top echelon of thinking, for the time being.

Ever wondered how well these language models perform at coding? GPT-4's coding abilities were checked on some LeetCode tasks. Its performance on the easy tasks was quite good, but there is a steady decline in performance as the tasks get harder. It is also worth noting that GPT-4's overall score on LeetCode tasks is almost the same as GPT-3's. OpenAI evidently didn't do much better this time, or perhaps they were not trying to turn GPT models into the next GitHub Copilot. Still, imagine a computer performing better than an average programmer on interview coding questions. Crazy!

While some features didn’t see many improvements compared to the predecessor model, it’s worth noting how well the model performs on other tasks.

Conclusion

This fourth release of GPT has shown that there isn't really a limit on the scope of language models, since these models are now multi-modal and can accept inputs other than text. This could be seen as a harbinger of more advanced features in versions to come. With the image understanding GPT-4 has demonstrated, we could eventually have a language model performing as well as, or even better than, computer vision models on image recognition tasks. We are gradually moving towards artificial general intelligence. It's still a long way off, but we clearly have a direction and a sense of where we are heading.

[1]: OpenAI. (2023, March 16). GPT-4 Technical Report. https://cdn.openai.com/papers/gpt-4.pdf

Thank you!

If you liked my article, please follow me so that you get notified whenever I publish a story. I will be publishing more articles in this space. I wish you the best in your endeavors.

I'm a data guy. I write content about data science, machine learning and other data-related topics.