
How Good is GPT-3 at SEO?

What are the practical and broader societal concerns of using automated AI-generated content?

Photo by Possessed Photography on Unsplash

Robots are writing articles. Robots are writing code. What an exciting time to be alive!

GPT-3 is all the rage right now. It is a generative, pre-trained deep learning model that produces human-like text, created by OpenAI – a laboratory specializing in AI research.

As a writer, a digital marketer, and an aspiring data scientist, I am excited about the possibilities this model could help create. So, I put the model to the test.

I want to show you the results of Semrush’s SEO Writing Assistant assessment of a text generated by GPT-3. Then, we will discuss the implications of this model hitting the mainstream stage – its limitations and its potential impact on society and the planet.


‘A robot wrote this article’

Recently, OpenAI announced that users could request access to its user-friendly GPT-3 API – a "Machine Learning toolset" – to help OpenAI "explore the strengths and limits" of this new technology.

As my access to it is pending, I used shortly.ai, a service that already applies the model in its application.

Although praised as the most sophisticated writer on the market, shortly.ai is currently quite pricey, so I am not in any way endorsing a purchase. Not just yet, anyway, at least not before OpenAI releases the alpha.

Getting started is pretty easy. You will be prompted to choose an article type (story or blog/article), then write a title and a few sentences to prompt the model to start writing.

Then, voila! GPT-3 did my job in less than 3 minutes.
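For anyone whose API access does come through, the request itself is only a few lines of Python. Below is a minimal sketch using the openai package as it shipped at the time of writing; the engine name, prompt, and parameters are illustrative choices, not whatever shortly.ai runs behind the scenes.

# pip install openai
import openai

openai.api_key = "YOUR_API_KEY"  # issued once OpenAI approves your request

response = openai.Completion.create(
    engine="davinci",  # the largest GPT-3 engine available at the time
    prompt=(
        "10 Ways To Increase Your Productivity\n\n"
        "Being productive is about working smarter, not harder. "
        "Here are ten practical ways to get more done every day.\n\n1."
    ),
    max_tokens=800,    # roughly the length of a short blog post
    temperature=0.7,   # some creativity without drifting off-topic
)

print(response.choices[0].text)

The title and the opening sentences play the same role as the prompt box in shortly.ai: they set the topic and the style the model will try to continue.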

Check out the example article it wrote, and see for yourself:

10 Ways To Increase Your Productivity

Now, let’s get analyzing.


Assessment

Semrush’s SEO Writing Assistant is a tool aimed at content optimization. It uses analytics from the top 10 rivals in Google search results for the target keywords you input. It provides recommendations that improve the SEO friendliness, readability, and tone of voice of the copy it analyzes. It can be used as a Google Docs add-on.

Here is what it had to say about the AI-written article.

6/10 – Mediocre. Ouch. Let’s see why.

Screen capture from the author’s Google Docs (image by author)

Readability

The readability score is one of the lowest of the bunch. The reason: the text reads as if it were written for a five-year-old to understand. Ten points for clarity, to be fair! The sentences are simple and easy to understand, which is not necessarily a bad thing, but good writing needs some variety.
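Semrush does not publish the formula behind its readability score, but you can run a quick, independent sanity check with a standard metric such as Flesch Reading Ease. Here is a minimal sketch in Python, assuming the generated copy has been saved to a local file and using the textstat package.

# pip install textstat
import textstat

# "productivity_article.txt" is a placeholder for wherever you saved the generated draft
article = open("productivity_article.txt").read()

# Flesch Reading Ease: higher scores mean simpler text;
# scores in the 90-100 range read like material for a young child
print("Flesch Reading Ease:", textstat.flesch_reading_ease(article))

# short, uniform sentences are one reason a text can feel overly simple
print("Average sentence length:", textstat.avg_sentence_length(article))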

Beyond the numbers, something is definitely missing here. The thing is, for written content to really be successful, it needs a connection with the audience. That connection is often built through the author’s anecdotes, even random tangents, shared personal experience, critical discussion, and reflection.

At present, the article could at most be turned into a wikiHow entry. It is an idea dump, a rough draft that needs a few critical editing iterations before it can appear in any respectable publication.

SEO section – screen capture from the author’s Google Docs (image by author)

SEO

The SEO section consists of keyword and link recommendations, as well as alt tag checks.

As our article does not have any images, there are no missing alt-tag alerts; however, we are reminded that articles with images are more appealing to readers.

No links are added either. I didn’t immediately notice this when reading the article, but I consider it quite important: GPT-3 writes decent, logical content, yet credit for its ideas is nowhere to be found.

Again, we come back to the need for thorough editing – preferably editing that critically assesses the claims made and draws on a variety of authoritative sources in the process.

Finally, not many of the related keywords are used. Granted, some of them are not relevant, but the target keyword provided (i.e. productivity) is quite a broad one to target.

Perhaps the model would have done better here had the input sentences provided more context about the topic, how it should be tackled, and the important primary and secondary keywords.
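If you want to audit keyword coverage yourself before handing the draft to an editor, a few lines of Python will do. The keyword list below is purely hypothetical; in practice, you would paste in whatever related keywords your SEO tool suggests for the target term.

import re

article = open("productivity_article.txt").read().lower()  # placeholder path

# a hypothetical handful of related keywords for the broad target "productivity"
related_keywords = [
    "time management",
    "to-do list",
    "focus",
    "procrastination",
    "work-life balance",
]

for kw in related_keywords:
    hits = len(re.findall(re.escape(kw), article))
    status = "ok" if hits else "MISSING"
    print(f"{kw}: {hits} occurrence(s) [{status}]")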

Tone of voice section – screen capture from the author’s Google Docs (image by author)

Tone of Voice

Now, on to the tone of voice. The tone of voice is how the message is presented, including the choice of words, order, rhythm, and pace.

Semrush scores the tone of voice on a five-point scale:

  • Very casual ("That’s the most stupid suggestion EVER.")
  • Somewhat casual ("That’s not really clever.")
  • Neutral ("Nobody has asked for this advice.")
  • Somewhat formal ("Recommendations were not required.")
  • Very formal ("Given recommendations were unsolicited and undesirable, and will not be accepted.")

According to Semrush, the Tone of Voice score uses a machine-learning algorithm based on scientifically proven research and thousands of human-rated texts.

The generated text is somewhat formal, yet totally inconsistent throughout.
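Semrush does not expose its tone-of-voice model, so the sketch below is only a crude, illustrative proxy and decidedly not their method: it scores each paragraph on word length, contractions, and exclamation marks, then reports the spread. A large spread hints that the tone drifts from paragraph to paragraph, which is exactly the inconsistency noted above.

import re
import statistics

article = open("productivity_article.txt").read()  # placeholder path
paragraphs = [p for p in article.split("\n\n") if p.strip()]

CONTRACTIONS = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d)\b", re.IGNORECASE)

def formality_proxy(text):
    # longer words nudge the score up; contractions and "!" pull it down
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    casual_hits = len(CONTRACTIONS.findall(text)) + text.count("!")
    return avg_word_len - 10 * casual_hits / len(words)

scores = [formality_proxy(p) for p in paragraphs]
print("per-paragraph scores:", [round(s, 2) for s in scores])
if len(scores) > 1:
    # a large standard deviation suggests an inconsistent tone of voice
    print("spread (std dev):", round(statistics.stdev(scores), 2))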

Originality

Not really surprising, but the text is 100% original.

Originality section – screen capture from the author’s Google Docs (image by author)

The GPT-3 model uses natural language processing to make sense of a diverse corpus of text, having been pre-trained on many different datasets. This allows for natural language understanding without the need for time-intensive data labeling, and any text it generates is original and new.

The Verdict

Impressive, impressive, impressive. Nonetheless, humans are absolutely needed before applying this model anywhere in the real world.

With human-assisted machine learning, inherently human elements, such as emotion and nuanced visual cues, can be implemented to improve a model’s output. Using a machine-learning algorithm by itself simply doesn’t work, at least not for long-form content. Not yet.


Broader impact and concerns about widespread GPT-3 implementation

First of all, you might be wondering – what does Google think of this programmatic approach to generating content?

And yes – you’d likely be correct in supposing they are currently against it. At the time of writing, Google’s Quality Guidelines say:

Automatically generated – or "auto-generated" – content is content that’s been generated programmatically. In cases where it is intended to manipulate search rankings and not help users, Google may take actions on such content. These include, but are not limited to: (…)

Text generated through automated processes, such as Markov chains (…)

They have said, though, that these guidelines might be revised at some point in the future.

More worrying to me are the societal concerns about the broader implementation of the GPT-3 model, highlighted in the study by Brown and fellow scholars from the OpenAI lab.

The authors of the study highlight a number of concerning applications of this model, with the aim of encouraging research and legislative efforts to mitigate the risks:

  • malicious misuse – using GPT-3 for misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing, and social engineering pretexting
  • endorsing and sustaining existing stereotypes and biases around representation – when present in the training data, these can lead the model to generate stereotyped or prejudiced content
  • sustaining gender stereotypes about occupations – the authors report that 83% of the 388 occupations they tested were more likely to be followed by a male identifier by GPT-3
  • enforcing existing stereotypes about race – when prompted to talk about race, the model’s detected sentiment when discussing ‘Black’ was consistently low (Asian – high; White – neutral)
  • enforcing existing stereotypes about religion – the model makes associations with religious terms that reflect how these terms are sometimes presented in the world; when prompted to talk about religion, words such as violent, terrorism, and terrorist co-occurred at a greater rate with Islam than with other religions

The authors also highlight the energy use required to train the model, as well as the potential impact that the widespread development of such models could have on the planet and its resources.


The Takeaway

GPT-3 is exciting, new, and shiny; however – it is far from perfect.

Humans are very much still needed to help create beautiful content that is fair, robust, critical, deep, and engaging.

While the benefits of such models outweigh the costs, at least in testing, the impact of widespread GPT-3 use could be detrimental to the fight for equality and justice for misrepresented and under-represented communities.

There is a deep paradox in implementing this model in the mainstream, which becomes glaringly apparent in the discussion of these limitations. As with any machine learning model, GPT-3 is trained on existing pieces of text. While remarkable, it still falls short of what we uphold as the standard for good writing. If implemented widely (without an editorial process), it will lead to poorer-quality text and the perpetuation of bias, which will ultimately stifle its development.

Instead, what appears to be needed is side-by-side work to improve this model and help it advance with the changing times. How? By writing better, by editing its output, and by reporting on its flaws. This appears to be the only way we mortals can help researchers advance this model dynamically.

If you do want to go ahead and help yourself with content generation using GPT-3, go through this handy checklist before publishing your content:

  1. Is the language too simple? – Add more complex sentences, if needed.
  2. Are the words used basic? – Replace with advanced words, whenever possible.
  3. Add images, and insert alt-tags for accessibility.
  4. Critically assess the claims made throughout – Provide links to authoritative sources, whenever possible.
  5. Are there enough related keywords used? Does the article cohesively address the topic? – Add more related keywords, if needed. Use tools such as answerthepublic.com to add answers to pressing questions.
  6. Is the tone of voice consistent throughout the text? – The best tone for your content will depend on your users, your message, and your brand.
  7. Does the text read like something you or your brand would say? – Add, remove, or edit sections to customize your text as necessary.
  8. Is the generated text biased or does it support existing stereotypes about gender, race, or other representation issues? – Being aware of the model’s limitations can assist you in editing out inherent harmful biases.
