Can ChatGPT recommend movies with machine learning?

A fun journey to test ChatGPT’s limits in the context of recommendation

Published in

Towards Data Science

6 min read · Apr 17, 2023

Recently I spent some time with our beloved AI overlord ChatGPT (just kidding!) probing the model and pushing its limits. I tested it on a movie recommendation use case. You can find the video walkthrough here.

Monolithic LLMs powered by billions of parameters and fine-tuned with RLHF have forever changed how we perceive AGI. The rise of ChatGPT, GPT-3.5 and GPT-4 has shown how far the abilities and skills of language models have expanded in the last few months. ChatGPT reaching 100 million users within just two months of its launch is a testament to how impressive the jump in AI has been.

Movie recommendation with ChatGPT

So many people are using ChatGPT in creative ways, from creating Flappy Bird from scratch to building websites. Following the trend, I decided to see if ChatGPT could compute a user’s rating for an unseen movie, given a dataset. First, I asked ChatGPT to generate a dataset.

It was swift to respond and generated a dataset as explained in the context.

I’ll be asking ChatGPT to:

Predict Jack’s rating for the movie The Avengers

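My hope is that ChatGPT uses a collaborative filtering approach to do this. One can first create a ratings matrix, then use it to compute the similarity between Jack and every other user. Finally, one can predict Jack’s rating from the other users’ ratings of The Avengers, weighted by those similarities.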

Note that I’m excluding users who gave The Avengers a rating of 0 from the score computation. The following Excel sheet depicts these computations. The final answer we’re looking for is 9.
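For readers who prefer code to spreadsheets, the steps above (a ratings matrix, user similarities to Jack, and a score that skips users who rated The Avengers 0) can be sketched in Python. This is a minimal sketch and not the sheet from this article: the ratings are made-up placeholders, the user and film names other than Jack and The Avengers are hypothetical, and it assumes cosine similarity with a similarity-weighted average as the scoring rule. Comparing users on every movie except the target one is also my assumption.

```python
import numpy as np

# Hypothetical placeholder data, NOT the dataset ChatGPT generated.
# A rating of 0 means "not rated".
movies = ["The Avengers", "Film A", "Film B", "Film C"]
users = ["Jack", "User 1", "User 2", "User 3"]
ratings = np.array([
    [0, 9, 0, 7],    # Jack (has not rated The Avengers)
    [10, 8, 0, 6],   # User 1
    [8, 0, 7, 9],    # User 2
    [0, 5, 9, 0],    # User 3 (has not rated The Avengers)
], dtype=float)


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_rating(ratings, users, movies, target_user, target_movie):
    """Similarity-weighted average of other users' ratings for target_movie."""
    u = users.index(target_user)
    m = movies.index(target_movie)
    # Assumption: compare users on every movie except the one being predicted.
    others = np.delete(ratings, m, axis=1)
    weighted_sum = 0.0
    similarity_sum = 0.0
    for v in range(len(users)):
        # Skip the target user and anyone who has not rated the target movie.
        if v == u or ratings[v, m] == 0:
            continue
        sim = cosine_similarity(others[u], others[v])
        weighted_sum += sim * ratings[v, m]
        similarity_sum += sim
    return weighted_sum / similarity_sum if similarity_sum else 0.0


prediction = predict_rating(ratings, users, movies, "Jack", "The Avengers")
print(f"Predicted rating: {prediction:.2f}")
```

Because every rating is non-negative, the similarities are non-negative too, so the prediction always lands between the lowest and highest rating that the remaining users gave the target movie. That makes for a quick sanity check on any answer a language model produces.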

Next, I posed the question as follows.

Looks like ChatGPT thinks this is supposed to be a data point that’s currently missing from the dataset. I also tried the “Let’s think step by step” trick, but that didn’t get ChatGPT very far.

Next, I tried chain-of-thought prompting, spelling out the approach that needs to be followed in order to compute the final result.

Success! This time, ChatGPT was able to follow the plan, generate the intermediate results and compute the final answer.

But hold on a second! The final result is wrong.

Problem #1: ChatGPT flunked mathematics (potentially) due to the complexity of the task

Looks like ChatGPT got the final result wrong. If you copy and paste the equation in line 2 of the last step into a calculator, you get 9, not 8.95. Moreover, unsurprisingly, the cosine distances are wrong too. But what ChatGPT was able to do is still impressive for a language model. Let’s give it the benefit of the doubt and try to show it where it stuffed up.

Unfortunately, ChatGPT couldn’t see it through. Here’s a snippet of the new response.

I couldn’t get ChatGPT to correct the mistake. But it kept admitting it had made one, which is a bit paradoxical. This brings us to the second problem.

Problem #2: ChatGPT is sycophantic

ChatGPT is quite sycophantic and will think it’s wrong every time you point out that it’s wrong. Funnily, it even thinks it’s wrong when it has the right solution at hand 😅.

Here, [0, 10, 0, 8] is the actual vector. But ChatGPT thinks it’s wrong and hallucinates something else to get out of the predicament it’s in. It’s almost like Bing Chat is the evil brother of ChatGPT.

After a bit of conversation back and forth, I wanted to test ChatGPT’s memory/attention span. So I asked,

to which ChatGPT said,

Uh-oh! If you go back to ChatGPT’s first meaningful response, you’ll see that the rating matrix has changed. Enter one of the peskiest issues with LLMs.

Problem #3: ChatGPT hallucinates

The introduction of ChatGPT invigorated the scientific community, sparking debate about the place of ChatGPT, from boosting productivity to taking over the world. One idea is that ChatGPT is a paradigm shift in computer programs. Throughout history, the computer program we’ve come to know and love has been a deterministic set of specific instructions that, when followed, produce a desired output. ChatGPT is like a computer program, but it lets users communicate in natural language rather than syntax-coated instructions.

However, if a variable goes out of context in a computer program, that’s a clear error. LLMs, on the other hand, just conjure up something to fill in the gaps. This can be a deal-breaker in some contexts. Imagine trying to resolve a billing error with ChatGPT, and ChatGPT hallucinates a sign-in error. That would be a very confusing experience for a user.
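To make the contrast concrete, here is a tiny Python sketch of what a conventional program does when a value it needs is missing. The function and variable names are hypothetical and exist only for illustration. Rather than inventing a plausible-looking number, the interpreter stops with an explicit error.

```python
def resolve_billing_issue():
    # `invoice_total` was never defined, so Python refuses to guess a value.
    return f"Your corrected total is {invoice_total}"


try:
    resolve_billing_issue()
except NameError as err:
    # The failure is loud and names exactly what is missing.
    error_message = str(err)
    print(f"Program stopped with: {error_message}")
```

An LLM in the same situation has no equivalent of `NameError`. It simply continues generating text, which is how the rating matrix above ended up silently changing.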

You can find the video walkthrough of my adventure below.

New frontiers

Just because ChatGPT has some issues, it’s not the end of the world! I’m still impressed by how much better ChatGPT is than a pretrain-only GPT-3, and these models will only get better.

GPT-4 has already been announced, with a waitlist. The technical report shows great promise, with jaw-dropping performance boosts. For example, on grade-school mathematics problems (the GSM8K benchmark), GPT-3.5 reaches 57.1% whereas GPT-4 sets the bar at 92%. Moreover, GPT-4 reportedly offers much better factual retrieval and hallucinates less than ChatGPT.

If you’re intrigued to see GPT-4 and ChatGPT side by side from a qualitative lens, I recommend this video.

Another development is a recently introduced model that can perform recommendation using natural language. This model is called P5, and it shows great results that stand up to state-of-the-art models. For example, P5 outperforms BERT4Rec and SASRec on sequential recommendation.

Conclusion

ChatGPT is definitely not without its flaws. During this exercise, it failed at simple arithmetic, behaved sycophantically and hallucinated. But this is just the beginning. ChatGPT’s successor, GPT-4, has shown some remarkable improvements over ChatGPT. Moreover, researchers are finding novel ways to use natural language to solve new problems such as recommendation.

Unless otherwise noted, all images are by the author.