The world’s leading publication for data science, AI, and ML professionals.

Can ChatGPT Write Better SQL than a Data Analyst?

A LeetCode SQL Competition Between ChatGPT and Me

Photo by DeepMind on Unsplash
Photo by DeepMind on Unsplash

I tried ChatGPT, a variant of the GPT-3 language model that is specifically designed for generating human-like text in a conversational context. And of course, like most of us, I wondered: can an AI do my job? And can it do it better than me?

I have 2 years of experience working as a Data Analyst and an Analytics Engineer. According to BBC Science Focus, ChatGPT has ingested 570 GB of data. So who writes better SQL?

Let’s play!

This game will be based on 3 LeetCode SQL challenges (one easy, two medium). I will write every solution first, and then send the exercise to ChatGPT and see which solution works best.

I will provide links to every challenge so you can try as well to beat ChatGPT.

Challenge 1 (Easy)

This challenge is called Customer placing the largest number of orders.

Screenshot by Author
Screenshot by Author

Here is the query I wrote:

WITH layer_1 AS (
  SELECT 
    customer_number, COUNT(DISTINCT order_number) AS order_number
  FROM orders
  GROUP BY customer_number
)
SELECT customer_number 
FROM layer_1
ORDER BY order_number DESC 
LIMIT 1

It passed with a correct runtime:

Screenshot by Author
Screenshot by Author

Now let’s see how ChatGPT performs on this one.

Here is what ChatGPT answered:

Screenshot by Author
Screenshot by Author

ChatGPT even explains what it did. I don’t find the query very readable – one of the reasons why I love common table expressions is readability – but let’s see how it performs.

Impressively, it works, but it is slower than my result. Although I am glad to do better than ChatGPT, I would have loved to know how I can improve this query.

Challenge 2 (Medium)

The next challenge is called Tree Node.

Screenshot by Author
Screenshot by Author

The first query I wrote was this one:

# Write your MySQL query statement below
WITH l1 AS (
SELECT 
    t.id, 
    c.id AS c_id,
    t.p_id
FROM Tree t
LEFT JOIN Tree c
ON c.p_id = t.id
), 
l2 AS (
SELECT 
    id, 
    COUNT(DISTINCT c_id) AS nb_childrens, 
    COUNT(DISTINCT p_id) AS nb_parents
FROM l1
GROUP BY id
)
SELECT id, 
    CASE 
        WHEN nb_childrens >0 AND nb_parents >0 THEN "Inner"
        WHEN nb_childrens > 0 THEN "Root"
        ELSE "Leaf"
    END AS type
FROM l2

I ran it before submitting it, and it got the wrong result…

Screenshot by Author
Screenshot by Author

Turns out I hadn’t paid enough attention to the examples, especially the second one:

Screenshot by Author
Screenshot by Author

When a node is both a leaf and a root, it should be output as a root. I changed the order of my CASE WHEN to this:

CASE 
        WHEN nb_childrens >0 AND nb_parents >0 THEN "Inner"
        WHEN nb_parents > 0 THEN "Leaf"
        ELSE "Root"
    END

And this time, it passed!

My query got an average result:

Screenshot by Author
Screenshot by Author

Time for ChatGPT to play the game!

ChatGPT read all the examples and did not make my mistake on nodes that are both a root and a leaf:

Screenshot by Author
Screenshot by Author

But it still gave me a wrong answer:

Screenshot by Author
Screenshot by Author

ChatGPT didn’t return one row per id.

So I decided to give ChatGPT a hint:

Screenshot by Author
Screenshot by Author

And it was able to correct it!

Screenshot by Author
Screenshot by Author

We both made a mistake in this challenge and both were able to fix it (although I had to give a hint to ChatGPT). I had a slightly better result in terms of runtime, so point goes to me but it was close!

Challenge 3 (Medium)

The last challenge is called Capital Gain/Loss.

Screenshot by Author
Screenshot by Author

Here is the query I wrote:

SELECT 
  stock_name, 
  SUM(
    CASE 
      WHEN operation = "Buy" THEN -1*price 
      ELSE price
    END
    ) 
  AS  capital_gain_loss
FROM Stocks
GROUP BY stock_name
Screenshot by Author
Screenshot by Author

It passed but the runtime is pretty bad, more than 90% of players did better than me on this one.

Let’s see if ChatGPT is one of them.

Screenshot by Author
Screenshot by Author

Let’s try this solution:

Screenshot by Author
Screenshot by Author

Again, it gets the wrong result. ChatGPT joins one buy with every future sell and not only the corresponding one, hence its solution only works if there is just one pair of Buy/Sell operations.

I tried to tell ChatGPT to fix its mistake, without giving it a hint:

Screenshot by Author
Screenshot by Author

Unfortunately, it gives me exactly the same query. So I complained:

Screenshot by Author
Screenshot by Author
Screenshot by Author
Screenshot by Author

This time, ChatGPT and I used a similar structure and the query passed. However, I am not sure that ChatGPT really understood what it did wrong in the first place.

Screenshot by Author
Screenshot by Author

The runtime of ChatGPT is slightly worse than mine.

Results


  Challenge    ChatGPT πŸ€–      Data Analyst πŸ‘©    β€πŸ’»              Winner        
 ----------- --------------- --------------------- --------------------- 
          1    βœ… (22%)         βœ… (62%)               Data Analyst πŸ‘©    β€πŸ’»      
          2    ❌ βœ… (36%)      ❌ βœ… (54%)             Data Analyst πŸ‘©    β€πŸ’»       
          3    ❌ ❌ βœ… (8%)    βœ… (5%)                 Data Analyst πŸ‘©    β€πŸ’»    

I think it’s fair to say that I "won" this SQL challenge against ChatGPT. I was still impressed by its abilities, and amazed that it is able to correct its mistake! ChatGPT definitely beats me on speed; it writes valid SQL syntax in a few seconds, whereas I need a few minutes.

But it still got 50% of it wrong. And even if it had completed successfully every challenge, I wouldn’t worry about my job. Stakeholders never come to analysts with such a well-defined request and output examples. They come with a business question, and we have to think about the best way to answer it with available data. Can ChatGPT do that?

I asked ChatGPT its opinion on the matter:

Screenshot Author
Screenshot Author

That sums it all up!

I hope you enjoyed this article! If you did, please follow me for more content on SQL and Data Analysis.


Related Articles