Advent of code: SQL + BigQuery

Tens of thousands of coders are working out the Advent of Code challenges in their favorite programming languages. What if I could use this opportunity to highlight the power of SQL and the latest BigQuery features?

Felipe Hoffa
5 min readDec 5, 2019

--

2020 update: With Snowflake

Check my 2020 Advent of Code with SQL thread:

And the GitHub repo:

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels.

What if you could use SQL and BigQuery to write a compiler, intersect paths, recursively run calculations, and crack passwords? #adventofcode2019, here we go!

Problem 1: Recursive iterations

Problem, BigQuery solution

Running a recursive function in BigQuery hasn’t been easy — but now we can do loops and general scripting. This is the main cycle I used to solve the second part of the problem :

LOOP
IF (SELECT SUM(x) FROM UNNEST(extra) x)<=0 THEN
LEAVE;
END IF;
SET weight= weight+(SELECT SUM(x) FROM UNNEST(extra) x);
SET extra = (SELECT ARRAY_AGG(fuel(x)) x FROM UNNEST(extra) x);
END LOOP;

And a handy SQL UDF:

CREATE TEMP FUNCTION fuel(x ANY TYPE) AS (
GREATEST(0,FLOOR(x/3)-2)
);

The most interesting problem I had to solve here: I wasn’t able to have a cursor to go row by row, calculating the fuel for each component. So I had to figure out a way of working with every rocket component (input line) in parallel — which made the query way more efficient.

Problem 2: Data as code

Problem, BigQuery solution

Can I write a compiler using SQL and BigQuery? Are we Turing complete yet?

Again, using a scripting LOOP allows me to iterate over an input array — which defines code to be executed as read from the loop:

LOOP
IF(input[OFFSET(code_pos)]=99)
THEN LEAVE;
END IF;
SET input = (SELECT parse(input, code_pos));
SET code_pos = code_pos+ 4;
END LOOP;

The most interesting part is how I had to useARRAY_AGG() and CASE to rewrite the contents of each array as instructed:

SELECT ARRAY_AGG(k) FROM (
SELECT
CASE n
WHEN x[OFFSET(off+3)] THEN
CASE x[OFFSET(off+0)]
WHEN 1 THEN x[OFFSET(x[OFFSET(off+1)])]+x[OFFSET(x[OFFSET(off+2)])]
WHEN 2 THEN x[OFFSET(x[OFFSET(off+1)])]*x[OFFSET(x[OFFSET(off+2)])]
END
ELSE xx
END k
FROM UNNEST(x) xx WITH OFFSET n
ORDER BY n

And then the most difficult problem I had to face — how to do this 99*99 times to find the solution to part 2? It’s not pretty, but it involves arrays of arrays and doing a slow search through the (luckily) linear space.

Warning: It takes more than 2 minutes to search through 200 possible solutions. But since the particular statement is linear, you can jump to the right result.

Problem 3: Where will our paths meet

Problem, BigQuery solution

This is a fun map traversal problem — for the solution I used the ability to generate SQL arrays detailing each step:

FROM [...], UNNEST(GENERATE_ARRAY(x-move.x, x, IF(move.x>0,1,-1))) arr_x WITH offset n1, UNNEST(GENERATE_ARRAY(y-move.y, y, IF(move.y>0,1,-1))) arr_y WITH OFFSET n2

The join between paths to find intersections is pretty elegant:

USING(arr_x, arr_y)

A handy function parses the input and generates the 2-d movement instructions:

CREATE TEMP FUNCTION parse(x ANY TYPE) AS (
CASE REGEXP_EXTRACT(x, '^(.)')
WHEN 'D' THEN STRUCT(0 AS x,-parse_n(x) AS y)
WHEN 'R' THEN (parse_n(x),0)
WHEN 'U' THEN (0,parse_n(x))
WHEN 'L' THEN (-parse_n(x),0)
END
);

I’m proud of the ability to re-factor the path generating code, to make the solution easy to look at:

SELECT ABS(arr_x) + ABS(arr_y) distance, (a.rn + b.rn) wire
FROM (
SELECT * FROM UNNEST(route('L996,D167,R633,...'))
) a
JOIN (
SELECT * FROM UNNEST(route('R995,U982,R941,...'))
) b
USING(arr_x, arr_y)
WHERE a.rn>0 AND b.rn>0
ORDER BY wire
LIMIT 1

Problem 4: The password is…

Problem, BigQuery solution

The code here ended up being pretty short! I wish I had written less manual array positions, but doing so took me less time that figuring the general case:

SELECT * 
FROM (
SELECT CAST(FORMAT('%i%i%i%i%i%i',n0, n1, n2, n3, n4, n5) AS INT64) number
FROM (
SELECT n0, n1, n2, n3, n4, n5
FROM UNNEST(GENERATE_ARRAY(1, 5)) n0
, UNNEST(GENERATE_ARRAY(1, 9)) n1
, UNNEST(GENERATE_ARRAY(1, 9)) n2
, UNNEST(GENERATE_ARRAY(1, 9)) n3
, UNNEST(GENERATE_ARRAY(1, 9)) n4
, UNNEST(GENERATE_ARRAY(1, 9)) n5
WHERE n1>=n0
AND n2>=n1
AND n3>=n2
AND n4>=n3
AND n5>=n4
AND (
(n1=n0 AND n1!=n2)
OR (n2=n1 AND n1!=n0 AND n2!=n3)
OR (n3=n2 AND n2!=n1 AND n3!=n4)
OR (n4=n3 AND n3!=n2 AND n4!=n5)
OR (n5=n4 AND n4!=n3)
)
)
)
WHERE number BETWEEN 109165 AND 576723

As you can see GENERATE_ARRAY() helped create the space to look for a solution.

Day 5

As I write these thoughts, the problem still hasn’t been posted…

Well… now it has, but I’m going home. Can you solve it in the meantime?

Update: The problem is a continuation of day 2. Building this “Turing machine” was fun on day 2, but I don’t feel like doubling down on it.

Day 6: Graph traversal

Problem, BigQuery solution

Going through the graph didn’t take much effort with the ability to script — but it highlights that BigQuery can be slow when a function has to go through many iterations. In this case it took ~20 minutes to process 547 statements. As a reminder: BigQuery is incredible at processing Terabytes of data, but not necessarily at solving general computing problems.

LOOP
SET steps = steps+1
;
CREATE OR REPLACE TEMP TABLE planets AS
SELECT DISTINCT planet
FROM (
SELECT origin planet FROM t1 WHERE dest IN (SELECT planet FROM planets)
UNION ALL
SELECT dest planet FROM t1 WHERE origin IN (SELECT planet FROM planets)
)
;
IF 'SAN' IN (SELECT * FROM planets )
THEN LEAVE;
END IF;
END LOOP

Now, if this problem wasn’t a one-off, I wouldn’t traverse the graph each time, but pre-compute a better structure with all paths to instantly find the intersections (since this is a tree, not a graph).

Final thoughts

Would I recommend SQL as a general programming problem tool?

No!

… but this has been a fun way for me to try the limits of the SQL and BigQuery. Scripting is a new feature, and I’m still getting used to its syntax and limitations. So far, so good.

Take the Advent of Code challenge!

As a bonus: Use these tricks to impress your friends!

Thanks Aja Hammerly, Mano Marks, Ben Bleything, … for the early encouragement.

(You can join the BigQuery SQL leaderboard with code 763369-9f36786d)

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.