Simon's Blog

The JOIN that goes nowhere

April 02, 2025

Here’s an SQL query I wrote recently.

SELECT DISTINCT pu.id
FROM permit_urls pu
LEFT JOIN permit_details pd ON pu.id = pd.permit_url_id
WHERE pd.permit_type = $1 AND (pd.permit_url_id IS NULL OR pu.decided = 'f')
LIMIT $2;

permit_urls holds URLs that need to be scraped. permit_details holds data of the scraped URL. If a row is added to permit_urls and a relevant permit_details row doesn’t exist, then scraping will occur.

This query worked fine for years until I updated it a few months ago. After that it didn’t break but was consistently returning zero results until I realised my mistake. Can you see the bug?

Here’s where it lived: WHERE pd.permit_type = $1 AND (pd.permit_url_id IS NULL OR pu.decided = 'f').

While that WHERE clause is syntatically valid, the brackets around the latter part was causing the query to short-circuit on the pd.permit_type = $1 check. This is not incorrect in and of itself, but if we think a bit longer about it, pd.permit_type = $1 was always going to resolve as FALSE because the row in the permit_details pd table doesn’t exist yet… so we could never return TRUE from that part of the clause.

The adjustment happened because I wanted to split up the processing by the permit type, and I ended up completely breaking the pipeline in the process.

The obvious solution would be to write tests, but given its a side-project I didn’t see the value in it… until after this bug :)


Written by Simon who lives in Malta. You can find out more about me on the about page, or get in contact.