The JOIN that goes nowhere
April 02, 2025
Here’s an SQL query I wrote recently.
SELECT DISTINCT pu.id
FROM permit_urls pu
LEFT JOIN permit_details pd ON pu.id = pd.permit_url_id
WHERE pd.permit_type = $1 AND (pd.permit_url_id IS NULL OR pu.decided = 'f')
LIMIT $2;
permit_urls
holds URLs that need to be scraped. permit_details
holds data of the scraped URL. If a row is added to permit_urls
and a relevant permit_details
row doesn’t exist, then scraping will occur.
This query worked fine for years until I updated it a few months ago. After that it didn’t break but was consistently returning zero results until I realised my mistake. Can you see the bug?
Here’s where it lived: WHERE pd.permit_type = $1 AND (pd.permit_url_id IS NULL OR pu.decided = 'f')
.
While that WHERE
clause is syntatically valid, the brackets around the latter part was causing the query to short-circuit on the pd.permit_type = $1
check. This is not incorrect in and of itself, but if we think a bit longer about it, pd.permit_type = $1
was always going to resolve as FALSE
because the row in the permit_details pd
table doesn’t exist yet… so we could never return TRUE
from that part of the clause.
The adjustment happened because I wanted to split up the processing by the permit type, and I ended up completely breaking the pipeline in the process.
The obvious solution would be to write tests, but given its a side-project I didn’t see the value in it… until after this bug :)