Simon's Blog

Numeric Scripts

December 14, 2022


I’m often starting new small projects, especially ones that involve web scraping. This habit of gathering odd, one-off datasets has come in handy a few times when I’ve wanted to back up an opinion with facts.

Since I check in on these projects only rarely, I typically forget the end-to-end order in which to run each step (such as scrape -> parse -> scrape -> parse -> clean up data). I tend to split these steps into separate scripts.

I’ve started using a “numeric script” approach. On their own, scrape_links.py, parse_links_for_urls.py, check_links.py, and extract_links.py give no hint of the order they run in, while the following makes it much easier for me to remember:

  1. 01_scrape_links.py
  2. 02_check_links.py
  3. 03_extract_links.py
  4. 04_parse_links_for_urls.py

I could of course write a quick README stating the order, but this approach is ✨ self-documenting ✨.
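Because the numbers are zero-padded, a plain alphabetical sort of the filenames already gives the run order. Without the padding, a file starting with `10` would sort before one starting with `2`. As a minimal sketch, assuming every step lives in one folder as a `.py` file whose name starts with a zero-padded number and that each step runs without arguments, a tiny runner can rely on that ordering. `numbered_scripts` and `run_pipeline` are hypothetical helpers, not part of the original projects:

```python
import re
import subprocess
import sys
from pathlib import Path


def numbered_scripts(directory: Path) -> list[Path]:
    """Return the numbered step scripts in `directory`, in run order.

    Assumes (hypothetically) that every step is a .py file whose name starts
    with a zero-padded number, so alphabetical order equals numeric order.
    Files without a leading number (e.g. helpers.py) are skipped.
    """
    steps = [p for p in directory.glob("*.py") if re.match(r"\d+", p.name)]
    return sorted(steps, key=lambda p: p.name)


def run_pipeline(directory: Path) -> None:
    """Run each numbered script in order, stopping at the first failure."""
    for script in numbered_scripts(directory):
        print(f"Running {script.name}")
        # Run with the current interpreter, from inside the project folder.
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run([sys.executable, script.name], check=True, cwd=directory)
```

From the project folder, `run_pipeline(Path("."))` would run each step in turn and stop at the first one that exits with an error, so a half-finished scrape never feeds the next step.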