Another point for Haskell
In Round 1 I described the task: find the number of “Titles” in an HTML file. I started with the Python implementation, and wrote this test:
assert title_count() == 59
Very simple: I had already downloaded the web page, so this function had to do just two things: (1) read in the file, and then (2) parse the HTML and count the Title sections.
Not so simple in Haskell
Anyone who’s done Haskell might be able to guess at the problem I was about to have: The very same test, written in Haskell, wouldn’t compile!
it "finds the correct number of titles" $
titleCount `shouldBe` 59
There was no simple way to write a single function that both reads the file and parses it, yet still returns a plain integer. I realized that the blocker was Haskell’s famous separation between pure and impure code. Reading a file is I/O, which is impure, and so anything combined with it becomes impure too. My function’s return value would be locked in an I/O wrapper: an IO Int rather than a plain Int.
I got frustrated and thought about dumping Haskell. “Just another example of how it’s too difficult for practical work,” I thought. But then I wondered how hard it would be to read in the file as a fixture in the test, and then call the function. I’d just need to pass the HTML as a parameter. And yep, this worked:
it "finds the correct number of titles" $ do
html <- readFile "nrs.html"
titleCount html `shouldBe` 59
As I refactored the code to pass this test, I realized that this is much better: doing I/O and algorithmic work should always be separate. I had been a little sloppy or lazy in setting up my first task. The app with the Haskell-inspired change will be more reliable and easier to test, regardless of which language it ends up being written in.
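Here is a minimal Python sketch of that separation, not the actual NevadaLaws.org code. The names `title_count` and `read_page` are illustrative, and the rule for what counts as a Title (a text node starting with "TITLE ") is an assumption about the page's markup.

```python
from html.parser import HTMLParser


class _TitleCounter(HTMLParser):
    """Counts text nodes that begin with 'TITLE ' (an assumed marker)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_data(self, data):
        # Assumption: each Title heading is a text node like "TITLE 1 - ...".
        if data.strip().upper().startswith("TITLE "):
            self.count += 1


def title_count(html: str) -> int:
    """Pure part: takes the page source and returns the number of Titles."""
    parser = _TitleCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def read_page(path: str) -> str:
    """Impure part: the only function here that touches the file system."""
    with open(path, encoding="utf-8") as f:
        return f.read()
```

With this split, the pytest version mirrors the Haskell test: read `nrs.html` inside the test, then `assert title_count(html) == 59`.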
The point goes to Haskell
On the surface, it may sound silly to compare these two languages because they’re about as opposite as you could get: Python is interpreted, dynamically typed, and slightly weakly typed as well. Haskell, on the other hand, is compiled, statically and strongly typed. But they’re both open source, and they both have large collections of libraries which are helpful for my work.
I’m creating NevadaLaws.org, an app displaying the Nevada Revised Statutes, similar to my first, OregonLaws.org. Most of my code is Ruby and Rails, but I want to improve the architecture and maybe move to a new language. So I’ve been parallel developing the scraping & import code in my candidate languages, Python and Haskell.
The screenshots above show the results of running my first failing test in each language, failing for pretty much the same reason in each case. My first test: parse the NRS index page and return the number of Titles found. Python’s pytest puts a lot of junk on the screen in comparison to Haskell’s hspec. Reading the hspec output is a thing of beauty, really.
Next: Round 2, Making Me a Better Programmer
Anyone who’s used Ruby has seen this message:
r.rb:1:in `name': wrong number of arguments (3 for 2) (ArgumentError)
This particular error has been driving me nuts for years. It’s just so unnecessarily difficult to interpret — especially if Ruby’s not the only language you use. I never remember which number is which.
Compare to Python:
TypeError: name() takes exactly 2 arguments (3 given)
This is better in several ways. The unambiguity; the tone (no “wrong”); the lack of line-noise characters; the plain-English sentence. Note also that the numbers are presented in the opposite order from Ruby. Not a problem in itself, but it makes the Ruby equivalent especially hard to interpret because it requires memorizing the difference.
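To see the message on your own interpreter, a small sketch like the following works. The function `name` is a stand-in chosen to match the message above, and `arity_error_message` is a hypothetical helper. The exact wording depends on the Python version: Python 3 phrases it differently from the Python 2 text quoted here, so the code prints whatever the interpreter produces rather than assuming it.

```python
def name(first, last):
    """Stand-in two-argument function, named to match the message above."""
    return f"{first} {last}"


def arity_error_message(func, *args):
    """Call func with args; return the TypeError text, or None on success."""
    try:
        func(*args)
    except TypeError as exc:
        return str(exc)
    return None


# Three arguments passed to a two-argument function.
print(arity_error_message(name, "a", "b", "c"))
```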
Potentially as soon as Christmas, though, Ruby’s output will look like this:
r.rb:1:in `name': wrong number of arguments (given 3, expected 2) (ArgumentError)
This is a great step in the right direction.
It turns out that the issue had been raised two years ago. I gave the topic a bump, and Martin Dürst quickly committed the improved code. I am incredibly appreciative, and happy to see that core Ruby developers value improvements like these. In fact, there’s an open Request for Comments on improvements to Ruby’s error messages.
Excellent article. We got to this situation because of the changing nature of writing software, and how we deploy. Our standards have risen, and packaging (and tooling in general) takes on ever greater importance.
But we don’t have to theorize and invent paradigms from scratch. A few other languages have solved this. Ruby is the best example. Let’s learn from its user experience and the use cases it serves.
The best, current guides I’ve found
It’s hard to find current best practices for organizing a reusable Python package. I’m starting a new project to get the world’s restaurant health scores online, so I went searching for some kind of guide. These are the best I found.
I couldn’t find a simple script for monitoring memcached’s efficiency, so I started this project.