Test Driven Devops

Illustration: 500 - Internal server error

I like to apply Test Driven Development to my sysadmin work. For example, every time I add a new redirect to a web server configuration I want to make sure I haven’t broken anything else. Further, I want my SSL configurations proactively checked daily for any possible error. I use Ruby RSpec and write tests like these:

describe 'My app' do
  context 'www.myapp.com' do
    it { should be_up }
    it { should have_a_valid_cert }
  end

  it 'serves the "about" page without redirecting' do
    expect('http://www.myapp.com/about').to be_status 200
  end

  it 'only serves via www' do
    expect('http://myapp.com').to redirect_permanently_to 'http://www.myapp.com/'
  end

  it 'forces visitors to use https' do
    expect('myapp.com').to enforce_https_everywhere
  end
end

When I want to make a configuration change, I first write a test for the desired outcome. Naturally, it fails while the old tests still pass. I then work on the config change, re-running all the tests as I go, and I’m finished when they all pass. I also run the tests automatically from a cron job to get proactive notification of new problems.
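To give a feel for what a daily, proactive SSL check could look at, here is a minimal sketch using only Ruby’s standard library. The helper names (`fetch_cert`, `days_until_expiry`, `cert_fresh?`) and the 14-day threshold are my own assumptions for illustration, not part of any library, and the sketch only looks at the validity dates; it does not verify the trust chain or the hostname.

```ruby
require 'net/http'
require 'openssl'

# Hypothetical helper: fetch the certificate a host presents on port 443.
# Needs network access, so it is defined here but not called.
def fetch_cert(host)
  Net::HTTP.start(host, 443, use_ssl: true) { |http| http.peer_cert }
end

# Hypothetical helper: whole days left before the certificate expires
# (negative once it has already expired).
def days_until_expiry(cert, now = Time.now)
  ((cert.not_after - now) / 86_400).floor
end

# Hypothetical helper: true when `now` falls inside the validity window
# and at least `min_days` remain (14 is an assumed default).
def cert_fresh?(cert, now: Time.now, min_days: 14)
  now >= cert.not_before && days_until_expiry(cert, now) >= min_days
end
```

A cron-driven script could then call something like `cert_fresh?(fetch_cert('www.myapp.com'))` and send a notification when it returns false, which gives a warning before the certificate actually expires.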

Phrases such as have_a_valid_cert are custom RSpec matchers; they’re added to the RSpec environment by an open source library I’m writing and have just made available on GitHub. I’ve begun to refactor all the custom matchers out of my internal weblaws.org code into a new rspec-webservice_matchers gem, which makes them easy to install.
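For readers who haven’t written a custom matcher before, here is a rough sketch of the general technique using RSpec’s `RSpec::Matchers.define` DSL. It is not the gem’s actual implementation: the `Response` struct, the `permanent_redirect_to?` helper, and the matcher name `permanently_redirect_to` are hypothetical, and treating only status 301 as "permanent" is an assumption. The matcher operates on an already-fetched response, so the example runs without any network access.

```ruby
# Hypothetical stand-in for an HTTP response: a status code plus the
# value of its Location header.
Response = Struct.new(:status, :location)

# The plain-Ruby check the matcher wraps. Assumption: only a 301 counts
# as a permanent redirect here.
def permanent_redirect_to?(response, expected_url)
  response.status == 301 && response.location == expected_url
end

begin
  require 'rspec/expectations'

  # Hypothetical matcher name, chosen so it does not clash with the gem's.
  RSpec::Matchers.define :permanently_redirect_to do |expected_url|
    match { |response| permanent_redirect_to?(response, expected_url) }

    failure_message do |response|
      "expected a 301 to #{expected_url}, " \
        "got #{response.status} with Location #{response.location.inspect}"
    end
  end
rescue LoadError
  # rspec-expectations is not installed; the plain helper above still works.
end
```

Inside a spec this would read as `expect(Response.new(301, 'http://www.myapp.com/')).to permanently_redirect_to 'http://www.myapp.com/'`. Keeping the check in a plain method makes it easy to test on its own, apart from the matcher wrapper.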

Update: I’m making an online service based on this code, and am looking for beta testers.

Linode vs. DigitalOcean: back to Linode for me

DigitalOcean doesn’t give their first-line support people the necessary tools to diagnose their own system problems. And so a two-day outage becomes possible.

The night before Christmas, I was scrambling to get my site back up

‘top’ showing load of 8 and huge wait time

On the afternoon of the 24th, my New Relic monitor told me that weblaws.org was offline. Ugh. I ssh’d in to the 4GB RAM DigitalOcean Droplet and saw that the load was up to 8 (FYI, this is crazy high), although no processes were active, and it had a huge, unexplained “wait” time.

It is supremely important that we ensure our data is safe. . .

. . . it is supremely important that we ensure our data is safe, consistent and reliable. We can dramatically increase these factors by taking full advantage of the tools at hand.

Yes. This is the most critical task in software development. The quote comes from a great set of posts, Coding Rails with Data Integrity by Jay Hayes: part 1, part 2, and part 3.

Meteor Won’t Kill Rails Anytime Soon

This is in response to Why Meteor will kill Ruby on Rails. A pretty big claim, for sure. Now, Meteor is worth checking out, but Josh didn’t name any of the reasons why I and a whole lot of others choose Rails. So here we go.

It was its built-in support for these that got me interested:

…and then its convention-over-configuration kept me hooked.

It’s like Rails has taken the best practices we’ve learned about building good software, and made them the default. And Django is on the same path. Josh credits Rails’ popularity to convention-over-configuration. That’s definitely helped. Meteor looks very interesting, but until it supports more of the above, we’re talking apples and oranges.