What I got up to on Thursday

16 November 2008

Last Thursday and Friday, I was very lucky to be invited to the Guardian’s first internal hack day. Whilst it was primarily an internal event, they also invited along a few of their friends to see what we could do with some of their information.

It was a really stimulating two days – exciting to see just what the Guardian is doing with their data and their journalism, and the ways they’re trying to make it more open. A particular highlight was seeing Simon Rogers explain the process of researching infographics and data-sourced news articles, and offering his talent for hunting down data to anyone who needed it; he provided a lot of hackers with useful sets of information that were only ever going to be found through a series of tactical phonecalls. For those of us not requesting data to order, the Guardian’s new full-text RSS feeds came in very, very handy, let me tell you.

It was also great to meet some of their technical staff. Obviously, the Guardian developer programme is in safe hands with Matt McAllister, and I’ve known Simon for a while, but it was great to meet lots more of their developers, client-side team and QAs; they were, to a person, lovely and talented, and it’s clear that the Guardian has a deep culture of quality.

I originally wanted to build something along the lines of CelebDAQ, but for journalists. The idea was that you’d invest in journalists and make returns based on the column inches they filed; the goal was to highlight the high-volume content on the Guardian website that goes unnoticed, whilst making the more prolific and “celebrity” writers like Charlie Brooker expensive commodities.

Unfortunately, it soon became clear that the volume of scraping and data-parsing required would take far longer than I’d allowed for, and I wasn’t planning on staying up all night.

So I scaled down my thinking: instead of undertaking “real programming”, I started thinking about “neat hacks”, and the result was this:

In a nutshell, it parses the Guardian’s publicly available politics RSS feed, counts occurrences of the names of Labour and Conservative MPs (not to mention the words “Labour”, “Tory”, and “Conservative”), and works out the “swing” of the page. That data is then sent over serial to an Arduino, which displays the result on a little bargraph.
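The guts of it are only a few lines of Ruby. This is a minimal reconstruction of the idea rather than the hack-day code itself – the real thing also counted individual MPs’ names from a proper list, and the serial port name will vary by machine:

require 'open-uri'
require 'rss'
require 'serialport'  # needs the serialport gem

# Fetch the politics feed and mash every item's text together.
feed = RSS::Parser.parse(open("http://www.guardian.co.uk/politics/rss").read, false)
text = feed.items.map { |item| "#{item.title} #{item.description}" }.join(" ")

# Count mentions of each party (the MP-name lists are omitted here).
labour = text.scan(/\bLabour\b/i).size
tory   = text.scan(/\b(?:Tory|Tories|Conservative)\b/i).size

# Express the "swing" as a value between 0 and 10...
swing = (labour + tory).zero? ? 5 : ((10.0 * labour) / (labour + tory)).round

# ...and send it over serial to the Arduino driving the bargraph.
port = SerialPort.new("/dev/tty.usbserial", 9600)
port.write(swing.chr)
port.close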

It wasn’t the hardest of challenges, but I did get to write some Wiring and learn how to send serial data from Ruby, and I had a lot of fun poking electronic circuits. I was fortunate enough to win a subscription to Make for my troubles, as were the other team of plucky hardware hackers in the room – a lovely surprise to end the two days on.

37 hacks were submitted overall – impressive given the short period of time and how busy everybody was – and they ranged from the entertaining to the remarkably useful, from the thought-provoking to the empowering. Jemima Kiss has written up a few of the stand-out hacks in her Guardian blogpost on the event. It was great to see what such a talented – and multi-skilled – room could produce in under 24 hours, and I hope that the internal team at the Guardian enjoyed it as much as I did.

Many thanks to everyone who organised the event, and I look forward to seeing what the Guardian do with their data – and their great hacking – on a larger scale.

I must have lost about six or seven hours trying to get a Rails application deploying from Git in the past week. I could push and pull from the repository, but could I get the thing to deploy via Capistrano? No, I could not.

The problem, as far as I could tell, was not with Capistrano. It was a simple SSH problem. I block port 22 for SSH on the server in question, for security reasons, and use a different port. But, no matter how I specified it, Git was insistent on trying to pull over 22. I did a lot of Googling, and found lots of conflicting answers, none of which worked.

And then I learned my lesson. That lesson is: when Linus tells you what to do, you do it:

Use the “.ssh/config” file ;)

So I configured a hostname in .ssh/config on the server, and everything worked instantly.
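For reference, the entry looks something like this – the hostname, port and user are placeholders, so use whatever your server actually runs on:

Host myserver
  HostName example.com
  Port 2222
  User deploy
  IdentityFile ~/.ssh/id_rsa

With that in place, the Git remote – and Capistrano’s :repository setting – can simply point at myserver:path/to/repo.git, and SSH quietly supplies the right port.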

A lot of problems tend to come down to SSH, it seems. After that point, everything went swimmingly.

Making things is fun. It’s satisfying to watch things spring forth from nothing, made by your own hands.

The greater the gap between your own capabilities (or your perception of them) and your output, the more satisfying – not to mention bemusing – the process is. That’s why Moo’s public API is so exciting – using nothing but code, you can create real, physical things. Imagine that! Objects you can hold that sprang forth out of bits and bytes.

I mentioned last week that I’d worked on extending Ruminant, a Ruby library for interfacing with Moo’s API, to also handle the creation of stickers. I wasn’t doing so purely out of generosity, though; I had a project up my sleeve that I wanted to work on.

I can now show you the results of that project. Why only now? Because now I have the physical products in my hands. I think it’s really important with something like the Moo API that you only talk about what you’ve made when it’s actually real – no showing off code and saying “oh, they’ll be here soon”. You’ve got to make the things.

Anyhow: now I can tell you what I was up to.

[Image: moostickrs.jpg – two books of Moo stickers]

These are two books of stickers, made from my most recent photos on Flickr. They’re built by taking data from the Flickr API, processing it on my computer, uploading the results to the web, and sending them to Moo’s API – all from a single shell command. You fill out a configuration file with the important details – such as your API keys for both Moo and Flickr – and run the script. A short while later, you’ll be asked to pay for your stickers, and off you go.
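The config file is nothing clever – a handful of keys along these lines (this is a hypothetical sketch; the exact keys may differ in the version on github):

flickr_api_key: "your-flickr-key"
flickr_api_secret: "your-flickr-secret"
flickr_username: "yourname"
moo_api_key: "your-moo-key"
sftp_host: "example.com"
sftp_user: "you"
sftp_path: "/var/www/stickers"
public_url: "http://example.com/stickers/"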

The fun part of this isn’t the whole one-step thing; it’s what goes on when we process the images. We don’t just print them straight, you see.

Another short aside: making real things out of code is fun because you don’t think it should be possible, and image-processing is actually similarly entertaining, just because it feels like it should be harder than it is. Most “easy” programming comes down to processing text in, and text out. Images seem like they should be harder. In fact, images are now much easier than they used to be thanks to things like GD and ImageMagick. I had a lot of fun playing with RMagick, and it wasn’t difficult at all.

So, what did I make?

[Image: dadist-moostickrs.jpg – a sheet of the Dadaist Photograph stickers]

The first are what I called Dadaist Photographs. Moo stickers are small; it’s quite hard to make out a proper photo at those dimensions. So why not make something at once very vague, and yet also entirely precise? That’s what these are. The background of the image is the average colour of the photo: we sum the red, green, and blue values of every pixel, divide each sum by the area of the image to get the average red, green, and blue values, and make a colour out of those. In the foreground, we superimpose the title of the photograph as text. This is, as you can tell, somewhat silly. But! It’s a hyper-realistic single-pixel photograph, and ideal for Moo’s stickers. (A quick note – I didn’t quite add enough padding to the text on these. I’ve learned my lesson for next time.)
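In RMagick, that averaging is only a handful of lines. What follows is a sketch reconstructed from the description above, not the project’s actual code; the sticker size and font settings are invented:

require 'RMagick'

def dadaist_sticker(photo_path, title, size = 300)
  photo = Magick::Image.read(photo_path).first

  # Sum the red, green and blue values of every pixel...
  r = g = b = 0
  photo.each_pixel do |pixel, _x, _y|
    r += pixel.red
    g += pixel.green
    b += pixel.blue
  end

  # ...and divide by the area to get the average colour.
  area    = photo.columns * photo.rows
  average = Magick::Pixel.new(r / area, g / area, b / area)

  # A square of that colour, with the photo's title superimposed.
  sticker = Magick::Image.new(size, size) { self.background_color = average }
  Magick::Draw.new.annotate(sticker, 0, 0, 0, 0, title) do
    self.gravity   = Magick::CenterGravity
    self.pointsize = 18
    self.fill      = 'white'
  end
  sticker
end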

Whilst I was working on that, I had another fun idea. It turned out to be just as easy to build, since it reuses most of the same code as the Dadaist Photographs. That let me abstract lots of things out, and at the same time learn how to write slightly tidier object-oriented Ruby. Anyhow, a short while later, and we had these:

[Image: stripy-moostickrs.jpg – a sheet of the stripy stickers]

These are less silly, and to my mind more beautiful – they render wonderfully on paper. They’re very simple to make. First, we squash the photo down to a 500×500 square. Then we take the middle row of pixels and replace every row in the image with it. The net result is essentially a “stretched” image, built from that single row. RMagick made this very easy. Like I said, I think the results are very beautiful, and it’s amazing how easily identifiable they all are.
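Again, a rough RMagick sketch of the process, reconstructed from the description rather than lifted from the real code:

require 'RMagick'

def stripe_sticker(photo_path)
  # Squash the photo down to a 500x500 square.
  square = Magick::Image.read(photo_path).first.resize(500, 500)

  # Grab the middle row of pixels...
  middle = square.export_pixels(0, square.rows / 2, square.columns, 1, 'RGB')

  # ...and replace every row in the image with it.
  square.rows.times do |y|
    square.import_pixels(0, y, square.columns, 1, 'RGB', middle)
  end

  square
end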

I wrote these by first creating the image-processing code. That was the stuff I was least familiar with, and it took the longest to get my head around. Once that was done, it was relatively easy to bolt proper Flickr API import on (thanks to the Net::Flickr gem), and then to take my processed images and throw them directly at Moo’s API, thanks to Ruminant. A small amount of tidying, abstraction, and the creation of simple config files later, and we were done.

The only slight catch is that Moo need to get pictures from the public web. I’m running my script locally, because it’s quite processor/memory intensive, so the script SFTPs the pictures to a destination of your choosing before sending them to Moo.
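That upload step is nothing more exotic than Net::SFTP. Roughly speaking – the host, credentials and paths here are placeholders:

require 'net/sftp'

# Push the rendered stickers somewhere Moo can fetch them over HTTP.
Net::SFTP.start('example.com', 'you', :password => 'secret') do |sftp|
  Dir.glob('output/*.jpg').each do |image|
    sftp.upload!(image, "/var/www/stickers/#{File.basename(image)}")
  end
end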

But that’s it. It’s one click. It works most of the time (with 90 images it sometimes chokes a little; still, it’s not hard to salvage an order by generating the XML for it yourself). Because of the processor/memory overhead of rendering the images, I haven’t put this online as a web tool – I’m still thinking about whether there’s an easy way to do that. It could end up on EC2 one day.

What I’ve done instead is to put it on github, so you can at least see the code to learn from it and, if you want, download and run your own copy. (If you’re not sure what to do: install git, then click “clone” on the github page to get the command to type to clone the repository.) I can’t guarantee it’ll work on your machine, and I can’t offer any support to help you get it running, but I hope you have fun with it regardless.

So there you go. First, an idea; then, the physical product; finally, the code that makes it all work. This doesn’t serve much real purpose, I’ll admit, but it was a fun making project, and it’s hugely satisfying to see how easy it is to make things out of pictures and paper with code, starting with a simple idea.

I’m not sure if I’ll take this any further – it stands alone quite nicely – but for starters, I’m going to see if I can extend Ruminant to handle other product types – though for various reasons, that might take a little while. If I do anything with it, you’ll hear it here first. In the meantime, I hope this serves as a little inspiration for how easy it is to make fun stuff with Moo, and perhaps that my silly, surreal stickers raised a smile.

Last night, I took a look at James Darling’s Ruminant library for Ruby. It’s a little library that lets you assemble designs and orders and send them to the Moo API for printing. It’s really nicely designed, but it’s in the very early stages of development, and it only supported the creation of MiniCards.

For various reasons, I’m looking at creating stickers through the API, and decided that it only seemed right to add sticker support to Ruminant.

As of last night, I’ve done exactly that. This is in part down to the joy that is GitHub. I forked James’ original code, and started work on my own Ruminant fork. I’ve added support for stickers, and have issued a pull request so that hopefully it’ll get merged back into James’ branch.

To install it, you’ll need Hpricot installed (sudo gem install hpricot). Once you’ve done that, you can install it as a gem directly from my Github code. First, add Github to the list of sources rubygems supports:

gem sources -a http://gems.github.com

and then install my gem:

sudo gem install infovore-ruminant

and follow the instructions in the README.

More to come, along these lines…

The COI have recently published a draft of their browser guidelines for anyone developing a public sector website.

Frankly, I’m very unimpressed; they’re dangerous – vague in certain areas, over-specific in others – and they promote some terrible ideas. I’d urge any designer or developer with an interest in this area to download the document and read it – it’s really not very long – and then leave feedback with the COI through the appropriate form. The public have been asked to respond, and we have the scope to do so; if you feel as strongly about what’s in the document as I do, I hope that you will.

My own response follows.

As a web development professional, I’m very unimpressed with this consultation document, and would go as far as to suggest that some elements of it are actively harmful.

I appreciate the attempt to codify the need for effective browser-compatibility for all public sector websites. That said, the manner in which the document suggests which browsers to test is very poor.

All browsers are different, even though the HTML and XHTML specs remain the same. The purpose of browser-testing is not to ensure that sites look identical in all browsers; the purpose is to ensure that the site is usable in all browsers.

As such, lists of browsers to be tested in are dangerous; the best we can aspire to is to write good, valid code that is functional in all browsers, and prioritise appearance for the most modern browsers.

I take exception to the notion that the browsers to be tested on are those which account for more than 2% of visits to a website. 2% of hits on a very popular public-sector website might still represent a sizeable number of users, and to exclude them (especially if the lack of testing in their browsers leads to impaired functionality) could well contravene the Disability Discrimination Act. Note, also, that these statistics are not necessarily accurate, and may contain spoofed or inaccurate user-agent data.

Going beyond the 2% hurdle, it would not be feasible to test in all browsers. The best solution is probably a form of graded browser support, much as Yahoo recommends: one that is itself reviewed and updated over time, guarantees minimum functionality in certain families of browsers and full support in others, and makes it clear which browsers simply are not supported. Browsers do not cost money, and they are not complex tools to install; developers should not be limited simply because a certain percentage of users of a website continue to use Internet Explorer 5.

Browser support should not be a series of boxes to check off, and it should not be specified retroactively based upon current users; it should be based upon accurate specifications and usage patterns, to ensure that public sector websites – many of which already have high production costs – are sustainable, maintainable, and functional for years to come.

As such, this preliminary draft has a long way to go before it accurately represents the state of web usage, not to mention web development, in 2008.

One thing I usually forget to do when I back up a computer is back up my MySQL databases: partly because they’re not stored in my Library (I don’t think they are), and partly because I forget how many I have. mysqldump, at least the way I normally invoke it, only backs up one database at a time. What would be great is something that dumps every database on the system, each to its own file.

Anyhow, whilst on hold to my ISP this morning, I decided to solve this problem once and for all.

The end result is a pair of Ruby scripts which you can get from github.

The first iterates over every database on your system (when run with an appropriate username and password) and spits out a .sql file named after that database. The second looks at a folder of .sql files named that way and, for each one, drops the database with that name, re-creates it, and restores it from the .sql file.

I’m sure I could have done it just fine in a bash script, but it made sense to use the tool that comes most quickly to my hands, and that means Ruby. Once you’ve got Ruby installed, the rest is easy. Clone them, patch them, fix them; they’re basic, as maintenance scripts go, but handy.
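For the curious, the dumping half is no more complicated than this – a rough reconstruction of the approach rather than the script itself:

#!/usr/bin/env ruby
# Usage: ruby dump_all.rb USERNAME PASSWORD
user, password = ARGV

# Ask MySQL for every database it knows about...
databases = `mysql -u#{user} -p#{password} --skip-column-names -e 'SHOW DATABASES'`.split

# ...and dump each one to its own .sql file.
databases.each do |db|
  next if db == 'information_schema' # nothing worth restoring in there
  puts "Dumping #{db}..."
  system("mysqldump -u#{user} -p#{password} #{db} > #{db}.sql")
end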

Get the scripts from github.

A while back I mentioned that the iPhone App Store was a place where we could see people paying for interface alone, regardless of functionality.

This is a useful segue into Daniel Jalkut’s commentary on the nature of independent software development and, specifically, whether small software should be free-as-in-beer software. Jalkut makes the point, as an independent developer, that you should support the software you like, regardless of how slight it is. The example he refers to is Pukka, a nice little tool for posting to delicious from OS X. Pukka is nice because it’s always available and it’s very Mac-like in its behaviour. It’s pretty cheap at $12.95.

Jalkut takes exception to Leo Laporte’s commentary in a MacBreak Weekly podcast, where he suggests (as he tells us how much he loves Pukka) that it should be free.

Why did he suggest this? The answer, simply, is that Pukka is an interface to someone else’s functionality rather than a tool in its own right.

To wit: Pukka talks to delicious through the delicious API. Most of the hard work of social bookmarking has already been done by the delicious team. All Pukka does is talk to that API – it’s a menubar item, an interface, a window that sends data to the API. Not a product on its own. Of course, if you know anything about development, you’ll know that building things that talk to APIs – on the desktop, on the web, wherever – isn’t always as easy as it sounds. Pukka’s price seems a reasonable one to pay for an app that does this well, especially if you use delicious as much as, say, I do.

Jalkut’s own MarsEdit (which I’m using a licensed copy of to write this) is similar. It’s a $29.95 weblog editor that interfaces with most popular blogging platforms and lets me write posts on my Mac desktop. It’s not that I couldn’t write blogposts before; I can always log into WordPress to do that. No, the reason I bought it is convenience and quality. I much prefer posting from this fluid, offline interface to typing into a box in Safari, for various reasons: the quality and speed of the preview, the simplicity of media integration, and the multi-blog (and multi-API) support – I use MarsEdit to post to both WordPress and LiveJournal. If I couldn’t spare the $30, I could always just blog from the existing admin screens, but I felt the product was so good I should buy it.

Sometimes, it’s hard to express to people the value of a product that does something you could already do. A product that does something new, or which is an essential tool, is much easier to justify. Many Mac owners I know didn’t hesitate to pay the €39 for TextMate, because text editing is so fundamental to our work. But $30 on a blogposting client? That one requires more thought, and isn’t such a no-brainer.

I’m not sure what the solution is. It’s a shame that it’s harder to express the value of “service” applications; I think the iPhone might be better off here, simply because the device itself is so unlike traditional clients that it makes sense to redesign interfaces to services for it. In the meantime, it’s worth remembering that a quality interface to an existing product might still be worth something, however small, and it’s for that reason that developers like Jalkut should be rewarded for their work.

A pet peeve of mine is the lack of a documented shortcut in Ruby’s #strftime for returning the hour of the day, in twelve-hour clock, without a leading zero. To wit:

puts Time.now.strftime("%I:%M") # >> 03:29

That’s not particularly attractive. I could strip the leading zero with some string manipulation, but that feels like a sledgehammer to crack a nut. Fortunately, this works:

puts Time.now.strftime("%l:%M") # >> 3:29

That’s a lowercase L in the formatting string, and it returns the hour on a twelve-hour clock sans leading zero. Result! And yes, it’s undocumented everywhere I’ve looked. Thanks to my colleague Colin for pointing that trick out.

Now, if only I could get it to return am/pm without having to call #downcase

Connecting Rails to legacy databases isn’t actually that hard – depending on the database you start out with. Recently, we needed to perform some statistical analysis on a large Movable Type database. Rather than wrestling with endless SQL queries at the prompt, it made sense to abstract out a little and build some kind of modelled front end to the statistics.

The most obvious tool for me to use was Rails; I’m familiar with it, I like the way ActiveRecord works, and it means that I can poke around the database from script/console if I need to.

This turned out not to be too hard because, whilst Movable Type doesn’t conform to Rails’ opinionated ideas of what a schema should look like, it is still a well-designed and normalised database. Because of that, we can teach ActiveRecord to understand it.

First of all, we create our models – for our needs, Blog, Entry, Comment and Author – generating them in the usual manner: script/generate model blog, and so on. Once we’ve done that, we delete the migration files the generator creates in db/migrate, because we’re not going to use them.

Next, we point config/database.yml to the Movable Type database.
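That’s just an ordinary database.yml entry, pointed at the MT database – something along these lines, with the database name and credentials obviously being placeholders:

development:
  adapter: mysql
  host: localhost
  database: movabletype
  username: mt_reader
  password: secret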

And then, we build our relationships thus:

class Blog < ActiveRecord::Base
  set_table_name "mt_blog"
  set_primary_key "blog_id"

  has_many :entries, :foreign_key => "entry_blog_id", :order => "entry_created_on"
end

class Entry < ActiveRecord::Base
  set_table_name "mt_entry"
  set_primary_key "entry_id"

  has_many :comments, :foreign_key => "comment_entry_id"
  belongs_to :blog, :foreign_key => "entry_blog_id"
  belongs_to :author, :foreign_key => "entry_author_id"
end

class Comment < ActiveRecord::Base
  set_table_name "mt_comment"
  set_primary_key "comment_id"

  belongs_to :entry, :foreign_key => "comment_entry_id"
end


class Author < ActiveRecord::Base
  set_table_name "mt_author"
  set_primary_key "author_id"

  has_many :entries, :foreign_key => "entry_author_id"
end

The set_table_name method tells the ActiveRecord class what table to look at, and the set_primary_key method does exactly what it says on the tin. (It also makes sure that #to_param works correctly based on whatever our new primary key is, which is handy). Beyond that, we simply have to specify the foreign keys on our relationships and everything plays ball; we can now access blog.entries just as we do with a typical Rails setup. It’s now easy to write the rest of our Rails app – model methods, controllers, views – just as we normally would. We’re just using the MT database to pull out our data.
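Which means that, from script/console, things like this just work (the IDs here are purely illustrative):

blog  = Blog.find(1)        # looked up by blog_id, thanks to set_primary_key
blog.entries.size           # every entry in that blog
entry = blog.entries.last   # the most recent, given the :order we set
entry.comments.size         # joined on comment_entry_id
entry.author                # the Author who wrote it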

And if you’re wondering: yes, it made the manipulation a lot easier, and a few hours poking at the console began to yield some interesting algorithms to apply.

Velocity bundle for TextMate

28 September 2007

Well over a year ago, I mentioned that I was working on a Velocity bundle for TextMate. Or, to be more precise: I mentioned that I’d already written one that we were using at NPG.

A year later, I’m ready to release the bundle; you can get it from its Google Code site. But before you go there, an explanation for the delay is in order – and on the way, I’ll tell you about how the bundle was written.
