Charles Arthur recently wrote that if [he] had one piece of advice to a journalist starting out now, it would be: learn to code.

I understand the point he’s making, but I think there’s a further degree of subtlety to the argument. After all, learning to code is hard. Learning to glue together bits of scripts, and later to bash your way into scripting languages, really is useful, but even that isn’t easy. It requires you to learn to translate intent into code, to know what’s possible, to know what’s easy and what’s hard, and to know what to do when the third-party things you’re gluing together don’t work.

In short: it’s really easy to make a mess, and a mess that was difficult and stressful to make at that.

So my advice would be somewhat different, and apply to both those journalists who find code easy, and those who find it impossible:

Learn to think like a programmer.

What’s really important is not to understand how to do magical things with code, but to learn what magical things are possible, what the necessary inputs for that magic are, and who to ask to do it.

Identify the repetitive tasks that computers are good at. Yes, they’re good at find-and-replace, but tools like regular expressions are even handier, and I’m amazed how few people understand that find-and-replace is the beginning, not the end, of text processing. (And yes, I’m aware that regex are a quick way to give yourself two problems.)
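To make that concrete, here is a minimal Ruby sketch of something plain find-and-replace can’t do: rewriting every date in a piece of text in one pass, whatever the actual day, month and year happen to be. The sample text and the date formats are invented for illustration.

```ruby
# Rewrite every dd/mm/yyyy date as yyyy-mm-dd in a single pass.
# The sample text is made up for illustration.
text = "Filed 03/11/2008, updated 17/12/2008."

# Each pair of parentheses captures one part of the date;
# \3, \2 and \1 put the captured parts back in a different order.
iso = text.gsub(%r{(\d{2})/(\d{2})/(\d{4})}, '\3-\2-\1')

puts iso
```

The point isn’t the syntax, which nobody remembers anyway; it’s knowing that "find anything shaped like a date" is a thing you can ask for.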

Computers are really good at processing regular data, and they are really, really good at repetitive tasks. Every time I watched someone in an office doing a repetitive, regular task I despaired, because that’s exactly the kind of thing we have computers for.

You shouldn’t try to build the program that magically automates everything. But you should learn to smell the tasks where computers could help; learn to sniff out the angles on a story that a computer would be a useful tool for.

So that means when you find a table, or a regular data source, you don’t just take a print-out; you ask for an Excel file to convert to CSV, or maybe even a database dump. Even if you can’t do something with it, somebody else can. So the important thing to remember is what a programmer might want to receive.

When you’re gathering data, regularity is important. If you’re using Excel, keep it really simple, and one-column-per-thing, so that later a programmer can do something with the CSV. If you’re gathering textual information, put it in a plain text file, rather than Word; it’ll save you time in the long run.
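As a rough illustration of why one-column-per-thing matters, here is a small Ruby sketch that totals a column from a CSV export. The column names and figures are made up; the only real assumption is that the spreadsheet has a single header row and one value per cell.

```ruby
require "csv"

# Hypothetical spreadsheet export: one header row, one column per thing.
data = <<~ROWS
  council,year,spend
  Camden,2008,1200
  Hackney,2008,950
  Camden,2009,1340
ROWS

# headers: true lets each row be looked up by column name.
rows = CSV.parse(data, headers: true)

# Add up the spend for each council.
totals = Hash.new(0)
rows.each { |row| totals[row["council"]] += row["spend"].to_i }

totals.each { |council, spend| puts "#{council}: #{spend}" }
```

Merged cells, sub-headings halfway down the sheet, or two values crammed into one cell are exactly the things that turn these few lines into an afternoon of clean-up.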

Also: there are lots of useful tools that are halfway between being a programmer and not, and these are the most interesting spaces for the journalist right now. Simon linked to a bunch of these at the Guardian Hack Day, and it amazed me how many great tools there are for the non-programmer to do programmer-like tasks.

Excel, for starters, is a great environment (if a little limited and esoteric) for starting to explore datasets in a relatively visual way – structured data formats aren’t as immediate to more visual thinkers. Other obvious examples include the frankly remarkable DabbleDB and, even though it’s never as useful as I hope it might be, Yahoo Pipes.

These let you exercise programmer-like thinking without needing to be a programmer. And then, when you’ve discovered what it is you want to do, even with the vaguest of prototypes, handing all your information and ideas over to a coder is much easier.

Why? Because you’ve already been thinking like a programmer. You’re handing them thoughts and data in the format they like.

So how do you learn this?

Partly, you have to try a bit of code yourself, but I’d make sure you’re always on the right side of the “understanding what I’m doing” vs “doing neat stuff” seesaw; understanding should be your goal.

Partly, it’s getting handy with a shell. One of the best places to explore what you can do with data is the command line; as well as the true scripting languages, there are tools like grep, sed and awk which can be remarkably powerful. Not entirely user-friendly, I’ll give you, but easier than breaking out a full program.

And partly, it’s relaxing a little and stepping away from the Office suite. Putting your data in formats like CSV, XML, JSON, and plain text doesn’t just make the data more useful to coders; it’ll be more useful for you, when you want to move it around.
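Moving data around between those formats is often only a few lines of work once it’s out of the Office suite. A minimal Ruby sketch, using invented data, that turns CSV into JSON:

```ruby
require "csv"
require "json"

# Made-up data, as it might come out of a spreadsheet's CSV export.
csv_text = <<~ROWS
  title,section
  Budget cuts,politics
  Cup final,sport
ROWS

# Turn each row into a hash keyed by column name, then serialise as JSON.
records = CSV.parse(csv_text, headers: true).map(&:to_h)
json = JSON.pretty_generate(records)

puts json
```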

I remain convinced there’s an interesting book on “doing smart stuff with computers that isn’t quite programming but isn’t far off”, because let’s face it, most people deal with data all the time now, and have the ideal tool for working with it on their desks. Now they just need to work with it a little.

So whilst this isn’t quite the “learning to code” that Charles speaks of, it’s not far off. And indeed, I think he hits the nail on the head much better in his conclusion:

…nowadays, computers are a sort of primary source too. You’ve got to learn to interrogate them effectively – and quote them meaningfully – too.

That feels about right. You don’t need to be a coder, but you need to be able to interrogate computers meaningfully. Do that how you will.

(As for me? Well, I wanted to be a journalist, but fate didn’t turn out that way (although I’ve worked in the media and had a small amount of writing published). I did, however, seem to take to the coding malarkey a little better. I still maintain I’m not really a programmer, and certainly not in the sense that my real-programmer friends are, but evidence sometimes disproves that.)

As a little post-Christmas present, I thought I’d share a little code toy I’ve been working on recently.

You might know that I’m a fan of Twitter as a messaging bus, and I’ve already built some entertainingly daft bots in my time with it. Recently, I decided to flex my programming skills a bit and build not one but four bots. And, more specifically: four bots that talk to each other.

Enter @louis_l4d, @zoey_l4d, @bill_l4d and @francis_l4d. You might notice that they’re named for the characters in Left 4 Dead.

This is not a coincidence.

One of the most wonderful things in that game (whose brilliance I’ve already commented on) is the banter between the four player characters. There’s so much dense, specific scripting, and enough dialogue that it rarely repeats. I thought it would be interesting to see if you could simulate the four players’ dialogue over Twitter, sharing some state between the bots, but also finding a way to make them communicate a little with each other.

Well, a bit later, I worked out how, and this is the result:

[Image: twit4dead.jpg]

You get the picture. They run a scenario, they bump into boss zombies, they find stuff, they get hurt (and help each other), they get scared (and reassure each other). There are still some dialogue overlaps; my main work at the moment is adding more unique dialogue for each bot. Bill is sounding pretty good, but the rest of them need work. It takes about 2-3 hours for them to run a scenario, and it’s usually fun to watch. (And, as you can see, it makes sense to follow all four of them.)

So how does it work?

It turned out that rather than trying to build any real AI, it was much more fun just to simulate intelligence. The bots are state machines; they have a variety of states, which they move into or out of depending on what’s happening in the world, and suitable dialogue for each state.

I wrote the bots in Ruby. There are two main components to the twit4dead code: the Actor class and the Stage class. The Stage is a singleton; it’s where the state of the world is determined, and global variables tracked. It’s also where all the probabilities are run from. The Actor class is what each of the bots is, and it’s based on the Alter Ego state-machine library for Ruby. We have a lot of states, rules for transition, a selection of methods to handle being helped or talked to by friends, and a method to choose a random piece of dialogue appropriate to the current state.
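To give a feel for that shape, here is a stripped-down sketch. It is not the twit4dead code itself: it uses plain Ruby rather than Alter Ego (so as not to misstate that library’s API), and the state names, transition table and dialogue lines are all hypothetical placeholders.

```ruby
require "singleton"

# The Stage holds the shared state of the world. Including Singleton
# guarantees there is only ever one of it.
class Stage
  include Singleton

  attr_reader :actors
  attr_accessor :in_combat

  def initialize
    @actors = []
    @in_combat = false
  end
end

# Each bot is an Actor: a current state, a table of allowed transitions,
# and some dialogue for every state. All names here are hypothetical.
class Actor
  TRANSITIONS = {
    calm:     [:fighting, :scared],
    fighting: [:calm, :hurt],
    scared:   [:calm, :fighting],
    hurt:     [:calm]
  }.freeze

  attr_reader :name, :state

  def initialize(name, dialogue)
    @name = name
    @dialogue = dialogue
    @state = :calm
    Stage.instance.actors << self
  end

  # Move to a new state, refusing transitions the table does not allow.
  def transition_to(new_state)
    unless TRANSITIONS.fetch(@state).include?(new_state)
      raise ArgumentError, "#{@name} cannot go from #{@state} to #{new_state}"
    end
    @state = new_state
  end

  # Pick a random line suitable for the current state.
  def speak
    @dialogue.fetch(@state).sample
  end
end

placeholder_dialogue = {
  calm:     ["All quiet."],
  fighting: ["Here they come!"],
  scared:   ["I don't like this."],
  hurt:     ["I'm down!"]
}

bill = Actor.new("bill", placeholder_dialogue)
bill.transition_to(:fighting)
puts bill.speak
```

The useful property is that an Actor can never say something that doesn’t fit its state, because the state is the only thing that decides which lines are on offer.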

All the bots are instantiated from a YAML file. For each bot, I store its Twitter username and password, and a nested tree of dialogue for each state. This means it’s really easy to add new states and maintain the dialogue for each bot. It’s also easy to add new bots – you just create a top-level entry in the YAML file.
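A file along those lines might look something like the sketch below. The exact keys are my guess at a sensible layout rather than a copy of the real file, and the passwords and dialogue lines are placeholders.

```ruby
require "yaml"

# Hypothetical layout: one top-level entry per bot, with credentials and
# a nested tree of dialogue keyed by state. All values are placeholders.
config = YAML.safe_load(<<~BOTS)
  bill:
    username: bill_l4d
    password: not-a-real-password
    dialogue:
      calm:
        - "Placeholder calm line."
      hurt:
        - "Placeholder hurt line."
  zoey:
    username: zoey_l4d
    password: not-a-real-password
    dialogue:
      calm:
        - "Another placeholder."
BOTS

# Adding a bot means adding another top-level key; this loop does not change.
config.each do |name, settings|
  states = settings["dialogue"].keys.join(", ")
  puts "#{name} (@#{settings["username"]}) has dialogue for: #{states}"
end
```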

Originally, I thought about the bots broadcasting and listening over Twitter, but the API calls were just going to get out of hand, and it turned out that Twitter didn’t like being bombarded with messages and would drop a few over time. So I separated out writing the script and broadcasting it. A small utility generates a script file; each line of this file consists of three delimited fields for username, password, and message to send to Twitter. Then, another program – which I currently run on a screened shell – reads that file and broadcasts one line of it every minute until it’s done.
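The split between writing the script and broadcasting it can be sketched roughly as follows. The tab delimiter is an assumption (any character that never appears in a message would do), the method names are made up, and the part that would actually talk to Twitter is left as a block so that nothing here pretends to be a real API call.

```ruby
require "tempfile"

DELIMITER = "\t" # assumption: a tab, chosen because messages won't contain one

# Write one line per message: username, password, message.
def write_script(path, lines)
  File.open(path, "w") do |file|
    lines.each { |fields| file.puts(fields.join(DELIMITER)) }
  end
end

# Read the script back and hand one line at a time to the given block,
# pausing between lines. The block stands in for a real Twitter call.
def broadcast(path, interval: 60)
  File.foreach(path, chomp: true).with_index do |line, index|
    sleep(interval) if index.positive?
    username, password, message = line.split(DELIMITER, 3)
    yield username, password, message
  end
end

script = Tempfile.new("script")
write_script(script.path, [
  ["bill_l4d", "secret", "Placeholder line one."],
  ["zoey_l4d", "secret", "Placeholder line two."]
])

# interval: 0 so the example finishes immediately; the default is a minute.
sent = []
broadcast(script.path, interval: 0) do |user, _password, message|
  sent << "#{user}: #{message}"
end
puts sent
script.close!
```

Because the script is just a text file, you can read it, edit it, or re-run it before a single message goes out.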

And that’s it. I have to run it by hand for now, which is fine – it’s more something I fire up every now and then, rather than something you want to permanently run. I was originally going to keep track of loads of statistics – health, zombies in play, etc – but found a cruder set of rules worked much better. Every time they’re in combat, there’s a slim chance somebody gets hurt; every time somebody’s hurt, a friend will rush to save them. That sort of thing. Simulated Intelligence, then, rather than Artificial Intelligence.
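Those two rules are about as simple as they sound. A minimal sketch, in which the 10% chance, the method name and the wording of the events are all assumptions for illustration:

```ruby
# One "tick" of combat under the two rules described above.
# The 10% figure is an illustrative guess at what "slim" means.
HURT_CHANCE = 0.1

def combat_tick(names, rng)
  events = []
  names.each do |name|
    # Rule one: in combat, there is a slim chance somebody gets hurt.
    next unless rng.rand < HURT_CHANCE
    events << "#{name} is hurt"
    # Rule two: whenever somebody is hurt, a friend always comes to help.
    friend = (names - [name]).sample(random: rng)
    events << "#{friend} helps #{name}"
  end
  events
end

survivors = %w[bill zoey louis francis]
rng = Random.new(42) # seeded so that a run can be repeated
20.times { combat_tick(survivors, rng).each { |event| puts event } }
```

No health bars, no zombie counts; just a dice roll and a guaranteed response to it, which is enough to look like teamwork from the outside.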

Alter Ego turned out to be a lovely library; dead simple to use, and as a result the bots are really nicely modelled (or, at least, I’m very happy with how they’re modelled). The notion of a Stage with Actors, rather than a Game with Players, feels about right, and the modularity of it all is pretty nifty. It still requires a little refactoring, but the architecture is solid, and I’m proud of that.

I think my favourite aspect of it, though, is that at times, watching the bots play together is a little like magic. The first time I saw them talk to each other, cover each other whilst reloading, help each other up after a Boomer attacked, I felt a little (only a little, mind) like a proud father. They’re dumb as a sack of hammers, but they look convincing, and that was the real goal. It’s fun to watch them fight the horde amidst all my other friends on Twitter.

Nonsense, then, but a fun learning exercise about state machines, object orientation, and simulating conversations. State machines are a ton of fun and if you’ve not seriously played with them, I thoroughly recommend it.

Do follow the four of them, if you fancy; I’ll make sure I run them with reasonable regularity, and I’ll be fixing the dialogue over time. After all, I want to keep Francis happy.