Failures of Imagination | Fuzzy Tolerance

As a programmer, you spend a lot of time thinking about how your users will interact with your system in unexpected ways.

After the Apollo 1 disaster that claimed the lives of three astronauts during a test of the communications system, a senate committee asked astronaut Frank Borman what had caused the fire. His response was simple: “A failure of imagination.“ They had imagined and prepared for fires in space. They had never imagined a fire in the spacecraft while it was sitting on the ground. I’ve found this to be a good way to think of unexpected user interactions with a system - they are often failures of imagination by the programmer.

Take a common need for GIS applications - locating an address based on user input. You would be startled by how many times users get this wrong. An extremely informal study* with Geospatial Portal found users botched it ~6-7% of the time. That’s more than 1 in 20, which is a high failure rate.

A common way to handle this type of problem is with auto-suggest/auto-complete, or taking a guess at what the user wants to type before they get done typing it. I tried a couple of approaches to lower the error rate in this manner. The first was to limit the user to two input pieces - house number and street name, with the street name auto-suggesting. The user would then get a result list for those two pieces of information. This lowered the error rate slightly, but it required a lot more user interaction - typing in two text boxes and clicking a button, picking from a result list for, say, “101 E Main St” and “101 W. Main St.”, etc. And people would try to stuff more than the street name in the street name box. This approach wasn’t working out.

Next I made a single text box input that auto-suggested based on the entire address. This worked extremely well - error rates went below 1%, and it required very little user interaction. The only problem I ran across was one of speed.

Basically the SQL involved in this type of query looks like this:

select fields from address_table where address like 'user_input%'

A query like this is essentially assault & battery on your database. With this type of like operation, your database is effectively having to scan the entire table. I tried different types of indexes and techniques, but this is just a slow query. For our +400k record master address table, this query was averaging nearly two seconds (~1800ms), which is too long for this kind of functionality. I’d have people keep typing rather than wait, increasing the likelihood of input errors.

To get around this problem, I had to do a little string manipulation. The first element in any address is the house number, followed by a space. I auto-suggest after the first six keystrokes, which will capture the entire house number. Hence I had the service find the house number with something like this:

if (strpos( $address, " " )) $houseno = trim(substr($address, 0, strpos($address, " ")));

And then I have a SQL query that looks more like this:

select fields from address_table where house_number = $houseno
and address like 'user_input%'

This narrows down the like search to a very tiny subset of data. This moved the query from ~1800ms to ~15ms, or a +100x performance improvement. The auto-suggest comes up nearly instantly, with the result being negligible user error rates and a better overall user experience.

It takes imagination and creativity to write software, but it also takes imagination and creativity to see the software you’ve been laboring over from a user’s perspective. As programmers we often find the former much easier than the latter. It’s easy to dismiss user problems as PEBCAK (problems exists between chair and keyboard), and sometimes they are. But whether the botch rate is testable as in this case or not, it pays to avoid a knee-jerk reaction and try to imagine things from the user’s perspective. In the end it’ll make your customers happy, and over the long term it will make you a better programmer.

*Basically I had a tweaked geocoding service log the input for 0 record results vs the number of total requests for a week. Factoring out things like an error in our master address database and a fascinating profanity-laced diatribe in lieu of an address, I had 6-7% botch rate (this is totally memory recall). The #1 thing people seemed to botch is the street type, with a close second being street name misspellings. House number and prefix/suffix errors were pretty low.