String Parsing for Easier Searches

Through a lot of testing (and swearing), I’ve found the #1 thing users botch on a web application is text entry. Nothing else comes close. That seemingly innocent text box with the blinking cursor is the primary place your user-programmer dance will end with broken toes and hurt feelings.

One of the things I do to reduce text entry problems is to autocomplete text fields wherever I can. The user can start typing in a street name or an address and an autocomplete list will show up. With the user only having to get the first few characters of a search string right, my number of “didn’t find nothin’” search results goes way down, and the user-programmer dance gets a little better.

The Web is like a dominatrix. Everywhere I turn, I see little buttons ordering me to Submit.

But this still left me with a complexity problem. The search area in one of my apps had a text box for addresses, a text box for places, a text box for a parcel ID, a text box for a street name, two text boxes for intersections, and two drop down lists for different types of government facilities. That’s a whopping 8 form entry fields to perform all of the various searches.

I started thinking about condensing this mess into a single search box. I needed to keep my autocomplete functionality to reduce user headaches, but autocomplete functions have to be sub-second fast to be useful. Otherwise the user outruns them when typing and they don’t do anybody any good. And I couldn’t very well search on everything every time and keep the database calls fast.

Time for some string parsing goodness.

Check these search string snippets out:

  • 101 Main
  • Abbey Park
  • Ruth
  • Ruth & Dolphin
  • 12312312
Here the user is trying to search for an address, a place, a street name, an intersection, and a parcel ID. As a programmer, what I see is:
  • Address: <Integer><space><string>
  • Place Name: <string>
  • Street Name: <string>
  • Intersection: <string><& character><string>
  • Parcel ID: len(<string>) == 8 and isInt(<string>)
In other words, if the string is composed of an integer followed by a string, we can assume it's an address. If it's a string with no leading integer, we can assume it's a place or street name. If it's a string followed by an & and another string, we know it's an intersection. If it's an 8-character string that can be converted to an integer, we can assume it's a parcel ID. So I can parse the search string to narrow down the database query, allowing for fast and targeted autocompletes.

Let’s take a look at how that might look in PHP. We’re looking at the string processing and logic here - the nitty gritty processing code will be specific to your data. First, we’ll get the user input.

$query = preg_replace('/\s\s+/', ' ', trim($_REQUEST['query']));

The regex is just replacing extra spaces in the search string. The trim gets rid of leading or trailing white space. No more regex, I promise.
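To see what that cleanup does, here's a minimal sketch that wraps the same two calls in a function. The name normalize_query is my own invention for illustration, and the sample input is made up.

```php
<?php
// Hypothetical helper (not part of the original app): trim the ends,
// then collapse any run of two or more whitespace characters into one space.
function normalize_query(string $raw): string
{
    // preg_replace() returns null only on a regex error; fall back to ''.
    return preg_replace('/\s\s+/', ' ', trim($raw)) ?? '';
}

echo normalize_query("  101    Main   St ");  // prints "101 Main St"
```

Keeping this in one place means every later test (numeric check, explode, strpos) sees the same cleaned-up string.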

Now we just need some string testing to see what we’ve got.

if (is_numeric($query)) {
    if (strlen($query) == 8) {
        // Process the Parcel ID
    } else {
        // Return nothing
    }
}

Here we check to see if all we have is a number. If that’s the case, we assume it’s a parcel ID. If it’s 8 characters long, we know it’s a parcel ID and we can process that. Otherwise we ignore it.

If it isn’t a PID, we start looking for everything else. So this will be in an else statement to the original if.

else {
    $query_array = explode(' ', $query);
    $pos = strpos($query, "&");

Here we’re getting an array of elements from the query string. We’re also checking to see if there’s a & character, which tells us to look for an intersection.

    if (is_numeric($query_array[0])) {
        // Find the address
    }

If the first string passed is an integer, we’re assuming it’s an address. Remember we’ve already weeded out strings that are nothing but a single integer as parcel IDs.

    else if ($pos !== false) {
        // Find possible intersections
    }

If it wasn’t an address or a parcel ID and it has a & character in it (the strpos function will return false if the search string isn’t found) we’ll treat it as an intersection, like “Ruth & Something”.
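One detail worth knowing about strpos: when the match is at the very start of the string it returns 0, and a loose comparison treats 0 the same as false. A quick sketch with a made-up search string shows the difference between the loose and strict checks.

```php
<?php
// Made-up input where the & is the very first character.
$pos = strpos('& Dolphin', '&');   // 0: found at position zero

$loose  = ($pos != false);    // false, because 0 == false under loose comparison
$strict = ($pos !== false);   // true, because int 0 is not identical to bool false

var_dump($loose, $strict);
```

So the strict !== false test is the safer way to ask "was the & found at all?".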

    else {
        // Get points of interest
    }
}

Finally, if it isn’t a parcel ID or an address or an intersection, we’ll assume it’s a point of interest (park, library, etc.), process it as such and close the else block. We can now condense our 8 form entry monstrosity into a single search box with full autocomplete functionality, with a little help from jQuery on the client side.
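Putting the pieces together, the whole decision tree might be sketched as one function that just returns a label. The function name classify_query and the label strings are assumptions of mine for illustration; the real code would run the matching database query in each branch instead.

```php
<?php
// Hypothetical helper: label what kind of search a cleaned-up string looks like.
// Labels ('parcel', 'none', 'address', 'intersection', 'poi') are illustrative only.
function classify_query(string $query): string
{
    // Nothing but a number: an 8-character one is a parcel ID, anything else is ignored.
    if (is_numeric($query)) {
        return strlen($query) == 8 ? 'parcel' : 'none';
    }

    // A leading number followed by more text looks like an address.
    $query_array = explode(' ', $query);
    if (is_numeric($query_array[0])) {
        return 'address';
    }

    // An & anywhere in the string means an intersection.
    if (strpos($query, '&') !== false) {
        return 'intersection';
    }

    // Everything else is treated as a point of interest.
    return 'poi';
}

foreach (['101 Main', 'Abbey Park', 'Ruth', 'Ruth & Dolphin', '12312312'] as $q) {
    echo $q, ' => ', classify_query($q), "\n";
}
```

Run against the sample strings from earlier, this labels them address, poi, poi, intersection and parcel, in that order.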

Grabs address from .5 million record table in ~18ms. Thank you Postgres.
Some points of interest. Note ~* regex searching is happening.
A little intersection action.
A little help is always appreciated.

Voilà - the ubersearch. The one search of Sauron. Or, you know, how the Google does it. Put your search box on top of your page and highlight it so the user’s eyes grab on to it.

There’s only one tricky bit to doing string parsing and categorizing like this: you have to keep an eye on the data. Search fields devoid of form can bite you. What if you have a Sanford & Son point of interest? The & character would make our autocomplete think it’s an intersection. What if you had a point of interest called 101 Main? Our autocomplete logic would have that be an address. So you have to watch your data. But if you can pull it off, your users will thank you for it a thousand times over.
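One way to keep an eye on the data is to run the same tests over your point-of-interest names and flag the ones that would get misrouted. This is only a sketch: the $poi_names array is a made-up sample, and in practice the names would come from your own table.

```php
<?php
// Made-up sample of point-of-interest names (assumption for illustration).
$poi_names = ['Abbey Park', 'Sanford & Son', '101 Main', 'Main Library'];

$conflicts = [];
foreach ($poi_names as $name) {
    $first_word = explode(' ', $name)[0];

    // A leading number would be read as an address (or parcel ID);
    // an & would be read as an intersection.
    if (is_numeric($first_word) || strpos($name, '&') !== false) {
        $conflicts[] = $name;
    }
}

print_r($conflicts);  // lists 'Sanford & Son' and '101 Main'
```

What you do with the flagged names (rename them, special-case them, or search more than one category for them) depends on your data.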

To see an example of this, check out GeoPortal.