String Parsing for Easier Searches

Through a lot of testing (and swearing), I’ve found the #1 thing users botch on a web application is text entry. Nothing else comes close. That seemingly innocent text box with the blinking cursor is the primary place your user-programmer dance will end with broken toes and hurt feelings.

One of the things I do to reduce text entry problems is to autocomplete text fields wherever I can. The user can start typing in a street name or an address and an autocomplete list will show up.  With the user only having to get the first few characters of a search string right, my number of didn’t find nothin’ search results goes way down, and the user-programmer dance gets a little better.

The Web is like a dominatrix. Everywhere I turn, I see little buttons ordering me to Submit.

But this still left me with a complexity problem. The search area in one of my apps had a text box for addresses, a text box for places, a text box for a parcel ID, a text box for a street name, two text boxes for intersections, and two drop down lists for different types of government facilities. That’s a whopping 8 form entry fields to perform all of the various searches.

I started thinking about condensing this mess into a single search box. I needed to keep my autocomplete functionality to reduce user headaches, but autocomplete functions have to be sub-second fast to be useful. Otherwise the user outruns them when typing and they don’t do anybody any good. And I couldn’t very well search on everything every time and keep the database calls fast.

Time for some string parsing goodness.

Check these search string snippets out:

  • 101 Main
  • Abbey Park
  • Ruth
  • Ruth & Dolphin
  • 12312312

Here the user is trying to search for an address, a place, a street name, an intersection, and a parcel ID. As a programmer, what I see is:

  • Address: <Integer><space><string>
  • Place Name: <string>
  • Street Name:  <string>
  • Intersection: <string><& character><string>
  • Parcel ID: len(<string>) == 8 and isInt(<string>)

In other words, if the string is composed of an integer followed by a string, we can assume it’s an address. If it’s a string with no leading integer, we can assume it’s a place or street name. If it’s a string followed by a &, we know it’s an intersection. If it’s an 8 character string that can be converted to an integer we can assume it’s a parcel ID. So I can parse the search string to narrow down the database query, allowing for fast and targeted autocompletes.

Let’s take a look at how that might look in PHP. We’re looking at the string processing and logic here – the nitty gritty processing code will be specific to your data. First, we’ll get the user input.

$query = preg_replace('/\s\s+/', ' ', trim($_REQUEST['query']));

The regex is just replacing extra spaces in the search string. The trim gets rid of leading or trailing white space. No more regex, I promise.
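To see what that cleanup step does, here is a minimal sketch wrapped in a function. The name normalize_query is made up for illustration and is not part of the original app; the body is the same trim plus regex as the line above.

```php
<?php
// Hypothetical helper (name is an assumption): same cleanup as the one-liner
// above, but taking the raw string as an argument instead of $_REQUEST.
function normalize_query(string $raw): string
{
    // trim() drops leading/trailing whitespace; the regex collapses any run
    // of two or more whitespace characters into a single space.
    return preg_replace('/\s\s+/', ' ', trim($raw));
}

echo normalize_query("  101    Main  "), "\n";   // 101 Main
echo normalize_query("Ruth  &   Dolphin"), "\n"; // Ruth & Dolphin
```

Normalizing up front means the later explode(' ', ...) call never produces empty array elements from doubled spaces.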

Now we just need some string testing to see what we’ve got.

if (is_numeric($query)) {
  if (strlen($query) == 8 ) {
    // Process the Parcel ID
  }
  else {
    // Return nothing
  }
}

Here we check whether the whole string is a number. If it is, the only thing it could be is a parcel ID. If it’s 8 characters long, we treat it as a parcel ID and process it. Otherwise we return nothing.
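One caveat worth knowing: is_numeric also accepts decimals and exponent notation, so an 8 character string like 1234.567 would pass the check. If your parcel IDs are strictly digits (an assumption about the data, so check yours), a tighter test might look like this sketch. The function name looks_like_parcel_id is hypothetical.

```php
<?php
// Hypothetical helper: stricter than is_numeric() + strlen(), assuming
// parcel IDs are exactly 8 digits and nothing else.
function looks_like_parcel_id(string $query): bool
{
    // ctype_digit() is true only when every character is 0-9.
    return strlen($query) === 8 && ctype_digit($query);
}

var_dump(looks_like_parcel_id("12312312")); // bool(true)
var_dump(looks_like_parcel_id("1234.567")); // bool(false)
var_dump(is_numeric("1234.567"));           // bool(true), the looser check
```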

If it isn’t a parcel ID, we start looking for everything else. So this goes in an else block attached to the original if.

else {
  $query_array = explode(' ', $query);
  $pos = strpos($query, "&");

Here we’re getting an array of elements from the query string. We’re also checking to see if there’s a & character, which tells us to look for an intersection.

if (is_numeric($query_array[0]) ) {
  // find the address
}

If the first element of the array is a number, we’re assuming it’s an address. Remember we’ve already weeded out strings that are nothing but a single number as parcel IDs.

else if ($pos !== false) {
  // Find possible intersections
}

If it wasn’t an address or a parcel ID and it has a & character in it, we’ll treat it as an intersection, like “Ruth & Something”. The strpos function returns false if the search string isn’t found, but it returns 0 when the match is at the very start of the string, so the test uses the strict !== comparison rather than !=.
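The strpos return value is easy to trip over, so here is a small sketch of the difference between a loose and a strict comparison against false. The helper name has_ampersand is made up for illustration.

```php
<?php
// Hypothetical helper: true when the query contains a & anywhere,
// including at position 0.
function has_ampersand(string $query): bool
{
    return strpos($query, '&') !== false;
}

$pos = strpos("& Dolphin", "&");  // int(0): found at the very start
var_dump($pos != false);          // bool(false), loose comparison treats 0 as false
var_dump($pos !== false);         // bool(true), strict comparison gets it right
var_dump(has_ampersand("Ruth"));  // bool(false)
```

A query starting with & is unlikely, but a loose comparison would quietly send it down the point of interest branch instead.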

else {
  // get points of interest
}
}

Finally, if it isn’t a parcel ID or an address or an intersection, we’ll assume it’s a point of interest (park, library, etc.), process it as such and close the else block. We can now condense our 8 field form monstrosity into a single search box with full autocomplete functionality, with a little help from jQuery on the client side.
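Pulled together, the branching could be sketched as one function that returns a category label instead of running the queries. This is an illustration rather than the production code: the function name classify_query and the labels are made up, and the empty string check is an addition that isn’t in the snippets above.

```php
<?php
// Hypothetical helper: returns a label for the kind of search the string
// looks like, following the same order of checks as the walkthrough.
function classify_query(string $raw): string
{
    $query = preg_replace('/\s\s+/', ' ', trim($raw));

    if ($query === '') {
        return 'none';                 // added guard: nothing typed yet
    }
    if (is_numeric($query)) {
        // A bare number is only useful if it is an 8 character parcel ID.
        return strlen($query) == 8 ? 'parcel' : 'none';
    }

    $query_array = explode(' ', $query);
    if (is_numeric($query_array[0])) {
        return 'address';              // leading number, then a street
    }
    if (strpos($query, '&') !== false) {
        return 'intersection';         // "Ruth & Dolphin"
    }
    return 'poi';                      // everything else: place or street name
}

foreach (['101 Main', 'Abbey Park', 'Ruth', 'Ruth & Dolphin', '12312312'] as $q) {
    echo $q, ' => ', classify_query($q), "\n";
}
```

Each label would then map to one narrow database query, which is what keeps the autocomplete fast.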

Grabs address from .5 million record table in ~18ms. Thank you Postgres.

Some points of interest. Note ~* regex searching is happening.

A little intersection action.

A little help is always appreciated.

Voilà – the ubersearch. The one search of Sauron. Or, you know, how the Google does it. Put your search box on top of your page and highlight it so the user’s eyes grab on to it.

There’s only one tricky bit to doing string parsing and categorizing like this: you have to keep an eye on the data. Search fields devoid of form can bite you. What if you have a Sanford & Son point of interest? The & character would make our autocomplete think it’s an intersection. What if you had a point of interest called 101 Main? Our autocomplete logic would have that be an address. So you have to watch your data. But if you can pull it off, your users will thank you for it a thousand times over.

To see an example of this, check out GeoPortal.


4 Responses to String Parsing for Easier Searches

  1. Jason says:

    Nice work Tobin. As always refining the user experience.

  2. Ralph Dell says:

    I noticed your new search box a few weeks ago. Over the Christmas break I was looking at some GIS websites with my new daughter-in-law, who is pretty smart and has some solid GIS experience. She never saw the new search box. That has happened to at least one other person I had looking at your site. So my question is: for new users, does the search box stand out, or should the location be changed?

  3. Fuzzy says:

    That’s a good point and something I’ve been thinking about. Eye tracking of web users shows they tend to start at the top left, move down a little, and cut across as they look at a web page. So ideally, the thing you want people to notice first should be in the top left. So I’ve been thinking about flipping the search box and the title image around, so the search box is in the top left. Hmmm….

    Thanks for the feedback!

  4. Fuzzy says:

    I made the giant tool tip/arrow stay up for 45 seconds or a keypress so it’ll outlast a user reading the splash screen. Should do the trick.

    Interesting note though – the users I’ve talked to that aren’t GIS folks have a much easier time with a single search box on top, both finding it and using it. It’s the GIS “pros” that get thrown for a loop. They seem to be looking for the giant panel of boxes and buttons that come on most sites (and out of the default AGS/AIMS UI).
