String Parsing for Easier Searches
Through a lot of testing (and swearing), I’ve found the #1 thing users botch on a web application is text entry. Nothing else comes close. That seemingly innocent text box with the blinking cursor is the primary place your user-programmer dance will end with broken toes and hurt feelings.
One of the things I do to reduce text entry problems is to autocomplete text fields wherever I can. The user can start typing in a street name or an address and an autocomplete list will show up. With the user only having to get the first few characters of a search string right, my number of “didn’t find nothin’” search results goes way down, and the user-programmer dance gets a little better.
But this still left me with a complexity problem. The search area in one of my apps had a text box for addresses, a text box for places, a text box for a parcel ID, a text box for a street name, two text boxes for intersections, and two drop-down lists for different types of government facilities. That’s a whopping 8 form entry fields to perform all of the various searches.
I started thinking about condensing this mess into a single search box. I needed to keep my autocomplete functionality to reduce user headaches, but autocomplete functions have to be sub-second fast to be useful. Otherwise the user outruns them when typing and they don’t do anybody any good. And I couldn’t very well search on everything every time and keep the database calls fast.
Time for some string parsing goodness.
Check these search string snippets out:
- 101 Main
- Abbey Park
- Ruth
- Ruth & Dolphin
- 12312312
Each one follows a recognizable pattern:
- Address: <Integer><space><string>
- Place Name: <string>
- Street Name: <string>
- Intersection: <string><& character><string>
- Parcel ID: len(<string>) == 8 and isInt(<string>)
Let’s see how that might look in PHP. We’re looking at the string processing and logic here - the nitty-gritty processing code will be specific to your data. First, we’ll get the user input.
    $query = preg_replace('/\s\s+/', ' ', trim($_REQUEST['query']));
The regex collapses any run of two or more whitespace characters in the search string into a single space. The trim gets rid of leading or trailing white space. No more regex, I promise.
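Here is a minimal sketch of that cleanup step wrapped in a function, so it can be tried on its own. The name normalize_query is made up for illustration, and the pattern is a slight variation on the one above: /\s+/ also turns a single tab or newline into a space, which may or may not be what you want.

```php
<?php
// Hypothetical helper, not from the original app: tidy up raw user input.
function normalize_query(string $raw): string
{
    // trim() drops leading/trailing whitespace; the regex then collapses
    // every remaining run of whitespace (spaces, tabs, newlines) to one space.
    // preg_replace() returns null on a regex error, so fall back to ''.
    return preg_replace('/\s+/', ' ', trim($raw)) ?? '';
}

echo normalize_query("  101    Main \t St  "), "\n"; // prints "101 Main St"
```

In a real request handler you would probably feed it something like $_REQUEST['query'] ?? '' so a missing parameter doesn't raise a warning.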
Now we just need some string testing to see what we’ve got.
    if (is_numeric($query)) {
Here we check whether all we have is a number. If that’s the case, we assume it’s a parcel ID. If it’s 8 characters long, we know it’s a parcel ID and we can process that. Otherwise we ignore it.
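One thing to watch with is_numeric is that it also accepts strings like "1234.567" or "1e5", which are numbers but not plausible parcel IDs. A stricter check might look like the sketch below. The function name is_parcel_id is hypothetical, and swapping in ctype_digit is my assumption about what "nothing but an integer" should mean, not something the original code does.

```php
<?php
// Hypothetical helper: the 8-character length comes from the description
// above; the digits-only rule is an assumption.
function is_parcel_id(string $query): bool
{
    // ctype_digit() is true only when every character is 0-9, so decimal
    // points, signs and exponents are all rejected.
    return strlen($query) === 8 && ctype_digit($query);
}

var_dump(is_parcel_id('12312312')); // bool(true): 8 digits
var_dump(is_parcel_id('1231231'));  // bool(false): only 7 digits
var_dump(is_parcel_id('1234.567')); // bool(false): is_numeric() would accept this
```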
If it isn’t a PID, we start looking for everything else. So this will be in an else statement to the original if.
    else {
Here we’re getting an array of elements from the query string. We’re also checking to see if there’s an & character, which tells us to look for an intersection.
    if (is_numeric($query_array[0])) {
If the first string passed is an integer, we’re assuming it’s an address. Remember we’ve already weeded out strings that are nothing but a single integer as parcel IDs.
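Once a query looks like an address, the processing code will usually want the house number and the street name separately. A rough sketch of that split, assuming the input has already been trimmed and whitespace-collapsed; split_address and its array keys are made-up names, and it uses ctype_digit rather than is_numeric so that something like "1.5 Main" is not treated as an address.

```php
<?php
// Hypothetical helper: split "<integer><space><string>" into its two parts.
function split_address(string $query): ?array
{
    // A limit of 2 keeps everything after the first space together,
    // so "101 Main St" yields "101" and "Main St".
    $parts = explode(' ', $query, 2);

    if (count($parts) < 2 || !ctype_digit($parts[0])) {
        return null; // not in address form
    }

    return ['number' => (int) $parts[0], 'street' => $parts[1]];
}

print_r(split_address('101 Main St'));
```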
    else if ($pos !== false) {
If it wasn’t an address or a parcel ID and it has an & character in it, we’ll treat it as an intersection, like “Ruth & Something”. The strpos function returns false if the search string isn’t found, and the comparison has to be the strict !== because an & at position 0 is also loosely equal to false.
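For the intersection case, the two street names on either side of the & are what the lookup needs. Below is a small sketch of pulling them apart. split_intersection is a hypothetical name, and returning null for a half-typed intersection (like "Ruth &") is just one possible choice; for autocomplete you might prefer to search on the first street alone.

```php
<?php
// Hypothetical helper: split "Ruth & Dolphin" into its two street names.
function split_intersection(string $query): ?array
{
    $pos = strpos($query, '&');

    // Strict comparison matters: strpos() returns 0 for "& Ruth",
    // and 0 == false is true in PHP.
    if ($pos === false) {
        return null;
    }

    $first  = trim(substr($query, 0, $pos));
    $second = trim(substr($query, $pos + 1));

    // Both sides must be present to count as a full intersection.
    if ($first === '' || $second === '') {
        return null;
    }

    return [$first, $second];
}

print_r(split_intersection('Ruth & Dolphin'));
```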
    else {
Finally, if it isn’t a parcel ID or an address or an intersection, we’ll assume it’s a point of interest (park, library, etc.), process it as such and close the else block. We can now condense our 8-field form entry monstrosity into a single search box with full autocomplete functionality, with a little help from jQuery on the client side.
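Putting the branches together, the whole decision tree can be sketched as one function that only reports what kind of search it thinks it has. This is an illustration of the logic described above rather than the actual application code: classify_query and the category labels are made up, and the real version would call data-specific lookups in each branch instead of returning a string.

```php
<?php
// Hypothetical sketch of the branching logic; labels are illustrative only.
function classify_query(string $raw): string
{
    // Same cleanup as before: trim, then collapse extra whitespace.
    $query = preg_replace('/\s\s+/', ' ', trim($raw)) ?? '';
    if ($query === '') {
        return 'empty';
    }

    // Nothing but a number: a parcel ID if it is 8 characters, else ignored.
    if (is_numeric($query)) {
        return strlen($query) === 8 ? 'parcel' : 'ignored';
    }

    $query_array = explode(' ', $query);
    $pos = strpos($query, '&');

    if (is_numeric($query_array[0])) {
        return 'address';          // e.g. "101 Main"
    }
    if ($pos !== false) {          // strict: position 0 is a valid match
        return 'intersection';     // e.g. "Ruth & Dolphin"
    }
    // A bare string has the same form as a street name, so telling the
    // two apart is left to the data lookup.
    return 'place';                // e.g. "Abbey Park"
}

foreach (['101 Main', 'Abbey Park', 'Ruth & Dolphin', '12312312'] as $q) {
    echo $q, ' => ', classify_query($q), "\n";
}
```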
Voilà - the ubersearch. The one search of Sauron. Or, you know, how the Google does it. Put your search box on top of your page and highlight it so the user’s eyes grab on to it.
There’s only one tricky bit to doing string parsing and categorizing like this: you have to keep an eye on the data. Search fields devoid of form can bite you. What if you have a Sanford & Son point of interest? The & character would make our autocomplete think it’s an intersection. What if you had a point of interest called 101 Main? Our autocomplete logic would have that be an address. So you have to watch your data. But if you can pull it off, your users will thank you for it a thousand times over.
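One way to soften those collisions, sketched here as an idea rather than something the app necessarily does, is to check the query against a short list of known troublemakers before applying the form-based rules. The function name, the prefix-matching approach, and the sample list are all assumptions; in practice the list would come from your own data.

```php
<?php
// Hypothetical guard: does the query look like the start of a place name
// that would otherwise be misread as an address or intersection?
function matches_known_place(string $query, array $place_names): bool
{
    if ($query === '') {
        return false; // an empty prefix would match everything
    }

    foreach ($place_names as $name) {
        // Case-insensitive prefix match, since autocomplete sees
        // partially typed input like "Sanford &".
        if (strncasecmp($name, $query, strlen($query)) === 0) {
            return true;
        }
    }
    return false;
}

// Illustrative data only.
$places = ['Sanford & Son', '101 Main'];

var_dump(matches_known_place('sanford &', $places));      // bool(true)
var_dump(matches_known_place('Ruth & Dolphin', $places)); // bool(false)
```

A check like this adds work to every keystroke, so it only makes sense while the exception list stays small.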
To see an example of this, check out GeoPortal.