Address Normalization in PostgreSQL

I’ve found there’s some confusion as to the difference between geocoding and address normalization (or address parsing), due to the fact that ESRI’s geocoding engines tend to do both at the same time.

On my web sites, I can control the way a user enters her or his address. I usually make people enter each address part (house number, street name, street type, etc.) in to different form elements. That’s the fastest and most accurate way to go when it comes to crunching time. You may not have that luxury, however. Sooner or later you are going to get a dbf full of addresses - often including city, state, and zip - as a single string.

Address normalization is the process of taking that single string and breaking it up in to its component pieces. If you are a programmer, just thinking about all the looping, querying, if-then-elsing required to parse that string will give you a headache.

Fortunately I was able to lift some code out of another project (Tiger Geocoder) and modify the code to just do address normalization. It makes a couple of street type and state look up tables and adds some functions to Postgres. The look up tables are a good way to go - you can easily modify them to handle your own personal address normalization demons. Wrap a web service around it and viola, you have an enterprise address normalizer.

You can get the SQL script here. You’ll also need to make sure the fuzzy search strings module is installed (share/contrib/fuzzystrmatch.sql). It could probably be made to run in most other RDBMS (Postgres adheres very tightly to the ANSI SQL standards) with minor modifications. OK, major modifications.

Incidentally, if you don’t have your own centerline to geocode against, check out the Tiger Geocoder project referenced earlier. Neat stuff.


My personal address normalization demons: W. T. Harris (W. is not a direction) and The Plaza (Plaza not being the street type).