MVHub.com ZIP code sort

Thu Sep 6 19:51:11 EDT 2007

The ZIP code sort is now live on MVHub.com, so feel free to surf out
 there and have a look.  The following is some background material on how
 ZIP code sorting works; it's a bit longwinded, so read or skip it at
 your leisure.
 
 --John
 
 When I started this a month ago, I had assumed that ZIP code information
 was in the public domain, and that ZIP codes corresponded roughly to
 geographical areas.  Given that, we could download the public-domain ZIP
 info, calculate the center of each ZIP code, then do a little trig to
 calculate distances between ZIP codes.  This is _roughly_ how things work.
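
 For the curious, the "little trig" is usually the haversine formula.
 Here is a minimal sketch in Python (not the Perl we actually run), assuming
 each ZIP code has already been reduced to a single latitude/longitude
 point.  The function name and the Earth radius constant are my own choices
 for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

 As a sanity check, one degree of latitude should come out at roughly 69
 miles.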
 
 The USPS created the Zone Improvement Plan (ZIP) codes in 1963
 to make mail delivery more efficient.  ZIP codes are assigned based on a
 few things.  The country is divided into ZIP code regions, with each
 region having a unique first digit (New England = 0, West Coast = 9,
 etc.).  Inside each region, each state gets a range of ZIPs (MA is
 01000-02799), not all of which are used.  Pretty obvious so far.
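
 One practical wrinkle: New England ZIPs start with 0, so they lose a digit
 if you ever store them as numbers.  A small sketch in Python of how one
 might normalize a ZIP and check it against a state's range, with MA's range
 written as five-digit strings (01000-02799).  The function names here are
 made up for illustration.

```python
# MA's range as quoted in this post, kept as 5-character strings.
MA_RANGE = ("01000", "02799")


def normalize_zip(raw):
    """Return a 5-character ZIP string, restoring any lost leading zeros."""
    return str(raw).strip().zfill(5)


def region_digit(zip_code):
    """First digit of the ZIP, which identifies the ZIP code region."""
    return normalize_zip(zip_code)[0]


def in_range(zip_code, zip_range):
    """True if the ZIP falls inside the (low, high) range, inclusive."""
    low, high = zip_range
    # Equal-length digit strings compare correctly as plain strings.
    return low <= normalize_zip(zip_code) <= high
```

 Keeping ZIPs as strings everywhere avoids the problem in the first place.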
 
 Each individual ZIP code, however, is defined not by a geographical
 area, but by its carrier routes.  This makes sense for the postmen, who
 can say "My route goes to the end of Westford Street," but when you have
 a set of streets that might look like:
 
   /-------/
  /       /
 /        \
 |         \
 |         /
 ------------------
 
 it's tough to define a unique geographic area, especially if not all the
 streets have addresses, or if there's a body of water involved.  The
 Census Bureau took on this task back in 2000, and defined ZIP Code
 Tabulation Areas (ZCTAs).  This information is seven years old, though,
 and covers only regular (multi-address, non-P.O. Box) ZIP codes.
 
 The USPS has also defined areas for ZIP codes and sells this information
 for $50/state.  Commercial companies have licensed the USPS data and
 sell it at much more reasonable rates (approx. $50 for the entire US).
 Most websites these days use this commercial data.
 
 It's not too shocking that the Census Bureau and the USPS data don't
 quite match (pretty close, though), but it's news that Google's data
 doesn't always match the USPS's.  For example, Google calculates the
 center of the Highlands neighborhood to be just southeast of Drum Hill,
 while Yahoo, the Census Bureau, and the USPS all put the center of the
 Highlands at about Stevens and Westford streets.  The difference in the
 two locations is about a mile.  Google Maps had a few other anomalous
 ZIP codes as well.
 
 To do MVHub's ZIP code sorting, I had initially hoped to query Google
 Maps for a distance, then cache the distance in our database so we
 didn't have to query twice.  Google changed their Maps API, however, so
 the Perl module I was using (Geo::Google) to query Google Maps broke.
 The Geo::Google developers (conscientious folks that they are) sent me a
 patch within an hour of my bug report, but I felt a bit uneasy about
 relying on Geo::Google (in this case, its dependency JSON::Parser) not
 to break on future Google API changes.  Combining that with the
 suggestion of perlmonks.org users that we have our own ZIP code
 database, we purchased ($40) a list of ZIP codes, towns, latitudes, and
 longitudes from zipcodedownload.com.
 
 Once we had the latitude and longitude for each ZIP code, finding the
 distance between two ZIPs took only a few lines of code using the
 Geo::Distance module.  It was pretty straightforward to load the ZIP
 codes and distances into a database table, then for each MVHub program
 result, query the database for the corresponding distance.
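
 Roughly, the table-and-lookup step looks like the following.  This is a
 sketch in Python with SQLite, not MVHub's actual Perl code or schema: the
 table name, column names, and function names are all assumptions, and
 distance_fn stands in for whatever great-circle routine you use.

```python
import sqlite3


def create_tables(conn):
    """Create a table holding one precomputed distance per ZIP pair."""
    conn.execute(
        "CREATE TABLE zip_distance ("
        " from_zip TEXT, to_zip TEXT, miles REAL,"
        " PRIMARY KEY (from_zip, to_zip))"
    )


def load_distances(conn, zip_coords, distance_fn):
    """Precompute distances for every ordered pair of ZIPs.

    zip_coords maps a ZIP string to a (lat, lon) tuple; distance_fn takes
    (lat1, lon1, lat2, lon2) and returns miles.
    """
    rows = [
        (a, b, distance_fn(*zip_coords[a], *zip_coords[b]))
        for a in zip_coords
        for b in zip_coords
    ]
    conn.executemany("INSERT INTO zip_distance VALUES (?, ?, ?)", rows)


def sort_programs(conn, user_zip, programs):
    """Sort (name, zip) pairs nearest-first; unknown ZIPs go last."""
    def miles(program_zip):
        row = conn.execute(
            "SELECT miles FROM zip_distance"
            " WHERE from_zip = ? AND to_zip = ?",
            (user_zip, program_zip),
        ).fetchone()
        return row[0] if row else float("inf")

    return sorted(programs, key=lambda program: miles(program[1]))
```

 Storing every pair is only sensible for a regional list of ZIPs like
 ours; for the whole country you would compute distances on demand
 instead.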
 
 To sum up, I thought we could use Google to find ZIP information; this
 unexpectedly broke when Google changed their Maps API.  Using an existing
 ZIP code -> latitude/longitude database was a far better choice, and of
 the data sources we looked at, the USPS's (which the commercial vendors
 license) was the best.