It’s becoming increasingly clear that a solid “geo-server” (a service providing forward and reverse address lookups, as well as other geometric queries) would be a great resource for OpenOakland to provide to Oakland citizens. Thanks to a tip from Kyle Warnick, I tried out SmartyStreets.
SmartyStreets limits the number of free batch queries, so (without an API key) this test was based on 292 addresses selected at random from the OPD_combined_130608 data set. We already had geographic (lat/long) coordinates for 263 of these; the other 29 were missing.
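The post doesn't say exactly how the random sample was drawn. As a minimal sketch, assuming the data set has been loaded as a list of dicts with hypothetical `address`, `lat` and `lon` keys (`None` when coordinates are missing), a seeded draw without replacement plus a count of missing coordinates could look like this. The records below are synthetic stand-ins, not OPD data:

```python
import random

def sample_addresses(records, n, seed=None):
    """Draw n distinct records uniformly at random, without replacement."""
    rng = random.Random(seed)  # seeded generator so the sample can be reproduced
    return rng.sample(records, n)

def count_coords(records):
    """Return (with_coords, missing_coords) for records shaped like
    {"address": ..., "lat": float or None, "lon": float or None}."""
    with_coords = sum(1 for r in records if r["lat"] is not None and r["lon"] is not None)
    return with_coords, len(records) - with_coords

# Usage with synthetic stand-in records (every 10th one lacks coordinates):
records = [
    {"address": f"{i} EXAMPLE ST",
     "lat": None if i % 10 == 0 else 37.8,
     "lon": None if i % 10 == 0 else -122.27}
    for i in range(1000)
]
sample = sample_addresses(records, 292, seed=130608)
print(count_coords(sample))  # two counts that always sum to 292
```

Fixing the seed means anyone rerunning the comparison gets the same 292 addresses.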
Most of the test set (237 addresses) allowed a comparison of our existing coordinates with those provided by SmartyStreets. SmartyStreets could not handle 26 of the addresses for which we had coordinates, but it did provide coordinates for 21 of the 29 addresses that were missing them.
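The bookkeeping behind these counts is a four-way split: coordinates from both sources, from our data only, from SmartyStreets only, or from neither. Here is a small sketch, assuming both sources have already been collected into hypothetical address-keyed dicts (how the SmartyStreets results are mapped back to input addresses is not shown here). The synthetic data is arranged only to reproduce the post's counts:

```python
def partition(existing, smarty):
    """Split addresses into the four cases of the comparison.

    existing: address -> (lat, lon), or None when the data set has no coordinates
    smarty:   address -> (lat, lon), only for addresses SmartyStreets geocoded
    """
    both, existing_only, smarty_only, neither = [], [], [], []
    for addr, coords in existing.items():
        has_old = coords is not None
        has_new = addr in smarty
        if has_old and has_new:
            both.append(addr)           # comparable: both methods have coordinates
        elif has_old:
            existing_only.append(addr)  # SmartyStreets could not handle it
        elif has_new:
            smarty_only.append(addr)    # SmartyStreets filled a gap
        else:
            neither.append(addr)        # no coordinates from either source
    return both, existing_only, smarty_only, neither

# Synthetic data mirroring the post's numbers
# (292 addresses, 263 with coordinates, 26 unmatched, 21 of 29 gaps filled):
existing = {f"a{i}": ((37.8, -122.27) if i < 263 else None) for i in range(292)}
smarty = {f"a{i}": (37.8, -122.27) for i in range(292) if i < 237 or 263 <= i < 284}
print([len(group) for group in partition(existing, smarty)])  # [237, 26, 21, 8]
```

The last group follows from the post's figures: 29 missing minus 21 filled leaves 8 addresses with no coordinates from either source.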
That’s the good news. Unfortunately, based on the 237 addresses common to both methods, the locations SmartyStreets returns do not, in general, seem very accurate. Using a simple, crude measure (the L2, or Euclidean, distance between the two methods’ coordinates, measured in raw degrees of latitude and longitude), the figure below shows that most of the differences are very large.
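To make the measure concrete, here is a sketch of the L2 distance in degrees and a fixed-width frequency histogram, applied to the five sample addresses listed in the table. The 0.02-degree bin width is a hypothetical choice; the bins used for the original figure aren't stated:

```python
import math
from collections import Counter

def l2_degrees(p, q):
    """Crude L2 (Euclidean) distance between two (lat, lon) pairs, in raw degrees.

    Treats a degree of latitude and a degree of longitude as equal lengths,
    which they are not away from the equator, so this is only a rough measure.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1])

def histogram(values, bin_width):
    """Frequency counts per fixed-width bin; key k covers [k*bin_width, (k+1)*bin_width)."""
    return Counter(int(v // bin_width) for v in values)

# (existing, SmartyStreets) coordinate pairs for the sample addresses
pairs = {
    "5936 GENOA ST": ((37.844316, -122.272773), (37.73187, -122.17037)),
    "2 MONTWOOD WY": ((37.74884, -122.135924), (37.83292, -122.25898)),
    "1638 103rd Ave": ((37.741534, -122.165234), (37.80806, -122.29636)),
    "Rhoda Ave & Madeline St": ((37.801112, -122.207468), (37.75745, -122.17256)),
    "1 CITY HALL PLAZA": ((37.805397, -122.27278), (37.80597, -122.27213)),
}
distances = {addr: l2_degrees(p, q) for addr, (p, q) in pairs.items()}
for addr, d in distances.items():
    print(f"{addr}: {d:.2E}")
print(histogram(distances.values(), bin_width=0.02))  # hypothetical bin width
```

Python prints exponents with two digits (e.g. `1.52E-01`), but the values match the table's last column.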
Several sample addresses are annotated on this frequency histogram, and their details are listed below to give a sense of what these L2 distances imply:
| Address | Existing lat | Existing long | SmartyStreets lat | SmartyStreets long | L2 distance (degrees) |
|---|---|---|---|---|---|
| 5936 GENOA ST | 37.844316 | -122.272773 | 37.73187 | -122.17037 | 1.52E-1 |
| 2 MONTWOOD WY | 37.74884 | -122.135924 | 37.83292 | -122.25898 | 1.49E-1 |
| 1638 103rd Ave | 37.741534 | -122.165234 | 37.80806 | -122.29636 | 1.47E-1 |
| Rhoda Ave & Madeline St | 37.801112 | -122.207468 | 37.75745 | -122.17256 | 5.59E-2 |
| 1 CITY HALL PLAZA | 37.805397 | -122.27278 | 37.80597 | -122.27213 | 8.67E-4 |
The worst examples (e.g., “5936 GENOA ST”) are on opposite sides of Oakland, about 15 km (9.6 miles) apart! More typical, average-distance examples (e.g., “Rhoda Ave & Madeline St”) are roughly 5.7 km (about 3.6 miles) apart. In a few cases (e.g., “1 CITY HALL PLAZA”), SmartyStreets finds nearly the same spot, less than 100 m away.
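Turning the degree-based L2 distances into ground distance means accounting for Earth's curvature and for longitude degrees shrinking with latitude (at Oakland's latitude a degree of longitude is only about 79% as long as a degree of latitude). The sketch below uses the standard haversine formula. This is a swapped-in technique, not part of the original analysis, and it assumes a spherical Earth with a mean radius of 6371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# (existing, SmartyStreets) coordinate pairs for three of the annotated addresses
examples = {
    "5936 GENOA ST": ((37.844316, -122.272773), (37.73187, -122.17037)),
    "Rhoda Ave & Madeline St": ((37.801112, -122.207468), (37.75745, -122.17256)),
    "1 CITY HALL PLAZA": ((37.805397, -122.27278), (37.80597, -122.27213)),
}
for addr, (p, q) in examples.items():
    km = haversine_km(p, q)
    print(f"{addr}: {km:.2f} km ({km / 1.609344:.2f} mi)")
```

This gives roughly 15.4 km, 5.7 km and 0.09 km for the three addresses. At city scale a flat equirectangular approximation would agree closely, but haversine avoids having to justify that shortcut.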
The figure below shows maps for these examples, with the A pin marking the previous coordinates and the B pin marking SmartyStreets’.
Also included is an example of a bogus address (“261P/C”; OPD data has noise!) for which SmartyStreets nevertheless happily provided an address. (Update: never mind. This was probably a processing glitch from the “preview” of the full test set, NOT a SmartyStreets error; see Michelle’s comments below.)
The complete set of 292 addresses and their comparison is available in this file: OPD_SSanal
Based on this analysis, I don’t think SmartyStreets suits our purposes, but let me know if you disagree.