A geoServer for Oakland – SmartyStreets

It’s becoming increasingly clear that a solid “geo-server” – a service providing forward and reverse address lookups as well as other geometric queries – will be a great resource for OpenOakland to provide to Oakland citizens. Thanks to a hint by Kyle Warnick, I tried out SmartyStreets.

SmartyStreets limits the number of free batch queries you can do, and so (without an API key) this test was based on 292 random addresses selected from the OPD_combined_130608 data set. We already had geo (lat/long) coordinates for 263 of these; 29 were missing.

Most of the test set (237) allowed a comparison of our existing geo coordinates with those provided by SmartyStreets. 26 addresses for which we had coordinates were not handled by SmartyStreets; on the other hand, SmartyStreets supplied coordinates for 21 of the 29 addresses that were missing them.
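To make the bookkeeping explicit, here is a minimal Python sketch of how the test addresses split into groups. The dictionaries `prev` and `ss` (address mapped to a (lat, long) tuple, or None when no coordinates are available) and the function name are hypothetical, chosen only for illustration; this is not the script actually used for the test.

```python
def partition_by_coverage(prev, ss):
    """Split addresses by which source has coordinates for them.

    prev, ss: dicts mapping address -> (lat, long) tuple, or None if missing.
    Both structures are assumptions made for this illustration.
    """
    groups = {"both": [], "prev_only": [], "ss_only": [], "neither": []}
    for addr in prev:
        has_prev = prev[addr] is not None
        has_ss = ss.get(addr) is not None  # absent from ss counts as missing
        if has_prev and has_ss:
            groups["both"].append(addr)
        elif has_prev:
            groups["prev_only"].append(addr)
        elif has_ss:
            groups["ss_only"].append(addr)
        else:
            groups["neither"].append(addr)
    return groups
```

On the test set described above, these groups would have sizes 237, 26, 21 and 8 respectively: 237 + 26 = 263 addresses with previous coordinates, and 21 + 8 = 29 without.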

That’s the good news. Unfortunately, based on the 237 addresses common to both methods, the locations found by SmartyStreets do not seem, in general, very accurate. Using a simple, crude measure (the L2, i.e. Euclidean, distance between the two methods’ geo coordinates, in raw degrees), the figure below shows that most of the differences are very large.

[Figure: OPD_SSanal_L2_dist_annote]

Several sample addresses are annotated on this frequency histogram, and their details are listed below to give a sense of what these L2 distances imply:

Addr                     PrevLat    PrevLong     SSLat     SSLong      L2 distance
5936 GENOA ST            37.844316  -122.272773  37.73187  -122.17037  1.52E-1
2 MONTWOOD WY            37.74884   -122.135924  37.83292  -122.25898  1.49E-1
1638 103rd Ave           37.741534  -122.165234  37.80806  -122.29636  1.47E-1
Rhoda Ave & Madeline St  37.801112  -122.207468  37.75745  -122.17256  5.59E-2
1 CITY HALL PLAZA        37.805397  -122.27278   37.80597  -122.27213  8.67E-4
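For anyone who wants to check the last column, here is a small Python sketch that recomputes the L2 distance from the coordinates listed above. It treats lat/long as plain Cartesian coordinates in degrees, which is exactly what makes the measure crude. The function name `l2_distance` is mine, for illustration only.

```python
import math

# (address, previous lat, previous long, SmartyStreets lat, SmartyStreets long)
SAMPLES = [
    ("5936 GENOA ST", 37.844316, -122.272773, 37.73187, -122.17037),
    ("2 MONTWOOD WY", 37.74884, -122.135924, 37.83292, -122.25898),
    ("1638 103rd Ave", 37.741534, -122.165234, 37.80806, -122.29636),
    ("Rhoda Ave & Madeline St", 37.801112, -122.207468, 37.75745, -122.17256),
    ("1 CITY HALL PLAZA", 37.805397, -122.27278, 37.80597, -122.27213),
]


def l2_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance between two points, in raw degrees (not meters)."""
    return math.hypot(lat1 - lat2, lon1 - lon2)


for addr, plat, plon, slat, slon in SAMPLES:
    print(f"{addr:<25} {l2_distance(plat, plon, slat, slon):.2E}")
```

Because a degree of longitude is shorter than a degree of latitude at Oakland's latitude, these numbers are only good for ranking the mismatches, not for reading off real distances.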

The worst examples (e.g., “5936 GENOA ST”) are on opposite sides of Oakland! More typical examples, at around the average distance (e.g., “Rhoda Ave & Madeline St”), are ~4 miles apart. In a few cases (e.g., “1 CITY HALL PLAZA”), SmartyStreets rediscovers about the same spot.
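To turn coordinate pairs like these into real-world distances, one option is the standard haversine (great-circle) formula. The sketch below is a rough illustration rather than the method used for the figures here; it assumes a spherical Earth with a mean radius of 6371 km, and the names `haversine_km` and `KM_PER_MILE` are mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example: previous vs. SmartyStreets coordinates for "Rhoda Ave & Madeline St"
km = haversine_km(37.801112, -122.207468, 37.75745, -122.17256)
print(f"{km:.2f} km = {km / KM_PER_MILE:.2f} miles")
```

With the coordinates from the table, the Rhoda Ave & Madeline St pair comes out at a few miles, while the City Hall pair is well under a kilometer apart.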

The figure below shows maps for these examples, with the A pin corresponding to the previous geo coordinates and the B pin to SmartyStreets’. Also included is an example of a bogus address (“261P/C”; OPD data has noise!), for which SmartyStreets nevertheless happily provided an address! (Never mind: this was some processing glitch, probably from the “preview” of the full test set, and NOT due to SmartyStreets; see Michelle’s comments below.)

[Figure: smartyStreets_anal]

The complete set of 292 addresses and their comparison is available in this file: OPD_SSanal

Based on this analysis, I don’t see SmartyStreets suiting our purposes, but let me know if you disagree.

2 thoughts on “A geoServer for Oakland – SmartyStreets”

  1. Rik,
    Thanks for doing this comparison with SmartyStreets geocode data. First, let me clarify what it is that we do: SmartyStreets specializes in address validation, and geocoding is a service that we’ve added within the last 18 months. With that in mind, we’re well aware that we’re pretty new to the geocode game, and we’re constantly working to improve the accuracy of our geocode data. You can learn more about how accurate our data is here [http://smartystreets.com/kb/faq/how-accurate-is-your-geocoding-data]. We are currently working with another provider to get geodata that is way, WAY more accurate, so that’s in the works as we speak (er, type.)

    You mentioned that we restrict the number of free queries. This is true – mostly. Anyone who creates an account gets 250 free address lookups every month. And if you’re a nonprofit (school, library, church, or a 501(c)(3) organization) then we offer you completely free unlimited access.

    I’m a little confused by one point in your post: You say that for the bogus address (261P/C in Oakland CA), SmartyStreets “happily provided an address.” Did it return an address suggestion or a geodata point? When I run the address through SmartyStreets, it returns: “Matched 0 valid addresses. The address you entered is not recognized by the USPS, and no suitable alternatives could be matched.” It looks like it’s returning a city-centroid geodata point, though, since Oakland CA is valid.

    If you want a somewhat-exhaustive comparison of various geocode solutions, we’ve put together a spreadsheet showcasing how accurate several providers are on a range of random addresses throughout the country. You can find that here: https://docs.google.com/a/smartystreets.com/spreadsheet/ccc?key=0AidEWya_p6XFdGw1RmZ6TjB1ajZxVk81d2pISDMzVUE#gid=0

    If you (or anyone else) have questions, feel free to contact us; we love answering questions.

    • thanks very much for this context, Michelle.

      digging into the “261P/C” example, it appears to have been due to my concatenating a first “preview” test over the full (~100k) address set i have with the 250 trial addresses i did subsequently. anyway, i have zapped that offending sentence.

      and your comparison sheets seem really interesting! the “service comparison” sheet makes sense, but exactly what experiments are being summarized on the “Geocode Accuracy Comparison” sheet? %Matched, distanceDelta, … over how big a random sample? and these are using whose coordinates as truth?
