OPD data publishing – Summer 2019 Update

OakCrime.org depends critically on the data OPD provides through its Socrata interface and API. Around April 15 of this year, this source began to dry up.

TLDR: Four months on, things seem to again be working about as before!

My understanding of the events between then and now is something like:

  1. Omega breaks the ETL: Software sold to OPD by Omega Systems (aka TriTech Software Systems, tritech.com, now part of CentralSquare.com) was “updated” by them around mid-April. These updates broke scripts that had been written to take data off the older Omega tables and pump it towards Socrata. Omega couldn’t be bothered to fix them.

  2. Oakland Department of IT worked with OPD IT folks to write new scripts that restored the ETL.

  3. Summer vacations were enjoyed by all.

  4. Initial experiments missed some bits of the daily incident record; this got fixed.  Getting the new ETL scripts scheduled to run regularly took more experiments.

Below are a couple of plots to give the big picture. The first shows the number of records that could and could not be harvested. The regular pattern on the left is followed by dwindling new records, then by a gap starting in mid-May with no new records, followed by a burst of new records in mid-July, some updates of missing data, and now a regular flow.
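For anyone who wants to watch for this kind of drought in their own harvest, here is a minimal sketch of the idea: count new records per day, then flag stretches of days with none. The function names and the `min_days` threshold are my own illustrative assumptions, not the code OakCrime.org actually runs.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(record_dates, start, end):
    """Count records per calendar day from start to end inclusive.

    Days with no records are kept in the result with a count of 0.
    """
    counts = Counter(record_dates)
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return {day: counts.get(day, 0) for day in days}


def empty_stretches(counts, min_days=3):
    """Return (first_day, last_day) for each run of at least min_days
    consecutive days with zero records. Assumes counts covers a
    contiguous range of days, as daily_counts() produces.
    """
    stretches = []
    run_start = None
    prev = None
    for day in sorted(counts):
        if counts[day] == 0:
            if run_start is None:
                run_start = day
        else:
            # A run of empty days just ended on the previous day.
            if run_start is not None and (prev - run_start).days + 1 >= min_days:
                stretches.append((run_start, prev))
            run_start = None
        prev = day
    # Handle a run that is still open at the end of the range.
    if run_start is not None and (prev - run_start).days + 1 >= min_days:
        stretches.append((run_start, prev))
    return stretches
```

Feeding it the harvest dates of the records would report each gap as a pair of dates, which is roughly what the flat stretch in the plot shows visually.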

Another way to see this pattern is shown below. This shows the total number of records managed by OakCrime.org. It begins with around 214000* in January of this year, increases steadily through April, and does not grow at all during early May; a few data dumps then restore some but not all of the previous volume. (This was where things stood May 21 in my Keep Police Data Public! post.) Then in mid-July things again began marching up.

I could just end the story there, with a happy ending! OPD has demonstrated that maintaining this stream of public data about their operations is important to them, too. Oakland DIT had to work constructively with the OPD IT folks (on whose machines the Omega software ran), but then demonstrated that they can work around vendor-created hassles. I want to send props to both Capt. Chris Bolton of OPD, and Kevin Harrison of DIT for their efforts to fix this. (If this data matters to you, too, I encourage you to drop them a line yourself: Cbolton@oaklandca.gov, Kharrison2@oaklandca.gov.)

But for those of you who have read this far, I’ll mention two more fine points that have also come out of these recent monitoring and analysis efforts. The first is that address reporting has become more precise: the address field associated with incidents had typically been rounded to the hundred block, e.g., “2345 Oak Street” was transformed to “2300 Oak Street.” But as the plot below shows, at the exact time Omega’s old Socrata filter was removed and replaced with the new one, the fraction of rounded addresses dropped, varied through May and June, and has now fallen to about 20%.
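To make the hundred-block idea concrete, here is a small sketch of how the rounding and the "fraction rounded" measure could be computed. It is only an illustration under my own assumptions: the function names are hypothetical, and I assume an address counts as rounded when its leading street number is a multiple of 100. That heuristic also counts addresses that really do sit at the start of a block, so the fraction it reports is an upper bound.

```python
import re

# Matches a purely numeric street number at the start of an address.
# Addresses without one (e.g. intersections) are left alone.
_LEADING_NUMBER = re.compile(r"^\s*(\d+)\b")


def round_to_hundred_block(address: str) -> str:
    """Round a leading street number down to its hundred block."""
    match = _LEADING_NUMBER.match(address)
    if match is None:
        return address
    number = int(match.group(1))
    rounded = number - number % 100
    return address[:match.start(1)] + str(rounded) + address[match.end(1):]


def looks_rounded(address: str) -> bool:
    """True if the leading street number is a multiple of 100."""
    match = _LEADING_NUMBER.match(address)
    return match is not None and int(match.group(1)) % 100 == 0


def fraction_rounded(addresses) -> float:
    """Fraction of numbered addresses whose street number looks rounded.

    Addresses with no leading number are excluded from the denominator.
    """
    numbered = [a for a in addresses if _LEADING_NUMBER.match(a)]
    if not numbered:
        return 0.0
    return sum(looks_rounded(a) for a in numbered) / len(numbered)


print(round_to_hundred_block("2345 Oak Street"))  # 2300 Oak Street
```

Computing `fraction_rounded` over the addresses harvested each week would give a curve like the one in the plot.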

The second fine point is that, because many incidents were (initially) missing their CrimeType attribute, and because the OakCrime.org site made use of this field as part of its CrimeCatOAK classification of incidents, a new version of this classifier was developed.  But that’s such an interesting story that it gets its own post!

*(The incident counts mentioned cover incidents since 2014; earlier data going back to 2007 are available as a separate data set.)
