OPD data publishing – Summer 2019 Update
OakCrime.org depends critically on the data OPD provides through its Socrata interface and API. Around April 15 of this year, this source began to dry up.
TLDR: Four months on, things seem to again be working, better than before!
Now, four months later, things again seem to be working as before. My understanding of the events between then and now is something like:
1. Omega breaks the ETL: Software sold to OPD by Omega Systems (aka TriTech Software Systems, tritech.com, now part of CentralSquare.com) was “updated” by them around mid-April. These updates broke scripts that had been running for many years, taking data from the older Omega tables and pumping it towards Socrata. Omega couldn’t be bothered to fix them.
2. Oakland Department of IT worked with OPD IT folks to write new, reverse-engineered scripts that restored the ETL.
3. Summer vacations were enjoyed by all.
4. Initial experiments missed some bits of the daily incident record; this got fixed. Getting the new ETL scripts scheduled to run regularly took more experiments.
Below are a couple of plots to give the big picture. The first shows the number of records that could (or could not) be harvested. The regular pattern on the left is followed by dwindling new records, then by a gap starting in mid-May with no new records, then by a burst of new records in mid-July, some updates of missing data, and now a regular flow.
Another way to see this pattern is shown below. This shows the total number of records managed by OakCrime.org. It begins with around 214,000* in January of this year and increases steadily through April. It then stops growing at all, with a few data dumps restoring some but not all of the previous volume. (This was where things stood on May 21, in my Keep Police Data Public! post.) Then in mid-July the total again began marching up.
But the story gets better! First, since late May 2019, the total volume of crime incidents being reported has increased! As shown in the plot below of the number of incidents reported each day, the change comes somewhere around May 24, 2019: the average number of incidents per day was 132 before, but jumps to 188 after, an increase of roughly 42%!$ This is the data that Omega’s software had previously been hiding from the Socrata data stream, as noticed and described in previous posts (June 2013, January 2014).
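For anyone who wants to check this kind of before/after comparison on their own harvest, here is a minimal sketch of the arithmetic. It is not the code OakCrime.org runs: the function name, the dict-of-dates input, and the toy counts are all made up for illustration. Only the two averages (132 and 188), the approximate May 24 change date, and the Sept 11 cutoff come from this post. With those averages the relative change is (188 - 132) / 132, a bit over 40%.

```python
from datetime import date
from statistics import mean


def split_daily_average(daily_counts, change_date, cutoff_date=None):
    """Compare average daily incident counts before and after change_date.

    daily_counts: dict mapping a date to the number of incidents reported
                  for that day (hypothetical input format).
    cutoff_date:  optional; days on or after it are ignored, so that recent,
                  still-incomplete days do not drag down the "after" average.
    Returns (before_avg, after_avg, pct_change).
    """
    before = [n for d, n in daily_counts.items() if d < change_date]
    after = [
        n for d, n in daily_counts.items()
        if d >= change_date and (cutoff_date is None or d < cutoff_date)
    ]
    before_avg = mean(before)
    after_avg = mean(after)
    pct_change = 100.0 * (after_avg - before_avg) / before_avg
    return before_avg, after_avg, pct_change


# Toy data, invented so the averages match the ones quoted in the post.
toy_counts = {
    date(2019, 5, 22): 130,
    date(2019, 5, 23): 134,
    date(2019, 5, 24): 186,
    date(2019, 5, 25): 190,
    date(2019, 9, 20): 40,  # recent, under-reported day; dropped by the cutoff
}

before_avg, after_avg, pct_change = split_daily_average(
    toy_counts, change_date=date(2019, 5, 24), cutoff_date=date(2019, 9, 11)
)
print(f"before={before_avg}, after={after_avg}, change={pct_change:.1f}%")
```

The cutoff argument is the important design choice: without it, the most recent days (which are still filling in) would be counted in the "after" average and understate the jump.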
Second, it used to be the case that a particular crime incident would generate multiple records, each documenting specific charges associated with the incident. In April 2014 most of this data disappeared§, a discouraging change that I commented upon then. But as the plot below shows, these multiple records associated with the same incident are also back! The plot shows the number of single (“nzeroOIDX”) and multiple (“nmultOIDS”) incidents. The key bits are those early red points, showing the end of the period when they were not being reported, and the beginning of their continued reporting in late May. Something like 30% of crimes now generate more than one record. The reporting of these multiple-record incidents is especially important, because they give a richer account of particularly egregious crimes.
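The single-versus-multiple split is easy to compute from any list of records that carry an incident identifier. The sketch below is only an illustration under assumptions: the field name `incident_id`, the function name, and the toy records are hypothetical, and this is not necessarily how the counts in the plot were produced.

```python
from collections import Counter


def count_single_and_multiple(records, id_field="incident_id"):
    """Count incidents with exactly one record vs. more than one record.

    records:  iterable of dicts, one per harvested record.
    id_field: key holding the incident identifier (hypothetical name).
    Returns (n_single, n_multiple).
    """
    records_per_incident = Counter(r[id_field] for r in records)
    n_single = sum(1 for c in records_per_incident.values() if c == 1)
    n_multiple = sum(1 for c in records_per_incident.values() if c > 1)
    return n_single, n_multiple


# Toy records, invented for illustration: incidents A and C have several
# charges each, while B and D have a single record.
toy_records = [
    {"incident_id": "A", "charge": "burglary"},
    {"incident_id": "A", "charge": "vandalism"},
    {"incident_id": "B", "charge": "theft"},
    {"incident_id": "C", "charge": "robbery"},
    {"incident_id": "C", "charge": "assault"},
    {"incident_id": "C", "charge": "weapons"},
    {"incident_id": "D", "charge": "theft"},
]

n_single, n_multiple = count_single_and_multiple(toy_records)
share_multiple = n_multiple / (n_single + n_multiple)
print(n_single, n_multiple, f"{share_multiple:.0%}")
```

Grouping the same count by report date would give the two series in the plot, one point per day for each kind of incident.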
I’ll end the story there, at a happy ending! OPD has demonstrated that maintaining this stream of public data about their operations is important to them, too. Oakland DIT had to work constructively with the OPD IT folks (on whose machines the Omega software ran), but then demonstrated that they can work around vendor-created hassles (DIT had to reverse-engineer missing code!). I want to send props to both Capt. Chris Bolton of OPD and Kevin Harrison of DIT for their efforts to fix this. (If this data matters to you, too, I encourage you to drop them a line yourself: Cbolton@oaklandca.gov, Kharrison2@oaklandca.gov. If you do that, I have to believe the chances that they’ll help next time something breaks will improve!)
But of course more data (just like more sunlight) always shows you more. There have also been changes in the types of crimes being reported, as reflected by the “crime type” and “description” text fields provided for each incident! And because the OakCrime.org site uses these fields as part of its classification of incidents into the OAKCC hierarchy, a new version of this classifier was developed. But these are such interesting issues that they get their own post!
* The incident counts mentioned are for incidents since 2014; data going back to 2007 are available as a separate data set.
$ The fall-off in very recent days is due to a reporting delay in OPD’s data flow, which seems to stretch several weeks. The statistics above considered only dates before Sept 11, 2019 as part of the “after” average.
§ The FBI has established what is called the “hierarchy” rule, whereby only the single, most serious charge arising from an incident is to be included in its Uniform Crime Reporting (UCR) report. The charges that had been dropped from reporting and that are now back are the lesser charges.