Good Open Data Is Eating Your Own Dog Food: An Interview with Chris Whong

May 9, 2016

Many transit agencies have begun to embrace the concept of open data. As we recently wrote, however, actual implementation is slow and varies significantly across (or even within) agencies. TransitCenter sat down with Chris Whong, one of the leading advocates for open data in transit and a keynote speaker at our recent Staten Island Bus Hackathon, to talk about his views on the topic. Chris recently left CartoDB to begin work as a civic technologist for New York’s Department of City Planning. The interview has been condensed and edited for clarity.

You’ve spoken about the distinction between open data and open information. Tell us more about that.

Mark Headd, the former Chief Data Officer in Philadelphia and a leader in the open data world, has articulated this distinction well in recent blog posts. Essentially, the web has been around a long time, and so have portals where you can look information up and see some text on the screen. This is the kind of open information that a lot of government agencies currently have, where they are exposing a database in a limited fashion via a website. You can go to a page on their website, but the data has been curated beforehand and it’s presented in a way that lay people can understand.

That’s all well and good, but it’s not actually open data because it’s not reusable, not available in bulk, and certainly not machine-readable. And this is something that I think is really important to understand. Open data isn’t about the user experience; it’s about raw information that can be put to use in other ways.

It’s useful to think of open data as a press release: there is some inherent worth, but ultimately it’s just an intermediary. Journalists add their reporting and their analysis, and that’s what ultimately creates value for the public. Good open data, like a press release, probably isn’t useful for a broad public audience. It’s the job of technologists to process open data in ways that are helpful for the general public. A lot of my work has been about making raw data from the government even easier for third parties to consume.

When should agencies pay more attention to open data, and when to user-friendliness?

I think both are important, but the analysis of data and the publishing of data are two very different things. It’s useful for transit agencies to be able to analyze their own information, but the whole point of raw open data is that third parties should be able to check that work. In short, the act of publishing raw data in a useful way should be divorced from the analysis of that data.

Tell us about the work you did improving the bus-tracking app in Baltimore.

The Baltimore MTA wanted to get real-time bus tracking information into riders’ hands. The agency’s approach was typical: put out an RFP. They ended up engaging with a proprietary software vendor, and the transit community there was livid that the agency was not publishing data that could be consumed by a third-party developer. For people who didn’t want to go to the MTA’s website to see where buses were, the response from the vendor was that it would cost an extra $600,000 to make the data feed available to the public.

I said, well, if they’re already showing us bus locations on a map, then they’ve got the data. Let’s see if we can reverse-engineer how they’re putting that data into the browser. But the bigger, more exciting thing that happened because of that is that Transit App saw they could do the exact same thing for other transit agencies they had written off. The whole point of transit data standards is that once I write code for one, I can write code for 50. We didn’t push them into a standard by any stretch, but we were able to show how a web developer could consume the real-time data and use it elsewhere.
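The "write code for one, write code for 50" point can be illustrated with a minimal Python sketch. It assumes a made-up shared JSON format (a `vehicles` list with `id`, `route`, `lat`, and `lon` fields) and two invented agency feeds; none of this is the Baltimore MTA's actual feed or any real transit standard. The idea is only that one parser serves every agency that publishes in the same shape.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class VehiclePosition:
    vehicle_id: str
    route: str
    lat: float
    lon: float


def parse_positions(feed_text: str) -> List[VehiclePosition]:
    """Parse a feed in the hypothetical shared format into typed records."""
    payload = json.loads(feed_text)
    return [
        VehiclePosition(
            vehicle_id=str(v["id"]),
            route=str(v["route"]),
            lat=float(v["lat"]),
            lon=float(v["lon"]),
        )
        for v in payload["vehicles"]
    ]


# Two invented agencies publishing in the same invented format.
AGENCY_A = '{"vehicles": [{"id": "1001", "route": "8", "lat": 39.29, "lon": -76.61}]}'
AGENCY_B = (
    '{"vehicles": ['
    '{"id": "77", "route": "S40", "lat": 40.64, "lon": -74.08}, '
    '{"id": "78", "route": "S44", "lat": 40.63, "lon": -74.13}]}'
)

# The same function handles both feeds without agency-specific code.
for name, feed in [("agency_a", AGENCY_A), ("agency_b", AGENCY_B)]:
    positions = parse_positions(feed)
    print(name, len(positions), "vehicles")
```

Without a shared format, each agency would need its own version of `parse_positions`, which is the cost a standard removes.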

In a recent blog post, you coined the term “fooddogging.” What is that?

It begins with dogfooding, or the idea that you should use your own products or “eat your own dog food.” It’s the same with transit agencies and data. Too often, the data that an agency puts out as open data is just a byproduct that they would never use on their own. It’s not good enough or not in the right format—incomplete, truncated, etc. And it’s the wrong thing to do because then there are two classes of data, one that privileged people have access to that’s actually useful, and one that the public sees.

In transit, dogfooding means that the open data resources you create for public use are the same ones that you can and do use internally. If a transit agency uses the same real-time data feed for their countdown clocks that they expose to developers as an API, they are eating their own dog food because what’s good enough for the open data consumers is good enough for them.
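A rough Python sketch of the countdown-clock example follows. The function names, the stop and route values, and the JSON shape are all hypothetical, chosen only to show the pattern: the agency's own sign is built by consuming exactly the feed that outside developers receive.

```python
import json
from typing import Dict, List

# Made-up arrival predictions standing in for an agency's real-time system.
_PREDICTIONS: List[Dict] = [
    {"stop": "A", "route": "8", "minutes": 4},
    {"stop": "A", "route": "3", "minutes": 11},
    {"stop": "B", "route": "8", "minutes": 9},
]


def arrivals_feed(stop: str) -> str:
    """The single source of truth: a JSON feed of arrivals for one stop."""
    rows = [p for p in _PREDICTIONS if p["stop"] == stop]
    return json.dumps({"stop": stop, "arrivals": rows})


def public_api_response(stop: str) -> str:
    """What third-party developers receive: the feed, unmodified."""
    return arrivals_feed(stop)


def countdown_clock_text(stop: str) -> str:
    """What the agency's own sign shows, built by parsing the same feed."""
    feed = json.loads(arrivals_feed(stop))
    return " | ".join(
        f"Route {a['route']}: {a['minutes']} min" for a in feed["arrivals"]
    )


print(countdown_clock_text("A"))  # Route 8: 4 min | Route 3: 11 min
```

Because the sign depends on `arrivals_feed`, any gap or error in the public feed shows up on the agency's own clocks, which gives the agency a direct reason to keep the feed complete.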

“Fooddogging” describes the process of sharing data as a byproduct of an analysis or map project. The project may not have been made with data published elsewhere, but at the very least the data it is presenting can be downloaded right there on the spot in an open format. This is actually advantageous.

How can small governments and transit agencies participate in the open data movement? What are the barriers to entry for small players?

Technically, sharing data is a really easy thing to do. Especially with small datasets, you can use a GUI [graphical user interface] to upload them to GitHub or Dropbox, or just use an FTP server. Publish the links on the agency’s website along with some basic metadata, and you’re doing open data. If you want to get fancy you can set up automations, but there’s nothing wrong with starting simple.
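As one possible shape for "links plus some basic metadata," here is a small Python sketch that writes a JSON catalog. The field names (`title`, `description`, `url`, `format`, `updated`), the agency name, and the example.org link are assumptions made for illustration, not an established metadata standard.

```python
import json
from typing import Dict, List

# Fields this sketch treats as the minimum useful metadata (an assumption).
REQUIRED_FIELDS = ("title", "description", "url", "format", "updated")


def build_catalog(publisher: str, datasets: List[Dict[str, str]]) -> str:
    """Return a JSON catalog, refusing entries with missing metadata."""
    for d in datasets:
        missing = [k for k in REQUIRED_FIELDS if not d.get(k)]
        if missing:
            raise ValueError(
                f"dataset {d.get('title', '?')!r} is missing: {', '.join(missing)}"
            )
    return json.dumps({"publisher": publisher, "datasets": datasets}, indent=2)


catalog = build_catalog(
    "Example Transit Agency",  # hypothetical publisher
    [
        {
            "title": "Bus stops",
            "description": "Location and name of every bus stop.",
            "url": "https://example.org/data/bus_stops.csv",  # placeholder link
            "format": "CSV",
            "updated": "2016-05-01",
        }
    ],
)
print(catalog)
```

The resulting file can sit next to the download links on the agency's website, and regenerating it on a schedule would be one of the optional automations.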

I think data should be published as close to home as possible, meaning whoever maintains the data should be publishing it. If it has to move around too many times and go through too many hands before it sees the light of day, there are more points of failure, more bottlenecks, and delays.
