The partially realized promise of open transit data



The era of open data in public transportation has just begun. Yet it’s already clear that simply talking about data’s potential to transform mass transit and the experience of riders—whether through optimistic press releases or boosterish IBM commercials—won’t necessarily make it happen.

This is the challenge facing both transit agencies and technologists who want to implement and take advantage of open data policies. An aggressive public relations strategy is a useful tool, but it must be paired with an equally aggressive policy agenda—one that not only provides data already in a transit agency’s possession, but that also seeks to identify and collect new kinds of data that might make public transportation better in the future.

The proof will lie in the emergence of real-world cases where public provision of agency data can be shown to lead to real improvements in transit service.

The Valley Transportation Authority—the transit provider in Santa Clara County, California—recently announced open data would be its “default” policy and launched a portal to centralize all available data. Of the 18 datasets available, though, the eight most recent are just occasional updates to the system’s route and stop information. The ridership dataset is more than a year old.

The VTA is, so far, the transit agency most visibly committed to the principle that all of its data should be public. But while the VTA’s web portal is in its infancy and might expand its scope and currency in the future, it’s not necessarily emblematic of a transit agency putting an open data policy to its best possible use. What’s more, it points to a principle that’s often overlooked: Open data is a means, not an end. If an agency is too hidebound to act and does not allow data to inform its decision-making, open data may not live up to its promises.

Some of the best uses of open data in transportation right now are coming from agencies that have not made a rhetorical commitment to it per se, but are nonetheless providing data in a way that allows the public to hold transportation leaders accountable to policy goals they have articulated.

As part of its Vision Zero campaign, for instance, the New York City Mayor’s Office has made data available from the police department’s Traffic Accident Management System. The data have proven a potent tool for journalists and advocacy groups to monitor (and criticize) the progress of New York’s transportation and police departments in reducing traffic deaths. Providing the data, and presenting them on a public map that City Hall updates monthly, may not be in city government’s short-term public relations interest. But in practice, it holds the city’s feet to the fire regarding its goal of continuous improvement in street safety.

For transit agencies, open data can most immediately mean better information for riders. But whether it will be used to create a clearer look at performance, and to build the internal and external feedback and accountability pressures needed to improve it, remains an open question.

MTA New York City Transit is an instructive example, as its various divisions’ data policies run the gamut from good to poor. Transit’s Bus Time platform, for instance, provides accurate up-to-the-minute bus arrival information for web and smartphone users on a scale that many thought impossible just a decade ago. In addition, bus officials have been receptive to outside efforts to collect and use the Bus Time data to analyze and improve service. The MTA partnered with TransitCenter and NYU’s Rudin Center on our recent Staten Island bus hackathon, and has held other app development contests.

Subways seem another matter. New York City Transit’s lack of urgency in completing the installation of subway platform countdown clocks and improved signals (for its lettered subway lines, the “B Division” trains) reflects either a failure to understand the value of good information as a public amenity or an unwillingness to provide it. The impact goes beyond the lack of physical clocks in the stations. Numerous apps that provide “real-time” arrival information for smartphone users currently rely on an application programming interface that covers only the A Division trains (the numbered lines). For B Division trains, they must instead use pre-published (and often inaccurate) schedule data.

New York’s B Division riders are not alone. While there have been advancements across the country in how transit agencies use data and provide it to the public, the vast majority of riders’ experiences are more or less the same as they were ten or twenty years ago. (In cities that have seen service cuts in that time, they are arguably worse.) The good news is that the use of data to better inform the riding public also leads toward more sophisticated and granular performance measurement. Agencies like the VTA and MTA recognize that other jurisdictions will surpass them if they move too slowly.