It was 7:00 on a Saturday morning, and the 10th floor of Los Angeles City Hall was filled with more than 450 people who had gathered to spend their day off with their noses buried in their laptops. They were data scientists, and they had come to build new technologies that solve complex social problems using the city’s newly open data.
This devotion to the common good brought out the best in the participants. “A lot of families will pick a home that’s really cheap and then overspend on transportation,” said Jasmine Dahilig, 21, a recent graduate of Loyola Marymount University. She and her teammates were devising an affordability index to help families balance those costs. Another team was working to coordinate assistance for homeless shelters. “Churches have volunteers who want to help. Restaurants want to donate food,” said Zach Latta, 16, describing his goal to use new public data sources as the means to solve important social problems. The scene took place at the 2014 edition of Hack for LA, a two-day event on May 31 and June 1, held to build apps that support public causes using open data.
The open data movement is part of a growing effort to make city government more accountable through technology. For example, the city of Los Angeles recently unveiled a new open data portal called DataLA, which is loaded with information on how the city works, including data on stray animals, bicycle lanes, graffiti cleanup and other municipal services.
The recent hackathon attracted teams of data aficionados who competed for thousands of dollars in prize money by creating apps to boost economic development, transform underrepresented communities, and make residents safer.
So far, the emergent open data movement has resulted in 38 states and 46 cities offering open data repositories that anyone can use for free. (Here’s a current directory of government open data sites.) In many cases, data scientists working on a pro bono basis find ways to use the previously unavailable information to make a real difference. One organization, DataKind, brings together leading data scientists with high-impact social organizations through a comprehensive, collaborative approach that leads to shared insights, greater understanding, and positive action through data in the service of humanity.
Open Data As a Growing Trend
Los Angeles is not the first city to post such data online for public consumption. New York has more than 1,100 data sets on its open data Website. (Los Angeles has 202, but the list is growing.) San Francisco is another frontrunner at opening up its data coffers to the public. Its open data portal, San Francisco Data, has been around since early 2012, and it provides special tools to help novice data explorers as well as seasoned application developers.
The site even has an App Showcase featuring cool apps that have been developed using open data. One good example is SFpark.org, which offers real-time availability and prices for parking spaces on streets and in city garages. Another app, SF Bus Predictions, ended up saving the city nearly a million dollars in public transportation alone. It tells people when buses are going to show up, so fewer people call the city to ask when the next bus is coming. Open data also requires collaboration. A recent partnership with Yelp led to standardized restaurant hygiene scores and made that information available, so it is now incorporated into Yelp’s Website. If you visit Yelp, you can see a restaurant’s score before you sit down to dine.
The Los Angeles open data portal has plans for expansion. City Controller Ron Galperin had a Website detailing the city’s finances months before DataLA went live. The controller’s data is now shared on the portal site. City officials say more data will be added gradually to the portal under a directive Mayor Eric Garcetti issued in December that requires each of the city’s 41 departments to publish all the data they can. Exceptions include information with legal restrictions regarding privacy or security.
Open Data Trumps Web Scraping
Before open data portals began to take hold, not all data was created equal, nor was it easily harvested. Collecting and disseminating civic data was more often than not a labor of love born out of necessity. To obtain data for analysis from city and state government agency Websites, you had to use a method called "Web scraping," in which you’d rely on open-source Python libraries to parse the HTML of Web pages and pull out the relevant data. As more open data portals come online, the Web scraping days seem to be over. The portals provide a much more powerful way to search for and filter data, all without any programming.
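To give a feel for what that scraping work looked like, here is a minimal sketch that uses only Python's standard-library `html.parser` module rather than any particular third-party scraping library. The HTML fragment, the restaurant names and scores, and the `TableScraper` and `scrape_table` names are all invented for illustration; a real agency page would have its own markup and would need its own parsing rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<table id="inspections">
  <tr><th>Restaurant</th><th>Score</th></tr>
  <tr><td>Cafe Uno</td><td>92</td></tr>
  <tr><td>Taco Stop</td><td>87</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Keep text only when it sits inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return every table row found in the given HTML string."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
```

Even this toy version has to track tags and cell boundaries by hand, and it would break as soon as the page layout changed, which is a large part of why a portal with built-in search and filtering is so much easier to work with.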
Data Hacktivists Reign
There’s an assumption that a city can build a portal and data scientists will automatically come. The idea that government can put out data and people will come and build useful apps is a nice concept, but it really doesn’t work like that. It’s nice to release data sets, but you have to have a way to turn that data into knowledge and then get that knowledge to people in an actionable manner. That’s where the hacktivists come in. The hope is for data hacktivists to turn city and state data into user-friendly apps. Using methods of data science and machine learning, these data enthusiasts can build a next generation of apps purely for altruistic reasons. If the success of various hackathons and organizations like DataKind is any indicator, the field of open data has a very bright future.
The open data movement for local and state governments is a positive force for assisting people in fundamental ways, using technology as a facilitator. Although this effort is young, the next few years will serve as a litmus test for just how much good data can do in making a positive impact on people’s lives.
Daniel D. Gutierrez is a Los Angeles–based data scientist working for a broad range of clients through his consultancy AMULET Analytics. He’s been involved with data science and Big Data since long before they came into vogue, so imagine his delight when the Harvard Business Review deemed “data scientist” the sexiest profession of the 21st century. He is also a recognized Big Data journalist and is working on a new machine-learning book due out later this year.