Alongside big-city problems like lowering the murder rate, cutting the number of stolen garbage carts may seem like small stakes. But lost garbage carts actually cost Chicago a lot of money and time -- it takes scarce resources to field the complaints, acquire new carts and pay staff to deliver them. What if data analysis could help the city minimize the number of lost carts?
Evaluating garbage cart losses with mapping software and comparing that information to streetlight failures, city staff confirmed what they had suspected: In certain neighborhoods, if the alley lights go down, garbage cart thefts spike. That intelligence gives a new sense of urgency to getting lights repaired.
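The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the city's actual method: the neighborhood labels, the weekly counts and the function name `theft_rate_by_light_status` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical weekly records: (neighborhood, alley_lights_out, carts_reported_stolen).
# All values are invented for illustration; they are not Chicago data.
WEEKLY_RECORDS = [
    ("A", False, 2),
    ("A", True, 7),
    ("A", False, 3),
    ("A", True, 9),
    ("B", False, 4),
    ("B", True, 4),
    ("B", False, 5),
]


def theft_rate_by_light_status(records):
    """Return {neighborhood: {lights_out: mean thefts per week}}."""
    # For each neighborhood and light status, keep [total thefts, number of weeks].
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for neighborhood, lights_out, thefts in records:
        bucket = totals[neighborhood][lights_out]
        bucket[0] += thefts
        bucket[1] += 1
    return {
        hood: {status: total / weeks for status, (total, weeks) in by_status.items()}
        for hood, by_status in totals.items()
    }


rates = theft_rate_by_light_status(WEEKLY_RECORDS)
```

In this toy data, neighborhood A averages more thefts in weeks when lights are out while neighborhood B does not, which mirrors the "in certain neighborhoods" pattern. A real analysis would have to rule out other explanations before treating the link as cause and effect.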
"Government has been very good at collecting data, but not as good at using the data," says Brett Goldstein, the city of Chicago's CIO. So Chicago is in the process of building a predictive analytics platform that will do more analysis and much more sophisticated analysis. That work is being funded in part through a $1 million grant the city received in March as a runner-up in Bloomberg Philanthropies' Mayors Challenge, a competition to fund innovative ideas in city government.
The city still has far to go in completing the predictive platform. Goldstein has spent the past two years laying a foundation for this analytics work, including hiring experts from the private sector and academia with experience in big data and open source. His team has also created a single database on the MongoDB open source platform, into which data is fed from dozens of legacy IT systems, providing better visibility into municipal operations across departments.
The database is linked to mapping software to create an application called WindyGrid that lets city employees call up public data on a building or section of the city. WindyGrid, which went live in the past year, can be used for simple cause-and-effect tests, as in the streetlights-and-garbage-cart analysis.
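Calling up records for a section of the city amounts, at its simplest, to a spatial filter. The sketch below shows one way that could look in plain Python. The article does not describe WindyGrid's actual queries, so the `CityRecord` fields, the sample coordinates and the function `records_in_area` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityRecord:
    source: str   # hypothetical originating system, e.g. "streetlights"
    summary: str
    lat: float
    lon: float


def records_in_area(records, south, west, north, east):
    """Return the records whose coordinates fall inside the bounding box."""
    return [
        r for r in records
        if south <= r.lat <= north and west <= r.lon <= east
    ]


# Invented sample records; the coordinates are only roughly Chicago-like.
SAMPLE = [
    CityRecord("streetlights", "alley light out", 41.880, -87.630),
    CityRecord("sanitation", "garbage cart missing", 41.881, -87.631),
    CityRecord("sanitation", "garbage cart missing", 41.950, -87.700),
]

nearby = records_in_area(SAMPLE, south=41.87, west=-87.64, north=41.89, east=-87.62)
```

Here `nearby` holds the two records inside the box, one from each hypothetical source system, which is the kind of side-by-side view that makes a streetlights-and-carts comparison possible.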
Developing a full-blown predictive analytics capability is much more ambitious. As envisioned, the system will alert city leaders to early indicators of coming problems, including ones that, unlike the out-of-commission streetlights, they hadn't considered.
The goal is to apply historical analysis to predict and prevent future problems. "We have the bones of this," Goldstein says, referring to WindyGrid. The next step is "taking it and saying, 'If we're seeing this in a given neighborhood, what's likely to happen next?'"
Goldstein's team talks of ways to prevent graffiti, rodents and garbage cart thefts. But what about Chicago's more serious scourges, like its alarming homicide rate (at 506 homicides in 2012, it had more than any other U.S. city, though the rate has dropped so far this year) or struggling schools (the Chicago schools CEO has proposed closing 54 underutilized schools and moving students to better-performing schools)?
Graffiti and rodents are mere starting points, says Brenna Berman, an ex-IBMer who's now first deputy CIO. "This is the approach for how this department will be part of the answer for tackling the murder rate or addressing complex emergencies like snowstorms or improving the water infrastructure," she says. The harder-to-solve problems will take more data and analysis of more variables, but "it's the exact same story for how you figure out which water mains are going to explode this year, so that we use our limited budget to improve the water infrastructure the right way over the next 10 years."
Open Source Approach
The Chicago team is using open source software for much of the predictive analytics platform. In addition to the MongoDB database for WindyGrid, the system will likely include Hadoop for some analytics processing. The team is doing its development work with the R data-analysis language.
The philosophy is to leave legacy systems in place while copying data into the single shared database for analysis. "In government, often when we want to do something big and different like this, we replace everything," Goldstein says. Since the team launched the WindyGrid project in late 2011, the approach has been to "leave everything in place," he says.
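The copy-rather-than-replace approach can be illustrated with a small Python sketch. In-memory dictionaries stand in for the legacy systems and a list stands in for the shared database; the system names, record fields and the function `copy_into_shared_store` are hypothetical, and nothing here reflects the city's real schemas or tooling.

```python
import copy

# Stand-ins for two legacy systems; names and fields are invented.
LEGACY_SYSTEMS = {
    "service_requests": [{"id": 1, "type": "graffiti", "ward": 12}],
    "streetlights": [{"id": 7, "status": "out", "ward": 12}],
}


def copy_into_shared_store(legacy_systems):
    """Copy every legacy record into one list, tagging each with its source."""
    shared = []
    for system_name, rows in legacy_systems.items():
        for row in rows:
            doc = copy.deepcopy(row)            # leave the legacy record untouched
            doc["source_system"] = system_name  # remember where the copy came from
            shared.append(doc)
    return shared


shared_store = copy_into_shared_store(LEGACY_SYSTEMS)
```

The legacy records are never modified; only the copies carry the extra `source_system` tag, so the original systems can keep running as they are while analysis happens on the shared copy.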
Using open source saves the city money on software and lets it start without a big investment. "This is a project that began on my laptop in the mayor's office," Goldstein says. Rather than issuing a multimillion-dollar request for proposals, he adds, "we sat in a dark room and started to build things."
There are still plenty of proprietary systems in Chicago's data center, including an Oracle Exadata system that will continue processing city transactions and commercial business intelligence software used for tasks such as creating reports and giving employees their daily assignments.
Those systems will remain as the city does its new analytics work with open source software. The city plans to share what it develops with the open source community.
Challenge Of Being Predictive
Going from today's WindyGrid data visualization to a predictive model is a step into the world of big data. "We've done really well at the multihundred-million-rows-of-data problem for spatial," Goldstein says. "I don't think we've solved the 10 billion rows of data. That is part of our plan."
Tom Schenk, Chicago's director of analytics, came to the city from nearby Northwestern University, where he did analysis of medical research. Now he's collaborating with researchers at the University of Chicago and Carnegie Mellon on the algorithms needed to make sense of many data sources. A major challenge is to build a framework that's general and flexible enough that employees can ask questions of the data. Goldstein is pushing to have something completed within 18 months, and the Bloomberg grant covers a three-year project.
One of the advantages cities have compared with private-sector companies in working with academic researchers is that cities can more easily release large swaths of data, since it's generally public information. Chicago cranks out many data sets for public consumption, providing APIs that let people access data and setting up internal systems to automatically update information as the city does. For example, the city releases through its data portal the location and speed of city buses that it tracks in half-mile increments, providing a picture of traffic congestion.
"Back in the day, if we wanted to have an academic relationship, it would take a year to execute an NDA," Goldstein says. "Now, I put it out through the data portal."
Chicago's predictive analytics and open data initiatives try to use the city's varied data sources to answer questions officials can't answer today. What's the next water main that might break? What's causing robberies to rise in a neighborhood? What mashup apps would citizens use if they had access to the data? The planned predictive analytics platform is by far the city's most ambitious attempt yet to deal with such uncertainty.