Whenever there's a crowd, there will be estimates of how many people were in it. History shows that those estimates tend to be highly variable and terribly inaccurate. Now, researchers from England's University of Warwick have turned to smartphones and Twitter to estimate crowd sizes quickly and accurately.
This has obvious implications for law enforcement and other government agencies, but it is also valuable for any enterprise interested in assessing crowd volume. For example, US retailers could use such a solution to obtain an accurate assessment of post-Thanksgiving Black Friday shopping crowds. Sports teams and concert promoters could use these tools to gauge activity during events. And transit systems could use such data to understand commuter travel patterns.
The research, conducted by Federico Botta, PhD student; Suzy Moat, assistant professor of Behavioral Science; and Tobias Preis, associate professor of Behavioral Science and Finance at the University of Warwick's Warwick Business School, has broad implications because it doesn't depend on reporting from a special app that sits on users' smartphones. It looks, instead, at activity from those smartphones, and the number of smartphones in use in a given area, to estimate crowd size and even predict future activity. Because these methods don't require identifying individual users or looking at the contents of their activities, they shouldn't require as much judicial approval as other, more intrusive techniques.
In a telephone interview with the three researchers, Preis said, "We thought that people would be walking around with smartphones in their pockets, and even if they're not using them they would be connected to the Internet. We thought we could estimate the crowd size accurately and very quickly." He further explained, "Twitter was one of the components we looked at. We also used the volume of phone calls and SMS calls, as well as the volume of access to the Internet."
Accuracy in estimating the size comes from correlating data from these multiple sources. When asked about access to call and SMS data, the three explained that their data was made available by the Italian telecom company Telecom Italia as part of the 2014 Big Data Challenge. For the challenge, the company made multiple data sets available for researchers and software developers to work with.
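To make the idea of correlating sources concrete, here is a minimal Python sketch that measures how closely two activity signals move together using the Pearson correlation coefficient. The hourly counts and the names `calls` and `tweets` are hypothetical illustrations, not figures from the Telecom Italia data, and the article does not say which correlation measure the researchers actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly counts around a stadium on a match day (made-up numbers).
calls = [120, 150, 400, 950, 900, 300]
tweets = [10, 14, 45, 98, 91, 28]

r = pearson(calls, tweets)
print(f"correlation between call volume and tweet volume: {r:.3f}")
```

A coefficient close to 1 would suggest that the two signals rise and fall together, which is what makes it reasonable to combine them when estimating how many people are present.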
One of the challenge's data sets involved smartphone use in and around a sports stadium during a football (soccer) match. Preis said, "We focused on a football stadium. This was a great test case because the number of people inside is well known." The known crowd size meant that the researchers were able to test their theories and algorithms and check the accuracy against a known answer.
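The value of a known answer can be sketched in a few lines of Python: fit a simple straight line relating phone activity to attendance on some matches, then check the estimate against a match that was held out. This is an illustration under stated assumptions only. The function names `fit_line` and `estimate`, the activity counts, and the attendance figures are all hypothetical, and ordinary least squares is a stand-in technique, not necessarily the model the Warwick team used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(activity, slope, intercept):
    """Estimate attendance from an activity count using the fitted line."""
    return slope * activity + intercept


# Hypothetical matches: phone activity near the stadium and official attendance.
train_activity = [4100, 5300, 6900, 7800]
train_attendance = [31000, 40500, 52000, 59500]

slope, intercept = fit_line(train_activity, train_attendance)

# A held-out match whose true attendance is known, used only to check accuracy.
held_out_activity = 6000
held_out_attendance = 45500

predicted = estimate(held_out_activity, slope, intercept)
relative_error = abs(predicted - held_out_attendance) / held_out_attendance
print(f"predicted {predicted:.0f}, actual {held_out_attendance}, "
      f"relative error {relative_error:.1%}")
```

Because the stadium's attendance is recorded independently, the relative error on the held-out match gives a direct measure of how far the phone-based estimate strays from the truth.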
As for the models and algorithms themselves, Preis said, "The analysis was stunningly straightforward." Explaining that most of the software the researchers used is available through open source licenses, Preis said, "We're using the statistical package 'R' and extending the analytics. We also wrote customized tools and programs in R and C, depending on the complexity and the run-time requirements."
Because the team members based their work on multiple data sources and types, they weren't able to use a single program for the entire project. "C, Python, and other interfaces were used. Obviously, we needed to work with a lot of APIs, depending on the data source," Preis said. As for the hardware involved, Botta said that most software and services were run on commodity Intel hardware, though some of the more compute-intensive routines were hosted on the GPU-based systems that now make up a large percentage of high-performance computing platforms.
As for the future, Moat said that publication of these results doesn't mark the end of the work. "The research we presented is part of a larger body of research," she said. "We're trying to see if we can use online data to analyze and predict what people are doing in the real world. In the past we've used data from Google and Wikipedia to look at movements in the stock market, and in epidemiology such as flu epidemics," she explained. Now, "... we're working in crime and crowd size estimation," Moat said.
Those interested in similar work should take note of the 2015 Big Data Challenge, which is now open.