Using Twitter, Smartphones To Accurately Assess Crowds

As part of the Big Data Challenge, researchers from England's University of Warwick have turned to smartphones and Twitter to estimate crowd sizes quickly and accurately.

Curtis Franklin Jr., Senior Editor at Dark Reading

May 29, 2015

Whenever there's a crowd there will be estimates of how many people were there. History shows that those estimates tend to be highly variable and terribly inaccurate. Now, researchers from England's University of Warwick have turned to smartphones and Twitter to estimate crowd sizes quickly and accurately.

This has obvious implications for law enforcement and other government agencies, but it is also valuable for any enterprise interested in assessing crowd volume. For example, US retailers could use such a solution to obtain an accurate assessment of post-Thanksgiving Black Friday shopping crowds. Sports teams and concert promoters could use these tools to gauge activity during events, and transit systems could use the data to understand commuter travel patterns.

The research was conducted by Federico Botta, PhD student; Suzy Moat, assistant professor of Behavioral Science; and Tobias Preis, associate professor of Behavioral Science and Finance, all at the University of Warwick's Warwick Business School. It has broad implications because it doesn't depend on reporting from a special app that sits on users' smartphones. It looks, instead, at activity from those smartphones, and at the number of smartphones in use in a given area, to estimate crowd size and even predict future activity. Because these methods don't require identifying individual users or looking at the contents of their activities, they shouldn't require as much judicial approval as other, more intrusive techniques.

In a telephone interview with the three researchers, Preis said, "We thought that people would be walking around with smartphones in their pockets, and even if they're not using them they would be connected to the Internet. We thought we could estimate the crowd size accurately and very quickly." He further explained, "Twitter was one of the components we looked at. We also used the volume of phone calls and SMS calls, as well as the volume of access to the Internet."

Accuracy in estimating the size comes from correlating data from these multiple sources. When asked about access to call and SMS data, the three explained that their data was made available by the Italian telecom company Telecom Italia as part of the 2014 Big Data Challenge. For the challenge, the company made multiple data sets available for researchers and software developers to work with.
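As a rough illustration of what correlating several activity sources can look like, here is a minimal Python sketch. It is not the researchers' code: the hourly counts are made-up numbers, and the `pearson` helper and the `calls`, `tweets`, and `internet` names are hypothetical. It simply checks how closely three activity signals for one area rise and fall together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly activity counts for one area (illustrative only,
# not taken from the Telecom Italia data sets).
calls = [120, 135, 400, 820, 790, 300]
tweets = [10, 12, 35, 80, 70, 25]
internet = [900, 950, 2100, 4300, 4000, 1800]

signals = {"calls": calls, "tweets": tweets, "internet": internet}
names = list(signals)

# Correlate every pair of signals once.
pairwise = {
    (a, b): pearson(signals[a], signals[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

When independent signals move together this closely, agreement between them gives some confidence that each is tracking the same underlying crowd rather than noise in a single source.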

One of the challenge's data sets involved smartphone use in and around a sports stadium during a football (soccer) match. Preis said, "We focused on a football stadium. This was a great test case because the number of people inside is well known." The known crowd size meant that the researchers were able to test their theories and algorithms and check the accuracy against a known answer.
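Checking an estimate against a known answer can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the team's published model: the activity and attendance figures are invented, and `fit_line` and `mean_abs_pct_error` are hypothetical helpers. It fits a straight line from mobile activity to official attendance on some events, then tests the fit on an event it has not seen.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error between known and estimated values."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical events: mobile activity near the stadium and the official
# attendance for each one (illustrative numbers only).
activity = [1500, 2200, 3100, 3900, 4600]
attendance = [20000, 29000, 41000, 52000, 60000]

# Fit on all but the last event, then estimate the held-out event.
slope, intercept = fit_line(activity[:-1], attendance[:-1])
held_out_estimate = slope * activity[-1] + intercept
error_pct = mean_abs_pct_error([attendance[-1]], [held_out_estimate])

print(f"estimated attendance: {held_out_estimate:.0f}")
print(f"known attendance:     {attendance[-1]}")
print(f"error:                {error_pct:.1f}%")
```

Holding one event out of the fit is the important design choice here: measuring error on the same events used to fit the line would overstate how well the method works on a new crowd.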

As for the models and algorithms themselves, Preis said, "The analysis was stunningly straightforward." Explaining that most of the software the researchers used is available through open source licenses, Preis said, "We're using the statistical package 'R' and extending the analytics. We also wrote customized tools and programs in R and C, depending on the complexity and the run-time requirements."

Because the team members based their work on multiple data sources and types, they weren't able to use a single program for the entire project. "C, Python, and other interfaces were used. Obviously, we needed to work with a lot of APIs, depending on the data source," Preis said. As for the hardware involved, Botta said that most software and services were run on commodity Intel hardware, though some of the more compute-intensive routines were hosted on the GPU-based systems that now make up a large percentage of high-performance computing platforms.

As for the future, Moat said that publication of these results doesn't mark the end of the work. "The research we presented is part of a larger body of research," she said. "We're trying to see if we can use online data to analyze and predict what people are doing in the real world. In the past we've used data from Google and Wikipedia to look at movements in the stock market, and in epidemiology such as flu epidemics," she explained. Now, "... we're working in crime and crowd size estimation," Moat said.

Those interested in similar work should take note of the 2015 Big Data Challenge, which is now open.

About the Author(s)

Curtis Franklin Jr.

Senior Editor at Dark Reading

Curtis Franklin Jr. is Senior Editor at Dark Reading. In this role he focuses on product and technology coverage for the publication. In addition he works on audio and video programming for Dark Reading and contributes to activities at Interop ITX, Black Hat, INsecurity, and other conferences.

Previously he was editor of Light Reading's Security Now and executive editor, technology, at InformationWeek where he was also executive producer of InformationWeek's online radio and podcast episodes.

Curtis has been writing about technologies and products in computing and networking since the early 1980s. He has contributed to a number of technology-industry publications, including Enterprise Efficiency, ChannelWeb, Network Computing, InfoWorld, PCWorld, and Dark Reading, on subjects ranging from mobile enterprise computing to enterprise security and wireless networking.

Curtis is the author of thousands of articles, the co-author of five books, and has been a frequent speaker at computer and networking industry conferences across North America and Europe. His most popular book, The Absolute Beginner's Guide to Podcasting, with co-author George Colombo, was published by Que Books. His most recent book, Cloud Computing: Technologies and Strategies of the Ubiquitous Data Center, with co-author Brian Chee, was released in April 2010. His next book, Securing the Cloud: Security Strategies for the Ubiquitous Data Center, with co-author Brian Chee, is scheduled for release in the Fall of 2018.

When he's not writing, Curtis is a painter, photographer, cook, and multi-instrumentalist musician. He is active in amateur radio (KG4GWA), scuba diving, and stand-up paddleboarding, and is a certified Florida Master Naturalist.
