Infectious diseases inflict a tremendous human and economic toll. The Zika virus alone could cost Latin America and the Caribbean up to $18 billion, according to the United Nations.
When it comes to epidemics, we as a society suffer from a lack of timely data, disparate datasets that are difficult to collate, and a shortage of people with computational backgrounds who are involved in epidemic planning, mitigation, and response.
However, the data science revolution is helping society overcome these challenges, so epidemics can now be monitored, modeled, and mitigated more effectively. In this article, I outline five ways big data analytics are transforming how we understand and respond to epidemics.
Better genetic data
Faster, cheaper genome sequencing is producing massive quantities of genetic data, enabling powerful analyses of how microbes mutate as an outbreak unfolds in real time. One big challenge for outbreak response, however, is that genetic data are often not available quickly enough. Common causes include delays in sample collection and testing, a lack of collaboration and reporting tools, and researchers holding data back until it is published in the scientific literature. These barriers are now breaking down with the advent of Nextstrain, a tool for sharing and tracking genome sequences in real time to improve outbreak response. Having these data available and analyzed sooner helps experts trace an outbreak's source, follow the pathogen's evolution, and assess its epidemic risk.
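To make "how microbes mutate" concrete, here is a minimal Python sketch that compares aligned genome sequences and lists point substitutions against a reference. It is not how Nextstrain works internally (its pipeline performs full phylogenetic analysis). The sketch assumes the sequences are already aligned to equal length, and the helper names `substitutions`, `hamming` and `distance_matrix`, as well as the toy sequences, are hypothetical.

```python
from itertools import combinations

MISSING = {"N", "-"}  # ambiguous base and alignment gap; skipped in comparisons (assumption)

def substitutions(reference: str, sequence: str) -> list[tuple[int, str, str]]:
    """List point substitutions as (1-based position, reference base, observed base)."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, base)
        for pos, (ref_base, base) in enumerate(zip(reference, sequence), start=1)
        if ref_base != base and ref_base not in MISSING and base not in MISSING
    ]

def hamming(a: str, b: str) -> int:
    """Number of informative positions at which two aligned sequences differ."""
    return len(substitutions(a, b))

def distance_matrix(samples: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise distances between every pair of named samples."""
    dist = {}
    for a, b in combinations(samples, 2):
        dist[(a, b)] = dist[(b, a)] = hamming(samples[a], samples[b])
    return dist

# Hypothetical toy data: three short aligned sequences from an outbreak.
reference = "ATGCGTACGT"
samples = {
    "case_A": "ATGCGTACGT",
    "case_B": "ATGAGTACGT",
    "case_C": "ATGAGTTCGT",
}
print(substitutions(reference, samples["case_C"]))  # [(4, 'C', 'A'), (7, 'A', 'T')]
print(distance_matrix(samples)[("case_A", "case_C")])  # 2
```

Cases B and C share the substitution at position 4, which hints that they belong to the same lineage. Real analyses build phylogenetic trees rather than relying on raw difference counts.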
Cell phone mobility data
The proliferation of mobile devices means it is now possible to track how people move and, in turn, to better understand the path of an infectious disease. For example, GPS coordinates derived from cell phone data in West Africa allowed experts to track contacts of Ebola cases, which helped them decide where to focus preventive measures and helped contain the spread. This type of tracking is useful not only for piecing together what is happening during an outbreak. It can also help us predict how diseases could move in future outbreaks and understand which interventions would be most effective.
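The idea behind mobility analysis can be sketched with origin-destination flows: order each device's location records by time and count moves between regions. The Python below is a minimal illustration that assumes anonymized records of the form (device id, timestamp, region). The record format, region names and function names are hypothetical, and the Ebola work described above used its own data and methods.

```python
from collections import Counter, defaultdict

def od_flows(pings):
    """Count region-to-region moves from (device_id, timestamp, region) records."""
    by_device = defaultdict(list)
    for device, ts, region in pings:
        by_device[device].append((ts, region))
    flows = Counter()
    for visits in by_device.values():
        visits.sort()  # chronological order for this device
        for (_, origin), (_, dest) in zip(visits, visits[1:]):
            if origin != dest:  # ignore consecutive records in the same region
                flows[(origin, dest)] += 1
    return flows

def destination_shares(flows, origin):
    """Fraction of trips leaving `origin` that end in each destination, largest first."""
    outgoing = {dest: n for (o, dest), n in flows.items() if o == origin}
    total = sum(outgoing.values())
    if total == 0:
        return {}
    ranked = sorted(outgoing.items(), key=lambda kv: kv[1], reverse=True)
    return {dest: n / total for dest, n in ranked}

# Hypothetical records: region "A" is where cases were first reported.
pings = [
    ("d1", 1, "A"), ("d1", 2, "B"),
    ("d2", 1, "A"), ("d2", 3, "C"),
    ("d3", 1, "A"), ("d3", 2, "B"), ("d3", 5, "B"),
]
print(destination_shares(od_flows(pings), "A"))
```

Here region B receives two thirds of the departures from A. Destinations that receive the largest share of travelers from an affected region are natural candidates for prioritizing surveillance and preventive measures.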
Social media data
Researchers have mined the abundance of social media data for insight into the timing and geography of disease spread, such as seasonal influenza in the U.S. For example, researchers have used Twitter data to better predict when the flu season will peak. Researchers have also applied natural language processing algorithms to social media posts to perform sentiment analysis on topics such as the likelihood of getting vaccinated and the level of fear felt during epidemics. These analyses can help target control measures and public health messaging, and they can also help estimate the economic impact of fear-induced behavioral changes, such as avoiding public places.
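A very simple version of a social media flu signal can be sketched as follows: keep posts that mention flu-related terms, count them per week, smooth the weekly series and locate its peak. This Python sketch only finds the peak in data that has already been observed; forecasting a future peak, as the research described above aims to do, would need a predictive model on top. The keyword list, the (week index, text) input format and the window size are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical keyword filter; published studies use more careful classifiers.
FLU_TERMS = re.compile(r"\b(flu|influenza|fever|sore throat)\b", re.IGNORECASE)

def weekly_counts(posts):
    """Count flu-related posts per week from (week_index, text) pairs."""
    counts = Counter()
    for week, text in posts:
        if FLU_TERMS.search(text):
            counts[week] += 1
    return counts

def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def peak_week(counts, n_weeks, window=3):
    """Week index with the highest smoothed count (weeks with no posts count as 0)."""
    series = [counts.get(week, 0) for week in range(n_weeks)]
    smoothed = moving_average(series, window)
    return max(range(n_weeks), key=lambda week: smoothed[week])

posts = [
    (0, "Got the flu this week"),
    (0, "Nice weather today"),
    (1, "Influenza season is here!"),
]
print(weekly_counts(posts))
```

Smoothing before taking the maximum keeps a single noisy week from being mistaken for the peak.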
Mapping high-risk areas
Machine learning techniques can now produce global, high-resolution maps pinpointing where epidemics are likely to emerge and take hold. These techniques use remotely sensed and other geographic data on environmental, human, and animal factors to estimate how many people live in the riskiest places. For example, this type of analysis helped map locations where the Zika virus was likely to thrive, and it even identified areas where the virus would later establish itself, including southern Florida. With greater, cheaper computing power and the increasing availability of globally consistent, high-resolution geospatial datasets, this type of predictive modeling will become even more powerful.
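Once a machine learning model has scored each grid cell, estimating how many people live in the riskiest places is a simple aggregation. The Python sketch below assumes the model output already exists as a per-cell suitability score between 0 and 1 that can be read as a probability. How that model is trained is out of scope. The `Cell` type, the 0.5 threshold and the toy numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    cell_id: str
    suitability: float  # model-predicted probability of transmission, 0 to 1 (hypothetical)
    population: int     # residents in the cell, e.g. from a gridded population dataset

def population_at_risk(cells, threshold=0.5):
    """Return (people in cells at or above the threshold, suitability-weighted expected count)."""
    exposed = sum(c.population for c in cells if c.suitability >= threshold)
    expected = sum(c.population * c.suitability for c in cells)
    return exposed, expected

def riskiest_cells(cells, k=3):
    """Top-k cells ranked by suitability times population."""
    return sorted(cells, key=lambda c: c.suitability * c.population, reverse=True)[:k]

cells = [
    Cell("coastal_city", suitability=0.9, population=1000),
    Cell("inland_town", suitability=0.2, population=5000),
    Cell("river_village", suitability=0.6, population=200),
]
exposed, expected = population_at_risk(cells)
print(exposed, round(expected))  # 1200 2020
print([c.cell_id for c in riskiest_cells(cells, k=2)])
```

In this toy example the large, low-suitability town outranks the high-suitability city once population is taken into account. A threshold count alone would miss this, which is why the two summaries are reported together.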
Large-scale outbreak simulations
Big data is also transforming the study of epidemics through massive numbers of high-resolution, global simulations of epidemic spread. Experts have fed complex infectious disease datasets into large-scale computational disease spread models to generate hundreds of terabytes of simulated outbreaks. These synthetic outbreaks help fill gaps in observed data and offer new insights into possible outcomes, including expected numbers of illnesses, hospitalizations, deaths, employee absences, and monetary losses. Ultimately, these insights can help inform the world about epidemic risks and the best ways to mitigate them.
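Large simulation ensembles are far beyond a short example, but the core idea can be sketched with a stochastic SIR (susceptible-infected-recovered) model: run the same outbreak many times with random variation, then summarize the range of outcomes. This Python sketch uses a discrete-time chain-binomial SIR model on a small, well-mixed population. The transmission rate `beta`, the recovery rate `gamma`, and the hospitalization and death proportions are placeholder assumptions, not calibrated values from any real model.

```python
import math
import random
import statistics

def simulate_sir(n, i0, beta, gamma, rng, max_days=365):
    """One stochastic outbreak in a well-mixed population; returns the number ever infected."""
    s, i = n - i0, i0
    p_rec = 1 - math.exp(-gamma)  # daily probability an infected person recovers
    for _ in range(max_days):
        if i == 0:
            break
        p_inf = 1 - math.exp(-beta * i / n)  # daily infection probability per susceptible
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < p_rec for _ in range(i))
        s -= new_inf
        i += new_inf - new_rec
    return n - s

def run_ensemble(n_sims, n, i0, beta, gamma, hosp_prop, death_prop, seed=0):
    """Simulate many outbreaks and attach sampled hospitalizations and deaths to each."""
    rng = random.Random(seed)  # seeded so the ensemble is reproducible
    results = []
    for _ in range(n_sims):
        ill = simulate_sir(n, i0, beta, gamma, rng)
        results.append({
            "illnesses": ill,
            "hospitalizations": sum(rng.random() < hosp_prop for _ in range(ill)),
            "deaths": sum(rng.random() < death_prop for _ in range(ill)),
        })
    return results

def summarize(results, key):
    """Mean and 95th percentile of one outcome across the ensemble (needs at least 2 runs)."""
    values = [r[key] for r in results]
    return statistics.mean(values), statistics.quantiles(values, n=20, method="inclusive")[-1]

# Hypothetical parameters: 500 people, 5 initial cases, roughly R0 = beta / gamma = 2.
ensemble = run_ensemble(n_sims=50, n=500, i0=5, beta=0.2, gamma=0.1,
                        hosp_prop=0.05, death_prop=0.01, seed=42)
for outcome in ("illnesses", "hospitalizations", "deaths"):
    mean, p95 = summarize(ensemble, outcome)
    print(f"{outcome}: mean {mean:.1f}, 95th percentile {p95:.1f}")
```

Reporting a high percentile alongside the mean matters for risk work. Some simulated outbreaks fizzle out early while others infect most of the population, and that tail is what insurers and planners care about.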
Big data and analytical tools have transformed every industry from healthcare to retail to government, and they are now transforming the way epidemics are understood and managed. Despite these advances, the world is still not prepared for the next pandemic, and technological advances for understanding epidemics lag behind those in other industries. If we could predict epidemics as well as online retailers can predict how much someone is willing to pay for the latest widget, many more businesses could insure themselves against loss, many more governments could develop effective interventions, and many more lives could be saved.
Nita Madhav is Senior Director of Data Science at Metabiota, where she oversees the teams responsible for infectious disease, actuarial, and statistical modeling. Ms. Madhav has over 11 years of experience in probabilistic modeling and risk assessment. The majority of her experience has focused on developing infectious disease risk, burden, and costing models to provide actionable insights to commercial and government entities. While at Metabiota, Ms. Madhav established the modeling group and has spearheaded the team’s efforts to create a comprehensive library of modeled pathogens. Before joining Metabiota, Ms. Madhav worked as a Principal Scientist at AIR Worldwide, where she led the life and health research and modeling team. Prior to that, Ms. Madhav performed hantavirus research at the Special Pathogens Branch of the US Centers for Disease Control and Prevention. Ms. Madhav holds a BS in Ecology & Evolutionary Biology, with distinction, from Yale University and an MSPH in Epidemiology from the Rollins School of Public Health at Emory University.