GSK Updates Analytics Platform for Faster Drug Development - InformationWeek

GSK began an overhaul of its R&D data and analytics infrastructure three years ago. Here's what it looks like today.

Creating a new drug can take anywhere from 8 years to 20 years for a pharmaceutical company. To put that in perspective, on the early end of that timeline, the iPad was introduced 8 years ago in 2010. And on the far side of that timeline, Google didn't exist 20 years ago -- it was founded in September 1998. Technology is moving along much faster than new drug development for pharmaceutical companies.

GlaxoSmithKline (GSK) would like to make that new drug discovery and development timeline a lot shorter. The London-based company is among the largest pharmaceutical companies in the world. Mark Ramsey joined GSK in 2015 as the pharmaceutical giant's first chief data officer (CDO), reporting to the president of research and development. His focus has been on creating a team and establishing a platform to support data and analytics for the company. Ultimately, the goal is to cut the drug development process to 2 years.

(Image: Willy Barton/Shutterstock)

So where do you begin to address the data and analytics challenges presented by a centuries-old company and leapfrog ahead to a place where efficiency can accelerate drug development?

"What we didn't want to do was to build a single use case," Ramsey told InformationWeek in an interview. Ramsey said that he's seen organizations have trouble expanding the initial solution when they start too small, even though many analysts recommend starting small when implementing an initial analytics project.

Instead, Ramsey began by taking an inventory across R&D and the portfolio of use cases to get a sense of everything that his program would touch. The approach was an important first step in building the program that would break down the data-flow barriers among the company's many siloed operations. For instance, the clinical trial area is a silo. Experiments by scientists are a silo. Lessons from other organizations are a silo. The inventory project became the foundation for designing an architecture and approach for the entire organization.

"There's a lot of discussion around machine learning, artificial intelligence, and deep learning," Ramsey said. "But you need the data in order to be able to feed those technologies."

Ramsey's group focused on bringing that siloed data together. "We are now delivering collections of data and collections of use cases," he said.

Ramsey's data and analytics stack includes multiple technologies, with the foundation based on Cloudera's Hadoop.

"That's our primary data and information platform -- the source where we store our curated data and our analytics processes." The stack also includes Kafka and Spark. Other technologies include StreamSets for data ingestion (which has been completely automated with bots), Tamr for machine learning data curation, Trifacta for data wrangling, and AtScale for virtualization across environments. AtScale lets users leverage familiar BI tools for insights from the Hadoop environment. GSK also uses Zoomdata for data visualization, Docker for some of its containerization, Kinetica for GPU-based analytics, and Waterline Data for storage and search. The total solution amounts to more than 5 petabytes of data, all on-premises.

"It's still a way too complex environment," he told me. "We are really working with each of those organizations so that metadata and interoperability come together." The goal, of course, is to "really bring them together as a well-integrated ecosystem."

Users across the enterprise are consuming all this different data in different ways. Ramsey said that a large number of people access the data through guided analytics in the form of a structured query or dashboard.

GSK also has about 500 to 600 "bench chemists" who have been using an Excel plugin for many years to get data about experiments.

Another 20% of the organization uses Python, R, or other analytics tools to leverage a computational notebook. They are focused less on visualization and more on developing routines that run against the data. Another 10% of staff are using the platform for machine learning and deep learning -- running simulations and algorithms.

"One of the big challenges is that even though Hadoop and related technologies have been in the market for a while, bringing them all together is more difficult than what I think it could be," Ramsey said. "I think that's one of the reasons we don't see a lot of production-level Hadoop on a larger scale. It's more difficult to make it happen than it should be, and that is putting a constraint on the industry."

GSK hasn't achieved 2-year drug development yet, but the data and analytics platform environment created by Ramsey and his team has brought the company closer to realizing that goal.

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology.
