Strategic CIO // Executive Insights & Innovation
News
4/2/2014

Merck Optimizes Manufacturing With Big Data Analytics

Pharmaceutical firm uses Hadoop to crunch huge amounts of data so it can develop vaccines faster. One of eight profiles of InformationWeek Elite 100 Business Innovation Award winners.

Producing pharmaceuticals of any kind is an expensive, highly regulated endeavor, but producing vaccines is particularly challenging.

Vaccines often contain attenuated viruses, meaning they're altered so they confer immunity without causing the disease itself, and thus they have to be handled under precise conditions at every step of the manufacturing process. Components might have to be stored at exactly -8 degrees for a year or more, and if there's even a slight variance from regulator-approved manufacturing processes, the materials have to be discarded.

"It might take three parts to get one part, and what we drop or discard amounts to hundreds of millions of dollars in lost revenue," says George Llado, VP of information technology at Merck & Co.

[For more InformationWeek Elite 100 coverage and a complete listing of the top 100 companies, click here.]

In the summer of 2012, Llado was seeing higher-than-usual discard rates on certain vaccines. Llado's team was looking into the causes of the low vaccine yield rates, but the usual investigative approach involved time-consuming spreadsheet-based analyses of data collected throughout the manufacturing process. Sources include process-historian systems on the shop floor that tag and track each batch. Maintenance systems detail plant equipment service dates and calibration settings. Building-management systems capture air pressure, temperature, and other readings in multiple locations at each plant, sampling by the minute.

Aligning all this data from disparate systems and spotting abnormalities took months using the spreadsheet-based approach, and storage and memory limits meant researchers could only look at a batch or two at a time. Jerry Megaro, Merck's director of manufacturing advanced analytics and innovation, was determined to find a better way.

Llado (left) and Megaro used cloud-based Hadoop computing to speed up the analysis of vaccine yield rates.

By early 2013, a Merck team was experimenting with a massively scalable distributed relational database. But when Llado and Megaro learned that Merck Research Laboratories (MRL) could provide their team with cloud-based Hadoop compute, they decided to change course.

Built on a Hortonworks Hadoop distribution running on Amazon Web Services, MRL's Merck Data Science Platform turned out to be a better fit for the analysis because Hadoop supports a schema-on-read approach. As a result, data from 16 disparate sources could be used in analysis without having to be transformed with time-consuming and expensive ETL processes to conform to a rigid, predefined relational database schema.
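The schema-on-read idea is that raw files land in storage untouched and a structure is imposed only at query time, so each analysis can apply its own column definitions without an upfront ETL pass. A minimal Python sketch of the concept (the file contents, field names, and `read_with_schema` helper are hypothetical illustrations, not Merck's actual data or tooling):

```python
import csv
import io

# A raw sensor dump stored as-is in the "data lake" (invented sample data).
raw = """batch_id,ts,temp_c
B001,2013-07-01T00:00,-8.1
B001,2013-07-01T00:01,-7.9
"""

def read_with_schema(text, schema):
    """Apply a column schema at read time; the stored bytes never change."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({col: cast(rec[col]) for col, cast in schema.items()})
    return rows

# Two consumers can impose different schemas on the same raw file:
# one reads temperatures as floats, another might only want batch IDs.
readings = read_with_schema(raw, {"batch_id": str, "temp_c": float})
print(readings[0]["temp_c"])
```

In Hive terms, the equivalent move is declaring an external table over files already sitting in HDFS or S3; nothing is copied or reshaped until a query runs.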

"We took all of our data on one vaccine, whether from the labs or the process historians or the environmental systems, and just dropped it into a data lake," says Llado.

Megaro's team was then able to come up with conclusive answers about production yield variance within just three months. In the first month, July 2013, the team loaded the data onto a partition of the cloud-based platform, and it used MapReduce, Hive, and advanced dynamic time-warping techniques to aggregate and align the data sets around common metadata dimensions such as batch IDs, plant equipment IDs, and time stamps.
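Dynamic time warping is what lets traces sampled at different rates, or shifted in time, be compared batch to batch. A compact pure-Python version of the classic algorithm, with invented temperature traces standing in for the real sensor data:

```python
# Minimal dynamic time warping (DTW): finds the lowest-cost alignment
# between two sequences of different lengths. Data here is illustrative.

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three possible alignment moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A densely sampled ramp still aligns closely with a sparse one,
# even though a point-by-point comparison would be impossible.
temp_fast = [20.0, 20.5, 21.0, 21.5, 22.0]
temp_slow = [20.0, 21.0, 22.0]
print(dtw_distance(temp_fast, temp_slow))  # 1.0
```

At Merck's scale the same alignment would be pushed down into MapReduce/Hive jobs keyed on batch ID and equipment ID rather than run in a single process, but the warping logic is the same.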

In the second month, analysts used R-based analytics to chart and cluster every batch of the vaccine ever made on a heat map. Spotting notable patterns, the team then used R to produce investigative histograms and scatter plots, and it drilled down with Hive to explore hypotheses about the factors tied to low-yield production runs. Using an Agile development approach, the team set up daily data-exploration goals, but it could change course by that afternoon if it failed to find solid data backing up a particular hypothesis. In the third month, the team developed models, testing against the trove of historical data to prove and disprove leading theories about yield factors.
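The exploratory loop described above boils down to partitioning the batch history by yield and drilling into the low side. A toy sketch of that triage step (batch IDs, yields, and the threshold are all invented for illustration; the real work used R heat maps and Hive queries):

```python
# Bucket every historical batch by yield and flag the low-yield cluster
# as candidates for hypothesis-driven drill-down. Invented sample data.
batches = {"B001": 0.92, "B002": 0.41, "B003": 0.88,
           "B004": 0.37, "B005": 0.90}

def split_by_yield(yields, threshold=0.5):
    """Partition batches into low- and high-yield sets at a cutoff."""
    low = {b for b, y in yields.items() if y < threshold}
    high = set(yields) - low
    return low, high

low_yield, high_yield = split_by_yield(batches)
print(sorted(low_yield))   # ['B002', 'B004']
```

Each day's hypothesis then amounts to asking what the low-yield set has in common (same equipment, same maintenance window, same raw-material lot) that the high-yield set lacks.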

Through 15 billion calculations and more than 5.5 million batch-to-batch comparisons, Merck discovered that certain characteristics in the fermentation phase of vaccine production were closely tied to yield in a final purification step. "That was pretty powerful, and we came up with a model that demonstrated, quantifiably, that specific fermentation performance traits are very important to yield," says Megaro.
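The final modeling step, quantifying how a fermentation trait predicts purification yield, can be illustrated with a single-variable least-squares fit. All numbers below are invented; Merck's actual models and the traits involved are not public:

```python
# Toy version of the modeling step: fit purification yield against one
# fermentation performance trait with ordinary least squares.

def ols(xs, ys):
    """Return slope and intercept of the least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

trait = [1.0, 2.0, 3.0, 4.0]        # hypothetical fermentation metric
yields = [0.40, 0.55, 0.70, 0.85]   # purification-step yield per batch
slope, intercept = ols(trait, yields)
print(slope, intercept)  # 0.15 0.25 -- a positive slope supports the link
```

A statistically significant positive slope, validated against held-out historical batches, is the kind of quantifiable evidence the quote above describes.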

The good news is that these fermentation traits can be controlled, but Merck has to prove that in a test lab before it can introduce any changes to its production environment. And if any process changes are deemed material, Merck will have to refile the vaccine's manufacturing process with regulatory agencies.

With the case all but solved for one vaccine, Merck is applying the lessons learned to a variant of that product that is expected to be approved for sale as soon as this year. And drawing on both the manufacturing insights and the new big data analysis approach, Merck intends to optimize the production of other vaccines now in development. They're all potentially lifesaving products, according to Merck, and it's clear that the new data analysis approach marks a huge advance in ensuring efficient manufacturing and a more plentiful supply.


Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor in chief of Intelligent Enterprise.

Comments
D. Henschen (Author) | 4/4/2014 10:09:47 AM
Re: Deep industry knowledge required
I learned a couple of new tidbits from George and Jerry at the InformationWeek Conference that were not discussed in my earlier interviews. First, they discovered that plant equipment maintenance was one factor that had a big impact on vaccine yields. Apparently the big data analysis revealed that yields declined significantly after new power supplies were installed on some plant equipment. It turns out the new power supplies didn't meet the same specs as the equipment they replaced, so this is something they were able to fix.

The second tidbit I heard is that Merck is now working on using high-scale demand data to analyze product demand. That's not something drug companies really have a handle on for the most part, but by working with distributors they're hoping to sense demand signals so they can do a better job of planning production to meet market need.
ChrisMurphy (Author) | 4/3/2014 5:02:01 PM
Deep industry knowledge required
I spent time talking with some of the Merck team at the InformationWeek Conference this week, and one thing that struck me is how for this kind of work it is essential to have deep industry knowledge combined with big data statistical knowledge. I'm talking about the big data analysis team really understanding the complex fermentation processes involved in making vaccines.