Intel Chief Data Scientist Shares Secrets To Successful Projects - InformationWeek
11/16/2015
09:06 AM
Before joining Intel, the company's first-ever Chief Data Scientist created models to predict the stock market, improve glaucoma diagnostics, and include unstructured data in healthcare applications. Here's what Bob Rogers is doing now at Intel and his advice for fellow and future data scientists.

What's a dream job for a data scientist? It just may be serving as the chief data scientist at a big technology vendor that doesn't care so much about selling analytics or big data solutions. Instead, it cares about empowering the analytics ecosystem and helping organizations apply big data analytics to any number of problems and challenges of business, science, and humanity.

That's where Intel's chief data scientist Bob Rogers is right now. He joined the chip giant in January and spends his time working both internally on Intel's own data science projects and externally as the "big data evangelist" for Intel, spreading the word about big data and helping organizations become successful with their analytics and data science projects.

Bob Rogers, Chief Data Scientist, Intel

(Image: via LinkedIn)

"Intel sells chips. We don't sell services. We don't sell software. Our overall strategy is to empower the analytics ecosystem," he said. "I get to go around and help customers be successful without needing to sign on the dotted line or anything else. I help customers understand analytics and data problems and move the ball forward."

That's what he does externally. Internally, Rogers works on efforts to improve Intel's IT help desk process by applying semantic engines to text, and on adding unstructured data to the engine that recommends best practices to Intel resellers. He also takes part in the Data Science Center of Excellence, a weekly, voluntary internal forum where Intel's data scientists help each other tackle big problems in their respective groups.

Rogers brings an eclectic range of training and experience to the role. He trained as a physicist, and his PhD thesis was based on a computer simulation of what would happen to objects sucked into black holes. Over the course of his career, he has built simulations to predict the stock market for hedge funds, worked on improving glaucoma diagnostics, and worked on giving doctors access to all relevant patient information. It was in that last role, as cofounder of Apixio, that he started working with Intel.

Apixio applies big data to problems with electronic health records (EHRs).

"Because each EHR is a silo. What we discovered in our studies is that 65% of the information doctors should know about you is not in structured data," Rogers said. For example, about 30% of apparent heart failure cases are not actually heart failure. There are false negatives and false positives, and much of that information is in email, not EHRs, which means it is unstructured data.

A Different Approach

Now as chief data scientist at Intel, Rogers has a vantage point to see a wide array of challenges and possibilities enabled by big data across many different types of organizations. What is the biggest mistake that organizations make when implementing big data projects?

"One of the biggest problems I see is that enterprises want to build a big data stack, shove all their data in, and hope that insights bubble to the surface," he said. "But at the end of the day that's a good way to end up with an expensive project that doesn't seem to show any value."

Instead, he advocates a more Agile approach.

"Start with a specific challenge and then build the minimum infrastructure needed," he said.

Intel itself has built up its own big data infrastructure internally alongside traditional business intelligence infrastructure, and those two systems talk to each other, according to Rogers.

In an initial project, Intel built a recommendation engine that taps into both structured and unstructured data to offer resellers insights about how to be more successful, Rogers said. The company built this big data stack on Hadoop, using Cloudera. The key component was the addition of unstructured data.

"The structured data is the same data you've been looking at for years," he said. "But if you add even a small amount of the unstructured data, you get a huge step forward in performing and creating value. That's one of the big areas that I see data science advancing in very rapidly."

One of the other keys to success is crafting the right question. Rogers recently sat on a panel with a handful of other high-level data scientists at New York University, addressing the school's data science graduate students. The students wanted to know which skills are important for success in the field.

"There are technical skills -- math and statistics and modeling and computer science," he said. "Then there is understanding the business needs." Realistically, there aren't any people who embody all the skills of the perfect data scientist, he said. That's why it's important to be able to work collaboratively and draw on the skills of a team.

"We are looking for people with a mix of skills. What's really important is the ability to handle ambiguity. … [Y]ou may not have an exact answer that is analytically measurable, and that can put people outside their comfort zone.

"Another aspect is creativity," Rogers said. That means being able to look at a problem from multiple directions. For instance, instead of lumping all car buyers into one group, if you break them apart according to demographics and think about them separately, you will learn much more.

"Those attributes are at least as important as the technical skills," Rogers said. "Data is messy."

As for tools and programming languages, Rogers said R and Python are the best starting points for fledgling data scientists. The most exciting developments among languages evolving for big data, he said, are functional languages like Scala, which let you write complex analytics that immediately scale across huge clusters.

Looking Forward

Looking ahead to the future of big data and analytics, Rogers said he believes two of the most exciting areas offering big potential yet to be realized are the Internet of Things (IoT) and unstructured data.

In IoT, Rogers said he believes intelligence at the edge will have a great impact, for instance in wearables for healthcare applications and in sensors coupled with analytics for smart cities and transportation.

While we have gotten good at analyzing unstructured data in the form of text, there is still much work to be done to understand images, video, and audio, Rogers said.

"Intel is working very hard in this area," Rogers said. "That's an exciting area of growth and advancement."

Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG's Infoworld, Ziff Davis Enterprise's eWeek and Channel Insider, and Penton Technology's MSPmentor. She's passionate about the practical use of business intelligence, ...

Comments
jagibbons,
User Rank: Ninja
11/18/2015 | 12:25:42 PM
Re: Data
They're growing that market, but I also like to see that they are contributing to the industry as a whole without it being part of a software product or services sale.
danielcawrey,
User Rank: Ninja
11/17/2015 | 12:40:50 PM
Data
Interesting look at how Intel as a chipmaker looks at data and software. The company has a market cap of around $151 billion, most of that coming from chips.

But they also create software for their products, and they bought McAfee and turned it into Intel Security. The company is primarily hardware, but software and the data generated around it is growing too. 