12/30/2013
09:06 PM

How To Build A Successful Data Science Team

Don't try to find one superhuman who does it all. You need three experts: business analyst, machine learning expert, and data engineer, says Lithium Technologies chief scientist.


Is there really a data scientist shortage, or are organizations simply trying too hard to recruit a unicorn, a jack-of-all-trades who possesses both advanced technical and business acumen? 

If the unicorn hypothesis is true, it would explain why the scarcity of data scientists is expected to worsen in the coming years.

The solution isn't difficult, some industry insiders believe, but it might prove unpopular with cost-conscious organizations that are unable or unwilling to hire a data science team rather than a single data scientist.

Dr. Michael Wu is chief scientist of Lithium Technologies, a San Francisco-based company that sells social customer experience management software to businesses. Not surprisingly, Lithium captures a lot of data on consumer behavior, and part of Wu's job is to analyze that information and predict customer actions on an aggregate level.


Wu believes the term "data scientist" is tossed around loosely these days, so much so that it's creating a bit of confusion in the tech industry.

"What the industry calls a 'data scientist' now is really several different roles," said Wu in a phone interview with InformationWeek. "When people say there's a shortage of data scientists, (they mean) there is a shortage of people with all of these different skills."

Wu subdivides the data scientist role into three distinct jobs, each requiring a different skill set: business analyst, machine learning expert, and data engineer.

"You need these three groups of people to work together in order to inform the business decision-makers," said Wu.  

The role of business analyst existed long before the terms "big data" or "data scientist" were in vogue. This person works with front-end tools, meaning those closest to the organization's core business or function, such as Microsoft Excel, Tableau Software's visualization tools, or QlikTech's QlikView BI apps. A business analyst might also have sufficient programming skills to code up dashboards, and have some familiarity with SQL and NoSQL.

"They analyze business-level data and try to produce actionable insights," said Wu. "A lot of companies have (these) people."

The recent hype surrounding big data, however, has led many business analysts to rebrand themselves as data scientists even though they are not, according to Wu's definition.

"It automatically gives them a little boost in their salary," Wu said, chuckling.

The second data science role is that of machine-learning expert, a statistics-minded person who builds data models and makes sure the information they provide is accurate, easy to understand, and unbiased.

"These are the people who develop algorithms and crunch numbers," said Wu. "They are interested in building models that predict something."

A machine-learning expert, for instance, might develop algorithms that predict consumer sentiment or estimate a person's influence in a particular industry.

"There are even machine-learning algorithms that look at images and tag them automatically, or look at videos and try to understand what the video is about," said Wu. 

Like the business analyst, the machine-learning expert isn't a new profession, but rather one that has existed for "the last 30 years or so," Wu estimated.

The third key job, data engineer, is "the bottom layer, the foundation," said Wu. "They are the ones who play with Hadoop, MapReduce, HBase, Cassandra. These are people interested in capturing, storing, and processing this data… so that the algorithm people can build models and derive insights from it."

However, it's nearly impossible to find one person -- that data scientist unicorn -- who excels in each of these three areas, Wu said. And that's why organizations must focus instead on building a data science team.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.


Comments
Whoopty,
User Rank: Ninja
12/31/2013 | 6:33:40 AM
Good luck
Good luck to those that find themselves needing to convince higher ups that they need to take on three people to tackle one particular function!