12/30/2013
09:06 PM

How To Build A Successful Data Science Team

Don't try to find one superhuman who does it all. You need three experts: business analyst, machine learning expert, and data engineer, says Lithium Technologies chief scientist.

Is there really a data scientist shortage, or are organizations simply trying too hard to recruit a unicorn, a jack-of-all-trades who possesses both advanced technical and business acumen? 

If the unicorn hypothesis is true, it would explain why the scarcity of data scientists is expected to worsen in the coming years.

The solution isn't difficult, some industry insiders believe, but it might prove unpopular with cost-conscious organizations that are unable or unwilling to hire a data science team rather than a single data scientist.

Dr. Michael Wu is chief scientist of Lithium Technologies, a San Francisco-based company that sells social customer experience management software to businesses. Not surprisingly, Lithium captures a lot of data on consumer behavior, and part of Wu's job is to analyze that information and predict customer actions on an aggregate level.

Wu believes the term "data scientist" is tossed around so loosely these days that it's creating confusion in the tech industry.

"What the industry calls a 'data scientist' now is really several different roles," said Wu in a phone interview with InformationWeek. "When people say there's a shortage of data scientists, (they mean) there is a shortage of people with all of these different skills."

Wu subdivides the data scientist role into three distinct jobs, each requiring a different skill set: business analyst, machine learning expert, and data engineer.

"You need these three groups of people to work together in order to inform the business decision-makers," said Wu.  

The role of business analyst existed long before the terms "big data" or "data scientist" were in vogue. This person works with front-end tools, meaning those closest to the organization's core business or function, such as Microsoft Excel, Tableau Software's visualization tools, or QlikTech's QlikView BI apps. A business analyst might also have sufficient programming skills to code up dashboards, and have some familiarity with SQL and NoSQL.

"They analyze business-level data and try to produce actionable insights," said Wu. "A lot of companies have (these) people."

The recent hype surrounding big data, however, has led many business analysts to rebrand themselves as data scientists even though they are not, according to Wu's definition.

"It automatically gives them a little boost in their salary," Wu said, chuckling.

The second data science role is that of machine-learning expert, a statistics-minded person who builds data models and makes sure the information they provide is accurate, easy to understand, and unbiased.

"These are the people who develop algorithms and crunch numbers," said Wu. "They are interested in building models that predict something."

A machine-learning expert, for instance, might develop algorithms that predict consumer sentiment or estimate a person's influence in a particular industry.

"There are even machine-learning algorithms that look at images and tag them automatically, or look at videos and try to understand what the video is about," said Wu. 

The machine-learning expert, like the business analyst, isn't a new profession; it has existed "in the last 30 years or so," Wu estimated.

The third key job, data engineer, is "the bottom layer, the foundation," said Wu. "They are the ones who play with Hadoop, MapReduce, HBase, Cassandra. These are people interested in capturing, storing, and processing this data… so that the algorithm people can build models and derive insights from it."

However, it's nearly impossible to find one person -- that data scientist unicorn -- who excels in each of these three areas, Wu said. And that's why organizations must focus instead on building a data science team.

Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.

Comments
Whoopty,
User Rank: Ninja
12/31/2013 | 6:33:40 AM
Good luck
Good luck to those who find themselves needing to convince higher-ups that they need to take on three people to tackle one particular function!