Don't try to find one superhuman who does it all. You need three experts: business analyst, machine learning expert, and data engineer, says Lithium Technologies chief scientist.
Is there really a data scientist shortage, or are organizations simply trying too hard to recruit a unicorn, a jack-of-all-trades who possesses both advanced technical and business acumen?
If the unicorn hypothesis is true, it would explain why the scarcity of data scientists is expected to worsen in the coming years.
Some industry insiders believe the solution isn't difficult, but it may prove unpopular with cost-conscious organizations that are unable or unwilling to hire a data science team instead of a single data scientist.
Dr. Michael Wu is chief scientist of Lithium Technologies, a San Francisco-based company that sells social customer experience management software to businesses. Not surprisingly, Lithium captures a lot of data on consumer behavior, and part of Wu's job is to analyze that information and predict customer actions on an aggregate level.
Wu believes the term "data scientist" is tossed around so loosely these days that it's creating confusion in the tech industry.
"What the industry calls a 'data scientist' now is really several different roles," said Wu in a phone interview with InformationWeek. "When people say there's a shortage of data scientists, (they mean) there is a shortage of people with all of these different skills."
Wu subdivides the data scientist role into three distinct jobs, each requiring a different skill set: business analyst, machine learning expert, and data engineer.
"You need these three groups of people to work together in order to inform the business decision-makers," said Wu.
The role of business analyst existed long before the terms "big data" or "data scientist" were in vogue. This person works with front-end tools, meaning those closest to the organization's core business or function, such as Microsoft Excel, Tableau Software's visualization tools, or QlikTech's QlikView BI apps. A business analyst might also have sufficient programming skills to code up dashboards, and have some familiarity with SQL and NoSQL.
"They analyze business-level data and try to produce actionable insights," said Wu. "A lot of companies have (these) people."
The recent hype surrounding big data, however, has led many business analysts to rebrand themselves as data scientists even though they are not, according to Wu's definition.
"It automatically gives them a little boost in their salary," Wu said, chuckling.
The second data science role is that of machine-learning expert, a statistics-minded person who builds data models and makes sure the information they provide is accurate, easy to understand, and unbiased.
"These are the people who develop algorithms and crunch numbers," said Wu. "They are interested in building models that predict something."
A machine-learning expert, for instance, might develop algorithms that predict consumer sentiment or estimate a person's influence in a particular industry.
"There are even machine-learning algorithms that look at images and tag them automatically, or look at videos and try to understand what the video is about," said Wu.
Like the business analyst, the machine-learning expert isn't a new profession. It has been around "in the last 30 years or so," Wu estimated.
The third key job, data engineer, is "the bottom layer, the foundation," said Wu. "They are the ones who play with Hadoop, MapReduce, HBase, Cassandra. These are people interested in capturing, storing, and processing this data… so that the algorithm people can build models and derive insights from it."
However, it's nearly impossible to find one person, that data scientist unicorn, who excels in all three areas, Wu said. That's why organizations should focus instead on building a data science team.
Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.