Commentary
5/28/2014
11:18 AM
Daniel Jebaraj

Data Scientists: Stop Searching, Start Grooming

Don't put your big-data project on hold -- members of your current staff may be the best fit for your data science team.

Data scientists, like fire-breathing dragons, may exist. But I submit that your organization does not need either of them to solve business challenges. Rather, data-related problems are best handled by your software developers in tandem with your business analysts.

Business analysts are closest to your business needs and good at communicating them to your software development teams. They can do the same for data processing and modeling. They just need to be acquainted with the basics of data processing and modeling -- skills that are not particularly hard to acquire. Your software development teams, meanwhile, need to develop core expertise in building machine-learning models.

Before you go out and hire machine-learning experts, consider this: Do you hire compiler-writers for your development teams? Probably not. You hire software developers with knowledge of high-level languages such as C#, Java, and Python. Globally, there are very few developers who both write compilers and design programming languages. Yet large teams of business developers with no compiler-development experience can take advantage of the systems produced by language authors and compiler-writers to deliver real business value.


Likewise, you almost never need to write your own machine-learning algorithms. These are researched for years and produced by folks with PhDs in fields such as statistics, mathematics, and machine learning (or maybe all three). You can instead use an excellent (and free) system such as R to build and refine machine-learning models. All you need to know is how to build and interpret models. It is not hard to understand the basic intuition behind commonly used classes of algorithms, such as those used for classification, clustering, and regression. In fact, programming knowledge greatly helps with understanding machine-learning models.

With modeling, it's also possible to start small and still obtain good outcomes. Building the simplest models with tried-and-true algorithms, such as linear regression, with an accurate understanding of the assumptions involved can often produce excellent business results. Your teams do not have to start out using the more complex models, such as neural networks or support-vector machines, to get wins. They can instead use models such as linear regression, logistic regression, and decision trees. As your team gains understanding from deploying real-world models, its members will be ready to take on more advanced modeling challenges.
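To give a sense of how small a "start small" model can be, here is a minimal sketch of one-variable linear regression by ordinary least squares, written in plain Python. The data (ad spend versus units sold) and the function names are hypothetical and chosen only for illustration; in practice a team would more likely call an existing routine in R or a statistics library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: monthly ad spend (in $1,000s) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units)
fit_quality = r_squared(ad_spend, units, slope, intercept)
print(slope, intercept, fit_quality)
```

The slope is the number a business analyst cares about: the estimated change in units sold per extra $1,000 of spend. Real data will not fall on a perfect line, which is where the statistical assumptions behind the model start to matter.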

(Image: Josch13 on Pixabay)

A basic understanding of statistics is also very helpful, since data models often have a statistical basis and related assumptions. Using the simplest models, coupled with a firm understanding of statistical assumptions, will often produce better results than simply tweaking knobs on a more complex model.

In some instances, models are built to predict outcomes based on data provided. Think of loan approvals based on income and other such factors. In other cases, models are used for their explanatory power. They make it easier to understand and possibly explain relationships between independent and dependent (outcome) variables. Think of studying the effects of gasoline prices on various economic indicators.
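The two uses can be seen in a single small model. The sketch below fits a one-variable logistic regression by gradient descent on hypothetical loan data (incomes in units of $10,000 and past approve/decline decisions). Everything here, including the function names, learning rate, and step count, is an assumption made for illustration rather than a recommended implementation.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w * (x - x_mean) + b) by gradient descent."""
    n = len(xs)
    x_mean = sum(xs) / n
    zs = [x - x_mean for x in xs]  # center the predictor for stable steps
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        w -= lr * sum(e * z for e, z in zip(errors, zs)) / n
        b -= lr * sum(errors) / n
    return w, b, x_mean


def approval_probability(income, model):
    """Predictive use: estimated probability of approval for a new applicant."""
    w, b, x_mean = model
    return sigmoid(w * (income - x_mean) + b)


# Hypothetical history: income (in $10,000s) and whether the loan was approved.
incomes = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
approved = [0, 0, 0, 1, 0, 1, 1, 1]

model = fit_logistic(incomes, approved)
print(approval_probability(9.0, model), approval_probability(2.0, model))
print(model[0])  # explanatory use: sign and size of the income coefficient
```

Calling `approval_probability` on a new applicant is the predictive use. Reading the fitted coefficient `w` (a positive value means higher income raises the odds of approval) is the explanatory use.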

A decent understanding of statistics will help analysts comprehend the pitfalls of interpreting and using different models. Again, such an understanding of statistics is not difficult to acquire. We're not coming up with original research in this area, nor do we need an in-depth understanding of the mathematical foundations of each approach. Trained business analysts and software developers will have no trouble obtaining a basic level of statistical literacy.

Additionally, if you have big-data-related challenges, your teams will need to master the MapReduce paradigm. You can use MapReduce in a programming language of your choice or, better yet, use a higher-level domain-specific language such as Pig Latin or Hive for this purpose. If you know SQL, you know Hive. If you have used LINQ, you can use Pig Latin. This is not a difficult proposition for software developers, and most will make the transition with ease.
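For developers who have not met the paradigm, the following is a minimal single-machine sketch of the three MapReduce stages (map, shuffle, reduce) applied to a word count. It is plain Python with hypothetical function names and sample log lines; a real framework such as Hadoop distributes these same stages across many machines, and a SQL-style language expresses the same result as a GROUP BY with a COUNT.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: three log lines.
logs = ["error disk full", "warning disk slow", "error network down"]
counts = map_reduce(logs)
print(counts)
```

The map and reduce functions never share state, which is what lets a framework run them in parallel on separate chunks of the data.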

Another key area that directly affects the outcomes of modeling activity is the processing of data prior to analysis. It's hard to get real-world data into a form where it can be used to your advantage. Fortunately, this is an area where most software development teams have expertise. Getting data processing right takes effort, but the fundamentals are not hard to master.
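As a rough illustration of what this preparation involves, the sketch below normalizes a few hypothetical raw records: it trims and standardizes names, strips currency formatting from income values, and drops rows that cannot be repaired. The field names and cleaning rules are assumptions for the example; real rules depend on the data and the business question.

```python
def clean_record(raw):
    """Return a normalized record, or None if the row is unusable."""
    name = (raw.get("name") or "").strip().title()
    income_text = (raw.get("income") or "").replace("$", "").replace(",", "").strip()
    if not name or not income_text:
        return None
    try:
        income = float(income_text)
    except ValueError:
        return None  # e.g. "n/a" or other free text
    if income < 0:
        return None
    return {"name": name, "income": income}


# Hypothetical raw rows, as they might arrive from a form or a spreadsheet.
raw_rows = [
    {"name": "  alice smith ", "income": "$52,000"},
    {"name": "Bob Jones", "income": "n/a"},
    {"name": "", "income": "41000"},
    {"name": "carol diaz", "income": "38000.50"},
]

cleaned = [row for row in map(clean_record, raw_rows) if row is not None]
print(cleaned)
```

Whether to drop, repair, or impute a bad row is a judgment call that analysts and developers should make together, since it changes what the model later sees.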

If you are thinking about building a data science team (and you should be), look to your development teams to work on data-related challenges. Then start reaping the rewards as your competition continues hunting desperately for data scientists.


As Vice President, Daniel Jebaraj leads Syncfusion's product development. He oversees overall product development and plans for specific releases. By actively engaging with customers, he ensures that each new product is improved, based on customer feedback. Previously, as ...
Comments
AmanM731,
User Rank: Apprentice
6/2/2014 | 4:19:35 AM
Big data for Software engineers
I would like to know where I can find useful resources from which a software engineer can learn new techniques to solve big data problems in his company. We need to have a certified skill to be able to get projects based on big data. No one takes our word for it.
batye,
User Rank: Ninja
6/2/2014 | 2:32:26 AM
Re: All IT should be "grooming"
Good point. I trust we're going to see a paradigm shift in our time.
danjebaraj,
User Rank: Apprentice
5/30/2014 | 1:37:15 AM
Re: Big data is not rocket science. Get started!
Thanks, Doug. It is my hope that more companies take advantage of ever increasing data with available talent.

Best regards,
Daniel
smartin230,
User Rank: Apprentice
5/29/2014 | 9:24:58 AM
All IT should be "grooming"
This same idea should be used for all IT. There is no such thing as training in IT. You are hired on a contract for a project. There is outsourcing galore. Nothing wrong with that, but these have come at the expense of experience. Experienced people are laid off in a downturn and never brought back; instead, projects are outsourced. There is some advantage to using Amazon for your server farm, but there is also a disadvantage: you are just like everybody else. The company that trains and retains its people is going to develop the next big thing because they have synergy and long-term vision.
D. Henschen,
User Rank: Author
5/28/2014 | 1:17:42 PM
Big data is not rocket science. Get started!
I agree with Daniel's point that current staff can get started without waiting for data scientists to join the staff. Paytronix and Harte Hanks are among the firms I've recently spoken to that have built out big-data infrastructure, including Hadoop clusters, without hiring new, specialized talent. When you want to get into deep data science, yes, you may need more expertise if you don't have it already, but running data platforms is not rocket science.