Big Data. Big Decisions
InformationWeek
Special Coverage Series

Commentary

Doug Henschen

Executive Editor, InformationWeek

Kaggle Winners Tapped As Data Analytics Consultants

Building on its crowdsourcing competitions, Kaggle adds a consulting option. But are brilliant data scientists necessarily gifted advisers?

Demand for analytics experts outstrips the current supply, but the market is responding. Enter Kaggle, which this week announced the Kaggle Connect program to make its top data scientists available through subscription-based consulting.

Three-year-old Kaggle made its name by hosting crowdsourced analytics competitions. Those competitions put businesses, government agencies and researchers in touch with a growing network of assorted data wonks: astronomers, hedge fund quants, statisticians, economists, mathematicians and others who thrive on a challenge, have time to spare and want a crack at prize money that typically runs in the tens of thousands of dollars.

Kaggle has held some 200 competitions that have drawn more than 80,000 participants. It statistically ranks all these competitors based on their performance in each competition, and it breaks them into nine tiers of performance and expertise. Kaggle will draw from its top two tiers of experts for the new Connect service.

The question is whether these gifted data scientists can be effective consultants, a role that often requires them to identify business opportunities, effectively communicate methods and priorities, and (most important) work well with customers.

Crowdsourced competitions have yielded stunning successes. In a competition sponsored by Allstate in 2011, top Kaggle contestants easily beat the performance of Allstate’s best baseline model for predicting which autos covered by policies would be involved in bodily injury claims. The winner's model was 270% more accurate than Allstate's baseline model, and the insurer has since incorporated key elements into the models it now uses.

Not every problem lends itself to a competition, however, particularly in cases where success doesn't depend on improving on a single metric. Kaggle developed Connect as a way for companies to subscribe to the services of experts in particular areas. Skills in natural language processing, time-series analysis and demand forecasting, for example, are in big demand. These skills have been the basis of a handful of engagements in recent months that served as the beta test for Kaggle Connect.

Kaggle is working with a large (but unidentified) consumer packaged goods company, for example, to build a highly accurate demand-forecasting model to improve supply chain efficiency and avoid out-of-stock situations.

"If you subscribe to this service, you can add a world-class expert on time-series forecasting to your team," Kaggle founder and CEO Anthony Goldbloom tells InformationWeek. What's more, customers can examine the track record of these experts in public competitions involving similar work.

The Connect program presents new competition to analytics consulting firms such as Mu Sigma and Accenture. Mu Sigma is an analytics specialist, with more than 2,500 experts (mostly in Bangalore) on staff. The company's intensive training program not only instructs employees in analytical methods but also teaches them to listen, synthesize problems and communicate ideas. The most gifted Mu Sigma employees serve as direct, often in-country customer liaisons, communicating project requirements to more technical employees back in India.

Accenture is, of course, a large and well-known systems integrator, and over the last five years it has built up an extensive analytics consulting practice. It's hard to generalize, but one could easily imagine Kaggle's talent pool as a far more eclectic group than Accenture's suit-and-tie legions.

According to Goldbloom, the Kaggle network breaks down into three groups: academicians looking to work with real-world data sets to test their theories; crack analytics experts who are employed but take on Kaggle work in their spare time to stay challenged (indeed, that's how Kaggle was discovered by Allstate); and top performers who have quit their jobs and are pursuing consulting work based on their Kaggle profiles. Goldbloom says this last group should thrive on the Connect program.

Potential Connect projects might include customer churn analysis and life sciences research. One of Kaggle's competitions might yield an accurate model to predict customer defections, but that wouldn't yield clear and detailed insight into the causes and contributing factors. Similarly in life sciences research, analysts are searching -- looking for biomarkers indicating the possibility of side effects to a drug, for example -- not improving on a single metric that can be measured in a competition.

Connect subscriptions are expected to range from $30,000 to $100,000 per month. Kaggle's expert (or experts) will either augment a customer's existing data science team or become that team if the company is new to analytics. Either way, engagements start with a single project to make sure customers are pleased before the 12-month subscription period begins. Participating experts get a percentage of the subscription fees, and Goldbloom says the service will initially involve only "extraordinary people" from its network.
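To put the quoted monthly range in annual terms, here is a minimal Python sketch. It assumes the monthly fee stays flat across the full 12-month subscription and leaves out the initial trial project, whose pricing is not given; the function name `annual_commitment` is hypothetical and used only for illustration.

```python
# Monthly subscription range quoted above, in US dollars.
MONTHLY_LOW = 30_000
MONTHLY_HIGH = 100_000
SUBSCRIPTION_MONTHS = 12  # length of the subscription period


def annual_commitment(monthly_fee, months=SUBSCRIPTION_MONTHS):
    """Total fee over the subscription, assuming a flat monthly rate (an assumption)."""
    return monthly_fee * months


low = annual_commitment(MONTHLY_LOW)
high = annual_commitment(MONTHLY_HIGH)
print(f"Annual commitment: ${low:,} to ${high:,}")
# prints "Annual commitment: $360,000 to $1,200,000"
```

On those assumptions, a full subscription period would run from $360,000 to $1.2 million.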

Kaggle has added two technology platforms to support Connect. A Private Analysis Environment lets clients upload their data to a secure virtual machine that can be accessed only by the contracted data scientist. Kaggle Workbench provides a suite of tools to clean up data sets so analysts can spend their time developing algorithms, not parsing and manipulating data sets for consistency.

Goldbloom declined to identify Connect's beta customers, but marquee Kaggle competition customers (in addition to Allstate) include GE, Merck and Pfizer. In a typical competition, the sponsor presents a problem and data sets and then waits for the results. It's an arm's-length relationship in which the winner gets the prize money and the sponsor gets the intellectual property.

An ongoing consulting relationship is a different animal. Do polish and experience across industries or with many clients really matter, or are customers interested only in the final performance numbers? This should be an interesting experiment.






