Kaggle Winners Tapped As Data Analytics Consultants
Building on its crowdsourcing competitions, Kaggle adds a consulting option. But are brilliant data scientists necessarily gifted advisers?
Demand for analytics experts outstrips the current supply, but the market is responding. Enter Kaggle, which this week announced the Kaggle Connect program to make its top data scientists available through subscription-based consulting.
Three-year-old Kaggle made its name by hosting crowdsourced analytics competitions. Those competitions put businesses, government agencies and researchers in touch with a growing network of assorted data wonks: astronomers, hedge fund quants, statisticians, economists, mathematicians and others who thrive on a challenge, have time to spare and want a crack at prize money that typically runs in the tens of thousands of dollars.
Kaggle has held some 200 competitions that have drawn more than 80,000 participants. It statistically ranks all these competitors based on their performance in each competition, and it breaks them into nine tiers of performance and expertise. Kaggle will draw from its top two tiers of experts for the new Connect service.
The question is whether these gifted data scientists can be effective consultants, a role that often requires them to identify business opportunities, effectively communicate methods and priorities, and (most important) work well with customers.
Crowdsourced competitions have yielded stunning successes. In a competition sponsored by Allstate in 2011, top Kaggle contestants easily beat the performance of Allstate’s best baseline model for predicting which autos covered by policies would be involved in bodily injury claims. The winner's model was 270% more accurate than Allstate's baseline model, and the insurer has since incorporated key elements into the models it now uses.
Not every problem lends itself to a competition, however, particularly in cases where success doesn't depend on improving on a single metric. Kaggle developed Connect as a way for companies to subscribe to the services of experts in particular areas. Skills in natural language processing, time-series analysis and demand forecasting, for example, are in big demand. These skills have been the basis of a handful of engagements in recent months that served as the beta test for Kaggle Connect.
Kaggle is working with a large (but unidentified) consumer packaged goods company, for example, to build a highly accurate demand-forecasting model to improve supply chain efficiency and avoid out-of-stock situations.
"If you subscribe to this service, you can add a world-class expert on time-series forecasting to your team," Kaggle founder and CEO Anthony Goldbloom tells InformationWeek. What's more, customers can examine the track record of these experts in public competitions involving similar work.
The Connect program presents new competition to analytics consulting firms such as Mu Sigma and Accenture. Mu Sigma is an analytics specialist, with more than 2,500 experts (mostly in Bangalore) on staff. The company's intensive training program teaches employees not only analytical methods but also how to listen, synthesize problems and communicate ideas. The most gifted Mu Sigma employees serve as direct, often in-country customer liaisons, communicating project requirements to more technical employees back in India.
Accenture is, of course, a large and well-known systems integrator, and over the last five years it has built up an extensive analytics consulting practice. It's hard to generalize, but one could easily imagine Kaggle's talent pool as a far more eclectic group than Accenture's suit-and-tie legions.
According to Goldbloom, the Kaggle network breaks down into three groups: academicians looking to work with real-world data sets to test their theories; crack analytics experts who are employed but take on Kaggle work in their spare time to stay challenged (indeed, that's how Kaggle was discovered by Allstate); and top performers who have quit their jobs and are pursuing consulting work based on their Kaggle profiles. Goldbloom says this last group should thrive on the Connect program.
Potential Connect projects might include customer churn analysis and life sciences research. One of Kaggle's competitions might yield an accurate model to predict customer defections, but that wouldn't yield clear and detailed insight into the causes and contributing factors. Similarly in life sciences research, analysts are searching -- looking for biomarkers indicating the possibility of side effects to a drug, for example -- not improving on a single metric that can be measured in a competition.
Connect subscriptions are expected to range from $30,000 to $100,000 per month. Kaggle's expert (or experts) will either augment a customer's existing data science team or become that team if the company is new to analytics. Either way, engagements start with a single project to make sure customers are pleased before the 12-month subscription period begins. Participating experts get a percentage of the subscription fees, and Goldbloom says the service will initially involve only "extraordinary people" from its network.
Kaggle has added two technology platforms to support Connect. A Private Analysis Environment lets clients upload their data to a secure virtual machine that can be accessed only by the contracted data scientist. Kaggle Workbench provides a suite of tools to clean up data sets so analysts can spend their time developing algorithms, not parsing and manipulating data sets for consistency.
Goldbloom declined to identify Connect's beta customers, but marquee Kaggle competition customers (in addition to Allstate) include GE, Merck and Pfizer. In a typical competition, the sponsor presents a problem and data sets and then waits for the results. It's an arm's-length relationship in which the winner gets the prize money and the sponsor gets the intellectual property.
An ongoing consulting relationship is a different animal. Do polish and experience across industries or with many clients really matter, or are customers interested only in the final performance numbers? This should be an interesting experiment.