The data science talent shortage has some companies thinking outside the box. Even if your company employs a formidable data science team, you would likely still benefit from third-party ideas or solutions. Data science competitions and other forms of crowdsourcing offer viable means of advancing the art of the possible relatively quickly and cost-effectively. We share some of the possibilities.
Data science competitions aren't new, but their communities are growing rapidly and the problems they're solving are changing over time. Generally speaking, data science competitions are being used for ideation and discovery, for model and algorithm refinement, and for recruiting top talent.
The competitions are a good option for startups and SMEs that need access to specialized expertise but can't justify hiring it in-house. They're also popular among established companies that already have formidable data science teams.
Data science community Kaggle and professional services firm Booz Allen Hamilton are currently conducting the second annual Data Science Bowl. The topic of last year's competition was ocean health. This year's topic is cardiac health.
"The level of engagement of the people participating is really impressive. They're on the forums talking about the data a lot, so lots of engagement around the problem, which is really exciting to see," said Steven Mills, chief data scientist at Booz Allen Hamilton, in an interview.
More organizations are attempting to leverage machine learning and AI in new ways, and they're using competitions to advance the state of the art. The competitions are attracting the attention of top researchers, data scientists, and individuals who want to develop new problem-solving skills.
"We're seeing a shift from machine learning and data science being done on text to more sophisticated kinds of data," said Kaggle cofounder and CEO Anthony Goldbloom, in an interview. "People are putting out image, text, and speech challenges because they know the problems can be solved."
Yelp sponsored a competition in cooperation with data science competition host DrivenData. The goal of the competition was to predict where restaurant health code violations would likely be found in a six-week period. The top modelers predicted what inspectors would find, which DrivenData compared to what the inspectors actually found. Using the winning algorithms, DrivenData and a Harvard researcher determined that the City of Boston could catch the same number of violations it currently did with 30%-50% fewer inspections.
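To make the arithmetic behind that kind of result concrete, here is a minimal sketch of how ranking restaurants by predicted risk can reduce the number of inspections needed to find the same violations. Everything in it is an assumption for illustration: the restaurant names, risk scores, and violation flags are invented toy data, the alphabetical schedule merely stands in for an untargeted inspection order, and none of it reflects the actual competition data, the winning models, or the method DrivenData and the Harvard researcher used.

```python
def inspections_needed(order, violations, target):
    """Count inspections, taken in the given order, until `target` violations are found."""
    found = 0
    for count, restaurant in enumerate(order, start=1):
        found += violations[restaurant]
        if found >= target:
            return count
    raise ValueError("target exceeds the total number of violations")


# Hypothetical toy data: a model's predicted risk per restaurant, and whether
# an inspection would actually turn up a violation (1) or not (0).
risk_scores = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.1, "E": 0.4, "F": 0.8}
violations = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0, "F": 1}

# Untargeted schedule (alphabetical here) versus highest predicted risk first.
baseline_order = sorted(risk_scores)
targeted_order = sorted(risk_scores, key=risk_scores.get, reverse=True)

target = 3  # catch every violation in the toy data
baseline = inspections_needed(baseline_order, violations, target)
targeted = inspections_needed(targeted_order, violations, target)
reduction = 1 - targeted / baseline

print(f"untargeted: {baseline} inspections, targeted: {targeted} inspections")
print(f"reduction: {reduction:.0%}")
```

In this toy case the model ranks restaurants perfectly, so the saving is the best the data allows. A real model ranks imperfectly, and the saving depends on how well its scores separate restaurants with violations from those without.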
"In this case, you have a handful of inspectors and a lot of restaurants, so you can target those inspections where they'll be most useful to the communities [the City of Boston] is trying to protect," said Greg Lipstein, cofounder of DrivenData, in an interview.
Brand-name companies are also using other crowdsourcing alternatives, such as Spare5. Spare5 is a micro-task platform that breaks big data problems into minuscule pieces and assigns them to iPhone app users who want to trade their expertise for a modest amount of cash. Its community members help clean data, tag images, and classify content. They also help improve search accuracy, conversions, and cross-selling, among other things.
"Machines can perform millions or billions of calculations in parallel, but a computer is only as useful as its ability to interact with people. To interact with people, computers need to understand us, and to understand us they need training data," said Matt Bencke, cofounder and CEO of Spare5, in an interview. "More big companies are trying to use machine learning and AI to take advantage of huge amounts of data, but the challenge is the scarcity of high-quality training data."
While competitions and other forms of crowdsourcing are growing in popularity, it isn't always obvious why a company should consider those options. Here are nine of the most compelling reasons.
Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit.