2/19/2016
11:01 AM
Lisa Morgan

9 Reasons To Crowdsource Data Science Projects

The data science talent shortage has some companies thinking outside the box. Even if your company employs a formidable data science team, you would likely still benefit from third-party ideas or solutions. Data science competitions and other forms of crowdsourcing offer viable means of advancing the art of the possible relatively quickly and cost-effectively. We share some of the possibilities.


(Image: pmbbun via Pixabay)


Data science competitions aren't new, but their communities are growing rapidly and the problems they tackle are changing over time. Generally speaking, companies use data science competitions for ideation and discovery, for model and algorithm refinement, and for recruiting top talent.

The competitions are a good option for startups and SMEs that need access to specialized resources but can't justify developing them in-house. They're also popular among established companies that already have formidable data science teams.

Data science community Kaggle and professional services firm Booz Allen Hamilton are currently conducting the second annual Data Science Bowl. The topic of last year's competition was ocean health. This year's topic is cardiac health.


"The level of engagement of the people participating is really impressive. They're on the forums talking about the data a lot, so lots of engagement around the problem, which is really exciting to see," said Steven Mills, chief data scientist at Booz Allen Hamilton, in an interview.

More organizations are attempting to leverage machine learning and AI in new ways, and they're using competitions to advance the state of the art. The competitions are attracting the attention of top researchers, data scientists, and individuals who want to develop new problem-solving skills.

"We're seeing a shift from machine learning and data science being done on text to more sophisticated kinds of data," said Kaggle cofounder and CEO Anthony Goldbloom, in an interview. "People are putting out image, text, and speech challenges because they know the problems can be solved."

Yelp sponsored a competition in cooperation with DrivenData, a host of data science competitions. The goal was to predict where restaurant health code violations were likely to be found over a six-week period. DrivenData then compared the top modelers' predictions with what inspectors actually found. Using the winning algorithms, DrivenData and a Harvard researcher determined that the City of Boston could catch the same number of violations it was already catching with 30% to 50% fewer inspections.

"In this case, you have a handful of inspectors and a lot of restaurants, so you can target those inspections where they'll be most useful to the communities [the City of Boston] is trying to protect," said Greg Lipstein, cofounder of DrivenData, in an interview.

Brand-name companies are also using other forms of crowdsourcing, such as Spare5, a micro-task platform that breaks big data problems into minuscule pieces and assigns them to iPhone app users who want to trade their expertise for a modest amount of cash. Its community members help clean data, tag images, and classify content. They also help improve search accuracy, conversions, and cross-selling, among other things.

"Machines can perform millions or billions of calculations in parallel, but a computer is only as useful as its ability to interact with people. To interact with people, computers need to understand us, and to understand us they need training data," said Matt Bencke, cofounder and CEO of Spare5, in an interview. "More big companies are trying to use machine learning and AI to take advantage of huge amounts of data, but the challenge is the scarcity of high-quality training data."

While competitions and other forms of crowdsourcing are growing in popularity, it isn't always obvious why a company should consider those options. Here are nine of the most compelling reasons.


Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit. Frequent areas of coverage include ...
