9 Reasons To Crowdsource Data Science Projects - InformationWeek

Lisa Morgan

9 Reasons To Crowdsource Data Science Projects

The data science talent shortage has some companies thinking outside the box. Even if your company employs a formidable data science team, you would likely still benefit from third-party ideas or solutions. Data science competitions and other forms of crowdsourcing offer viable means of advancing the art of the possible relatively quickly and cost-effectively. We share some of the possibilities.

(Image: pmbbun via Pixabay)

Data science competitions aren't new, but their communities are growing rapidly and the problems they tackle are changing. Generally speaking, companies use the competitions for ideation and discovery, model and algorithm refinement, and recruiting top talent.

The competitions are a good option for startups and SMEs that need access to specialized resources but can't justify keeping them in-house. They're also popular among established companies that already have formidable data science teams.

Data science community Kaggle and professional services firm Booz Allen Hamilton are currently conducting the second annual Data Science Bowl. The topic of last year's competition was ocean health. This year's topic is cardiac health.

"The level of engagement of the people participating is really impressive. They're on the forums talking about the data a lot, so lots of engagement around the problem, which is really exciting to see," said Steven Mills, chief data scientist at Booz Allen Hamilton, in an interview.

More organizations are attempting to leverage machine learning and AI in new ways, and they're using competitions to advance the state of the art. The competitions are attracting the attention of top researchers, data scientists, and individuals who want to develop new problem-solving skills.

"We're seeing a shift from machine learning and data science being done on text to more sophisticated kinds of data," said Kaggle cofounder and CEO Anthony Goldbloom, in an interview. "People are putting out image, text, and speech challenges because they know the problems can be solved."

Yelp sponsored a competition in cooperation with data science competition host DrivenData. The goal of the competition was to predict where restaurant health code violations would likely be found in a six-week period. The top modelers predicted what inspectors would find, which DrivenData compared to what the inspectors actually found. Using the winning algorithms, DrivenData and a Harvard researcher determined that the City of Boston could catch the same number of violations it currently did with 30%-50% fewer inspections.

"In this case, you have a handful of inspectors and a lot of restaurants, so you can target those inspections where they'll be most useful to the communities [the City of Boston] is trying to protect," said Greg Lipstein, cofounder of DrivenData, in an interview.

Brand-name companies are also using other crowdsourcing alternatives, such as Spare5. Spare5 is a micro-task platform that breaks Big Data problems into minuscule pieces and assigns them to iPhone app users who want to trade their expertise for a modest amount of cash. Its community members help clean data, tag images, and classify content. They also help improve search accuracy, conversions, and cross-selling, among other things.

"Machines can perform millions or billions of calculations in parallel, but a computer is only as useful as its ability to interact with people. To interact with people, computers need to understand us, and to understand us they need training data," said Matt Bencke, cofounder and CEO of Spare5, in an interview. "More big companies are trying to use machine learning and AI to take advantage of huge amounts of data, but the challenge is the scarcity of high-quality training data."

While competitions and other forms of crowdsourcing are growing in popularity, it isn't always obvious why a company should consider those options. Here are nine of the most compelling reasons.

Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit. Frequent areas of coverage include ...
