How a Wildlife AI Platform Solved its Data Challenge - InformationWeek

6/29/2021
08:00 AM

How a Wildlife AI Platform Solved its Data Challenge

Wild Me is a non-profit machine learning service provider for field biologists studying wildlife and conservation. But before you can create whale shark algorithms, you need good data.

Credit: Wild Me

Anyone working in data management and data science can attest to how challenging and time-consuming it is to map a set of data from a new source into a platform where it can be cleaned, validated, and ultimately analyzed and used to train algorithms. After all, your algorithms are only as good as the data used to train them.

Now imagine that these data sets are coming from hundreds of external users who have employed any number of systems to collect the data, from Excel files to actual shoeboxes full of photos. That is the challenge Wild Me, a non-profit provider of machine learning and artificial intelligence services for wildlife conservation, has faced over more than a decade of operation. The organization builds open software and AI for the conservation research community. It is made up of technologists -- software and machine learning pros -- and is designed to be the "trusted engineering powerhouse for wildlife biologists across the globe."

This AI software enables researchers to track individuals among different species -- whale sharks, for example -- identifying them by unique patterns of spots. Wild Me created this initial use case algorithm and technology by modifying a Hubble Space Telescope algorithm that looked at the pattern of stars in the night sky, according to Jason Holmberg, the organization's executive director, co-founder, and director of engineering.

Jason Holmberg
Credit: via Wild Me

During a scuba trip in Djibouti in 2002, he saw his first whale shark and learned how researchers physically tagged and tracked the animals. He thought there might be a better way, through computer vision algorithms that could identify individuals by their unique spot patterns. This work turned into Whaleshark.org, a library of encounters and individual whale sharks used and maintained by marine biologists.

But that was just the first use case. From there Wild Me expanded as a platform for other animal researchers, allowing them to upload their data to catalog a series of other species from manta rays to giraffes to sea dragons. The platform serves more than 200 organizations and nearly 1,000 researchers tracking nearly 90,000 animals around the world with close to 444,000 sightings in its database.

The challenge of moving biologists' catalogs of encounters and sightings and individuals into the Wild Me platforms has been a thorny problem from the start.

"It's been an evolving process," said Holmberg. "When we first started working with biologists across the globe, we would write custom importers for every piece of data. That custom one-off code would take weeks."

Ben Scheiner, a Wild Me senior software engineer, describes it this way: "We had our own hand-rolled JavaScript framework for doing data imports. But it was buggy. We are focused on ecological problems, and AI and machine learning is our key service. Understanding this data onboarding deserves its own company and suite of solutions. That's something we were unable to do on a non-profit bank account."

Ben Scheiner
Credit: via Wild Me

There were no universal standards for how individual researchers cataloged their data. Each researcher created their own system.

Because of this, the idea of a "universal data importer is sort of farcical," Holmberg said. "But we were able to solve half the problem." Wild Me started using a tool to let field biologists begin mapping their data to a common set of fields and descriptors. These biologists could review the data in the system and then approve it.
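
The mapping step described above can be illustrated with a minimal Python sketch. It is not Wild Me's importer or the tool the organization used; the common field names, the column headers, and the `map_row` helper are all hypothetical, chosen only to show how one researcher's headers might be renamed to a shared set of fields.

```python
# Hypothetical common schema; the article does not list Wild Me's actual fields.
COMMON_FIELDS = {"individual_id", "sighting_date", "latitude", "longitude"}


def map_row(row, column_map):
    """Rename one source row's columns to the common field names.

    column_map maps a researcher's own column header to a common field.
    Columns without a valid mapping are returned separately so that
    nothing is silently dropped.
    """
    mapped, unmapped = {}, {}
    for source_name, value in row.items():
        target = column_map.get(source_name)
        if target in COMMON_FIELDS:
            mapped[target] = value
        else:
            unmapped[source_name] = value
    return mapped, unmapped


# Made-up example row using one researcher's own headers.
row = {"Shark": "A-001", "Date seen": "2019-03-14",
       "Lat": "11.58", "Lon": "43.14", "Notes": "juvenile"}
column_map = {"Shark": "individual_id", "Date seen": "sighting_date",
              "Lat": "latitude", "Lon": "longitude"}

mapped, unmapped = map_row(row, column_map)
```

Returning the unmapped columns alongside the mapped ones gives the biologist something concrete to review before approving the import.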

While this streamlined the process and made it faster, there were still issues that could be improved. The system wasn't all that scalable, and it didn't let the researchers validate their own data. Wild Me began piloting a tool from a company called Flatfile, designed to solve the issues of processing and validating external data from multiple sources.

David Boskovic founded Flatfile after working at a few different SaaS companies and running into the same annoying problem each time: how to get new customers' data into the system when each customer had used different systems.

"It has been a universal problem. The cost and effort of bringing data in is one of the costs of innovation," Boskovic said. But it was very frustrating. "I like to say I rage-designed this product."

David Boskovic
Credit: via Flatfile

The other aspect of bringing data into a system is that your customers need to maintain ownership and control of that data. That's important for marketers. It's also important for field biologists. It's one of the reasons why Wild Me pursued the pilot with Flatfile.

"It's an intuitive system whereby a field biologist can maintain ownership of their data through the process of importing it into our system, and it will do things that we didn't currently have like data validation," Holmberg said. For instance, it will help "make sure all the GPS coordinates are in the right format. These are human-curated data catalogs. They do have errors."

During the validation process anomalies are presented back to the biologists who curated the data so that they can go back and clean up the data. This lets biologists see their data in one of the Wild Me platforms and work with that data in the platform.
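
As a rough illustration of this kind of check, the Python sketch below flags coordinates that are not decimal numbers or that fall outside valid ranges, and returns them as a list a curator could review. It is not Flatfile's API or Wild Me's validation logic; the `validate_coordinates` function, the `latitude` and `longitude` keys, and the decimal-degree assumption are all hypothetical.

```python
def validate_coordinates(records):
    """Return a list of (row_number, message) anomalies for review.

    Assumes decimal-degree values stored under the hypothetical keys
    "latitude" and "longitude"; real catalogs may use other formats.
    """
    anomalies = []
    for row_number, record in enumerate(records, start=1):
        for key, limit in (("latitude", 90.0), ("longitude", 180.0)):
            raw = record.get(key)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                anomalies.append(
                    (row_number, f"{key} is not a decimal number: {raw!r}"))
                continue
            # Latitude must lie in [-90, 90], longitude in [-180, 180].
            if not -limit <= value <= limit:
                anomalies.append(
                    (row_number, f"{key} out of range: {value}"))
    return anomalies


# Made-up records: one clean, one in the wrong format, one out of range.
records = [
    {"latitude": "11.58", "longitude": "43.14"},
    {"latitude": "11°35'N", "longitude": "43.14"},
    {"latitude": "95", "longitude": "43.14"},
]
anomalies = validate_coordinates(records)
```

Collecting anomalies rather than raising on the first bad row matches the workflow in the article, where problems are handed back to the person who curated the data.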

The platforms are changing biologists' knowledge of the species they study.

"When I first started on whale shark research, everyone thought the Indian Ocean was the big spot for that," Holmberg said. "As we built these online platforms, we could identify the movement of individuals ... We now see the Gulf of Mexico as one of the biggest hotspots for studying whale shark behavior."

In many cases, Wild Me is a researcher's first experience with cloud computing and storage and analysis for their data, so the goal is to make the system easy to use for people whose primary job is not technology.

Holmberg said that the data processing needs to be fast so that biologists can react to population changes with better conservation policy and strategies.

"Maybe that means to put up a fence, or take down a fence, or allow fishing, or ban fishing, depending on how variables impact population numbers," he said. "The faster we can estimate population numbers, the faster we can respond to changes and make sure our conservation strategies are iterating towards ever more successful solutions that help increase population numbers, especially for threatened and endangered animals."

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology.