Interop ITX: Machine Learning Newbies Need Realistic Goals

You need two things to get started with machine learning at your organization -- a realistic vision of what you want to do and a census of the material or data that you have to work with.

How do you get started with machine learning in your enterprise when you don't have a data scientist or an idea of where to begin? Breaking it down to two simple components, organizations should start with a vision of what they want to do and then look at the material or data they have to work with to get there.

That's according to Friederike Schüür, who offered her expertise on how to introduce machine learning to organizations during a presentation at the AI Summit at Interop ITX, May 1. It's the work that Schüür does every day at Cloudera Fast Forward Labs, the consulting firm acquired by data science company Cloudera in September 2017. Fast Forward Labs works with customer organizations to apply advanced techniques such as machine learning, deep learning, and natural language processing to real business problems inside companies.

When Cloudera Fast Forward Labs begins an engagement with a customer company, Schüür said, the vision and the material or data are the two "bookends" she needs in place before the work can start.

The "vision" part often means working with the organization to set realistic expectations about what data, data science, and machine learning can do for it. Clarifying those ideas is important to avoid a variety of pitfalls. Maybe the company is solving the wrong problem. Or it is solving the right problem but doesn't realize that problem has already been solved. Or maybe it has been trying to solve the right problem with the wrong tools. Or perhaps it is solving the right problem too slowly for the answer to matter. Or it is solving the right problem the wrong way, so that no one uses the result. You get the idea. Understanding the right approach is important.

Companies need a goal that formulates firm measures of success, Schüür said.

Next, organizations must look at the material they have available to perform their work. They need to do a data census to look at just what's available for their project. Organizations will want to ask themselves some fundamental questions as part of this process. For instance, what is data? What data is recorded or could be recorded? How is the data recorded? How and where is the data stored? And is the data easily accessible?
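For readers who want something concrete, the census questions above could be captured in a simple inventory. The following is only a minimal sketch in Python, not something Schüür presented: the `DataSource` fields, the `usable_now` helper, and the three sample entries are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One entry in a hypothetical data census (field names are illustrative)."""
    name: str
    recorded: bool           # is the data recorded today, or only recordable in principle?
    how_recorded: str        # how the data is captured
    stored_where: str        # how and where the data is stored
    easily_accessible: bool  # can the team get at it without major effort?


def usable_now(census):
    """Return the names of sources that are both recorded and easily accessible."""
    return [s.name for s in census if s.recorded and s.easily_accessible]


# Made-up sample entries, purely for illustration.
census = [
    DataSource("support tickets", True, "web form", "CRM database", True),
    DataSource("call transcripts", False, "not yet captured", "n/a", False),
    DataSource("sales orders", True, "ERP export", "data warehouse", False),
]

print(usable_now(census))
```

A list like this makes the gaps visible: in the sample, only one of the three sources is ready to use, which is the kind of boundary on a project that a census is meant to surface early.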

One of the challenges organizations may encounter is that a lack of data puts boundaries on their machine learning dreams, Schüür said. It limits what data science and machine learning can do for your organization.

However, there's opportunity here as well. Access to unique data is a key differentiator in today's world. Schüür warned more than once during her presentation about carelessly sharing data beyond your organization. Your data has value and can provide your company with a competitive edge. You may be giving that away if you share it.

Schüür also warned that not all data science projects are destined to succeed, and that's the way it should be.

"Before you work with the data you don't know what is in the data," she said. "That means it's really a science. You don't know what you are going to build before you get started on a project."

And that necessarily means that some projects will be abandoned, perhaps after as little as two weeks, without yielding any concrete business value. But those projects will yield something else -- knowledge about the data.

"What you should emphasize at that point is not so much what you've built, but the knowledge that you've gained," she said.