Concerned about big data's potential for misuse, a new organization of data scientists has created a set of rules for their profession.
The Data Science Association is a non-profit organization with more than 500 members, even though it is only about two months old. The Denver, Colorado-based group aims to define the term "data science" and the duties and ethical responsibilities of the people who call themselves data scientists, according to founder and president Michael Walker.
"Things were really getting out of control in terms of the definition of 'data science,'" said Walker in a phone interview with InformationWeek. "A lot of people who really weren't data scientists started calling themselves data scientists. And I saw a lot of data science malpractice in the companies, or clients, that we work with."
Walker is a managing partner at Rose Business Technologies, a Denver-based provider of data management and IT services. He hopes the Data Science Association can help protect the integrity of his chosen profession, which he said is rife with under-qualified practitioners and big data vendors making false promises.
"What's in short supply are trained data scientists to analyze and interpret and get that actionable, valuable intelligence [from data]," Walker said.
In addition, many data science products don't deliver the insights they promise. "A lot of vendors are making outlandish claims," said Walker. "Spend hundreds of thousands, or millions, of dollars on our new technology, feed it with data, push a couple of buttons and -- voilà! -- you're going to get predictive analytics and a competitive advantage."
The din is growing louder, he added, as more big data tools hit the market.
"I can tell you, it's just malarkey. It doesn't work that way," said Walker. "It's actually very difficult to analyze data, especially large data sets, and use the scientific method in the right way to get valuable, actionable intelligence to help your company, or to help [government] policymakers make better policy."
The association's Data Science Code of Professional Conduct covers both common-sense business practices and ethical guidelines, including a variety of rules that may prove challenging for data scientists -- and the companies and governments that hire them -- to follow.
For instance, the Code of Professional Conduct states that if a data scientist "reasonably believes a client is misusing data science to communicate a false reality or promote an illusion of understanding, the data scientist shall take reasonable remedial measures, including disclosure to the client, and including, if necessary, disclosure to the proper authorities. The data scientist shall take reasonable measures to persuade the client to use data science appropriately."
That passage addresses the common problem of confirmation bias, in which an analyst includes only data that confirms a particular position and ignores contradictory evidence, Walker explained. "Say you have a data analytics team in a company, and your boss says, 'We need to be able to achieve XYZ goals,'" he added. "Or a policymaker might say, 'This is policy that we favor, go out and find the evidence to support it.'"
These pressures, not surprisingly, often lead to very bad data science, a problem the Data Science Association hopes to combat.
"We educate people on these issues, and [data science] becomes a profession that follows a code of conduct," said Walker. "All of us can band together and tell an employer or client, 'No, we cannot do that. I'm not going to find evidence to support something you want to do, unless [the supporting data] is really there.'"
Without a professional code of ethics for data science, businesses and governments can easily exploit data scientists. Said Walker, "If you don't do what they want you to do, they'll fire you and get someone else."