8 Ways You're Failing At Data Science
Data scientists and the Wizard of Oz have something in common: Few people really know what they do behind the curtain, which makes it hard to tell good from bad data science. These tips can help you discern the difference.
Data science would be easier to comprehend if there were a standard definition of it. True data science comprises several disciplines, including mathematics, statistics, machine learning, and computer science. A data science team must also understand how to curate and prepare data, analyze it, and present the results to business leaders in terms of potential business impact.
Many organizations are placing far greater emphasis on data than science, however. As a result, their outcomes may be falling short of expectations, and the reason for it may not be obvious.
Nevertheless, the search for the ultimate silver bullet continues. Companies are investing millions of dollars in platforms, solutions, and open source consulting resources hoping to get actionable insights that lead to competitive advantage. Doing data science right can take considerably more time and investment than may be apparent, however.
"It's really hard to get valuable, actionable insights out of data. You've got to build a team and use the scientific method," said Michael Walker, founder and president of the Data Science Association. "There are right ways and wrong ways to do it, and I think a lot of companies and governments are doing it the wrong way."
Because the global demand for data scientists exceeds the number of qualified professionals, less qualified candidates are assuming the title. As a result, the data science practice in an organization may be less rigorous -- and ultimately less valuable -- than it would be if more qualified players were on the team.
"Data science is a formal methodology. You have a process. It's about having a hypothesis and testing it to see if the signals in your data really inform you of the things you think," said Kirk Borne, principal data scientist at Booz Allen Hamilton.
Testing a hypothesis sounds easy enough, but it is more difficult and time-consuming, and requires considerably more effort, than may be apparent to others in the organization. Here are a few things to consider if you want to get more value from your data science efforts.
Sound data science requires a scientific approach. Without it, organizations run the risk of making decisions that are based on faulty assumptions, bad data quality, weak models, and erroneous analysis.
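As a small illustration of testing a hypothesis rather than trusting an apparent signal, the sketch below runs a permutation test on two groups of numbers. Everything in it is an assumption made for illustration: the function name `permutation_p_value`, the sales figures, and the choice of a permutation test, which is only one of many ways to check whether a difference could plausibly be due to chance.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothesis under test: the group labels do not matter. If that were
    true, shuffling the labels should often produce a difference at least
    as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so floating-point rounding does not hide ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical weekly sales (in thousands) for stores with and without
# a promotion. These numbers are invented purely for illustration.
with_promo = [12.1, 11.8, 12.5, 13.0, 12.2, 12.9, 12.4, 12.7]
without_promo = [11.0, 11.2, 10.8, 11.5, 11.1, 10.9, 11.3, 11.4]

p_value = permutation_p_value(with_promo, without_promo)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value here only says the observed gap would be unusual if the promotion made no difference. It says nothing about whether the data was collected well or whether the right question was asked.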
"You have to be careful to use scientific techniques to attempt to eliminate bias and to accurately measure. The first thing you're trying to do is formulate a question. You have a thesis and you're trying to measure things. One of the most important and difficult tasks is to pick the right data to answer a specific question. [Also] the data needs to be of a certain standard quality because if the data is false, you're going to end up with bad results," said Michael Walker, founder and president of the Data Science Association. "One of the highest uses of data science is to design experiments, posing the right question and collecting the right datasets, and doing it all up to scientific standards. Then you gather the results and interpret it."
Effective business leadership requires goal setting. Goals help organizations measure progress, and they provide a target with which people, processes, and technologies can be (and should be) aligned. Nevertheless, some organizations are amassing as much data as possible without a goal in mind, hoping it may become valuable in the future. Not surprisingly, the ROI can fall short of expectations.
"I see a lot of data scientists collecting everything without thinking about the business their client is in, the data they have now, and the data they're going to need to do a better job. The [organizations] that figure that out are going to have a huge advantage, but I think we're a ways from that," said Michael Walker, founder and president of the Data Science Association.
Not all data is created equal. Quality matters, and so does the goal. Contrary to the hype, more data isn't necessarily better; it may be just more data. The danger of amassing data for data's sake is adding noise and risk rather than business value.
According to Kirk Borne, principal data scientist at Booz Allen Hamilton, "People forget that there really are ethical issues about the use of data, protections of data, and even statistical concerns such as [thinking] correlation is causation. People forget that if you crunch data long enough, it will say anything. If you have a very large collection you're going to find correlations. People think now if they have big data they can believe anything they see."
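Borne's point about large collections can be seen in a short simulation. The sketch below is an illustration only, with hypothetical names and parameters (`count_spurious_pairs`, 100 variables, 20 observations, a 0.5 cutoff): it generates columns of pure random noise that are unrelated by construction, then counts how many pairs still look strongly correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_variables=100, n_observations=20,
                         threshold=0.5, seed=42):
    """Count pairs of independent noise columns with |r| above threshold."""
    rng = random.Random(seed)
    # Every column is independent random noise: no real relationship exists.
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    strong = 0
    total = 0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            total += 1
            if abs(pearson(columns[i], columns[j])) > threshold:
                strong += 1
    return strong, total


strong_pairs, total_pairs = count_spurious_pairs()
print(f"{strong_pairs} of {total_pairs} unrelated pairs exceed |r| = 0.5")
```

Because the number of pairs grows roughly with the square of the number of variables, adding more columns gives chance more opportunities to produce a correlation that looks convincing.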
Unicorns are in high demand, but few of them actually exist. Unicorns are superhuman data scientists who are exceptionally well-rounded experts. In reality, individual data scientists tend to have specialized knowledge in some areas and considerably less knowledge in other areas, like anyone else.
According to Kirk Borne, principal data scientist at Booz Allen Hamilton, there are approximately 10 different skill areas involved in data science, and a bona fide data scientist may be able to do a third of those things. The data science skill areas Borne listed include: computational skills, statistics, databases, data management, making data available and searchable by organizing data in ways that enable scientific discovery, machine learning, semantics, Hadoop, and data visualizations.
"You have to remind people you can't do all those things," said Borne.
Data science is a team sport. It's more realistic to establish a critical mass of skills by putting together the right team, rather than expecting a single data scientist to wear all hats equally well. "A lot of [executives] think 'We'll just hire this genius data scientist who's a unicorn, who's good at computer programming, knows probability theory, who can design experiments, who knows statistics and math equations.' C'mon. I know one person or maybe two people that can do that. You have to build a team with the right players on it," said Michael Walker, founder and president of the Data Science Association.
Resumes are evolving to meet the demand for data scientists, but organizations are finding out the hard way that there are data scientists and there are "data scientists." Similarly, some universities and organizations are simply rebranding their educational programs or teams without significantly changing the curriculum.
"People are assuming the [data scientist] title because it's a hot term and you get a lot more hits on LinkedIn. That creates a lot of confusion in the marketplace," said Jennifer Priestley, director of Kennesaw State University's Ph.D. program in analytics and data science. "Academia is equally irresponsible when it comes to this issue. You have a lot of programs that yesterday were operations research and today they're data science, or you had an MBA and now you have an MS in analytics or data science, but it's the same curriculum."
Rebranding is dangerous in business when expectations are the only things that change. "There are a million traps you can fall into when interpreting data. A lot of business leaders think they can simply [rename] a data analytics team a data science team. They find it's not improving decision-making and, in fact, they're making worse decisions and they wonder what's going on," said Michael Walker, founder and president of the Data Science Association.
Sometimes there is no single right answer to a particular problem, so informed choices must be made from among the possibilities. Data scientists are used to that kind of ambiguity. Others in the organization may prefer more certainty. For example, the average business professional wants more than actionable insights. She wants to know what action to take and, if she takes that action, what the outcome will be. Most questions, however, do not have a single answer that is absolutely certain. There is usually more than one possibility, each with its own degree of uncertainty.
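One way to frame a choice among uncertain possibilities is to list scenarios for each option, attach a probability to each, and compare the options on expected payoff and on the chance of reaching a goal. The sketch below is a minimal illustration under invented assumptions: the option names, probabilities, payoffs, and the goal of 50 are all hypothetical, and in practice the probabilities themselves are estimates that carry uncertainty.

```python
import math


def expected_payoff(scenarios):
    """Probability-weighted average payoff over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in scenarios)


def probability_of_reaching(scenarios, goal):
    """Total probability of the scenarios whose payoff meets the goal."""
    return sum(p for p, payoff in scenarios if payoff >= goal)


# Hypothetical options, each with (probability, payoff) scenarios.
# All numbers are invented for illustration.
options = {
    "launch_now": [(0.5, 120), (0.3, 40), (0.2, -60)],
    "wait_and_test": [(0.6, 80), (0.3, 50), (0.1, -10)],
}
GOAL = 50

summary = {}
for name, scenarios in options.items():
    # Scenario probabilities for one option should cover all cases.
    assert math.isclose(sum(p for p, _ in scenarios), 1.0)
    summary[name] = (
        expected_payoff(scenarios),
        probability_of_reaching(scenarios, GOAL),
    )

for name, (ev, p_goal) in summary.items():
    print(f"{name}: expected payoff {ev:.1f}, "
          f"chance of reaching goal {p_goal:.0%}")
```

Neither number promises an outcome. The comparison only shows which option is more likely to work out, which is the kind of answer a data scientist can defend.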
"Two of the most important things in data science really have to do with probability theory and scenario planning," said Michael Walker, founder and president of the Data Science Association. "Rather than saying if you do X you will definitely achieve your goal, we use probability theory and scenario planning so that your decisions are going to be more right than wrong."
If you're hoping data science will simply fall out of fashion, you may find yourself painfully behind your competitors. It's easy to become jaded by the endless stream of hype underscoring the need for technological and operational change. Still, the ways we use data in business have been permanently transformed.
"The data we work with today is fundamentally different than the data we've worked with in the past. We're not going to go back to the days of data coming to us in Excel spreadsheets with frozen columns, three variables, and hundreds of observations," said Jennifer Priestley, director of Kennesaw State University's Ph.D. program in analytics and data science. "Technology enables us to treat audio, video, and text in the same way we have treated numbers like age and income, historically. This is a sea change in the way businesses operate."
Data scientists, data analysts, business analysts, and technologists need to update their skills to remain effective. Similarly, as the use of data becomes necessary for more roles in an organization, the people who hold those jobs must be (or become) data literate. Knowing that, universities are adding data science courses for undergraduate and graduate students. Some are also considering the possibility of making data science part of the core curriculum, regardless of a student's major.
"Understanding, working with, translating, and being comfortable with data is no longer the responsibility of a narrowly defined department, because data increasingly drives all aspects of the business," said Jennifer Priestley, of Kennesaw State University. "Everyone needs to have some basic knowledge of data science."
People sometimes forget that models approximate reality. Organizations may have a team of people building models and sharing the results with business decision makers who rely on the information to make important decisions. But because the model differs from reality, unexpected outcomes can result, including retail overstocks, security breaches, and equipment failures.
"Models are useful for understanding complex phenomena, but problems can arise when you get into complex areas. You have to build in a lot of assumptions, and sometimes those assumptions have no basis in reality, so the model says 'do X, Y, Z' and it fails," said Michael Walker of the Data Science Association. "This is the difference between a data analyst that has not been scientifically trained and a data scientist. The data scientist is going to look at that and tell the client 'we've done these models and these are the assumptions we made.'"