Commentary
1/3/2014
09:06 AM
Noah Iliinsky

Big Data Visualization: 3 Errors To Avoid

Avoid common visualization mistakes. Here's advice on how to clarify goals and get better results.

There has been a lot of talk about data visualization lately, almost as much as there has been about big data. We're told that visualization is the best way (or the only way) to understand data, and that if we're not visualizing it, we're missing out.

Visualization is a great way to gain and share insight, but many big data teams are doing it the wrong way. How can it be done wrong? It turns out there are several ways to undermine data visualizations. Let's look at a few of the most common mistakes.

Error 1: Displaying all the data
Despite what you were told in school, most people don't care about seeing your work. They don't care about how much data you can process every day or how big your Hadoop cluster is. Customers and internal users want specific, relevant answers, and the sooner they can get those answers, the better. The closer you can come to giving them exactly what they want, the less effort they have to expend looking for answers. Any irrelevant data on the page makes finding the relevant information more difficult; irrelevant data (no matter how valid) is noise.

Noise is particularly prevalent in dashboards, where the guiding philosophy is often "Show the status of everything." But most performance measures are normal (and boring), not noteworthy. Showing all the normal conditions gives the abnormal measures a lot of places to hide.


A better dashboard approach is to show only what's interesting or important. Prioritize what matters, what's unexpected, and what's actionable, and deemphasize everything else. Deep dives into data can be important, but dashboards aren't the place for that. Broad overviews of non-actionable data are better handled as reports.
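To make the "show only what's interesting" idea concrete, here is a minimal Python sketch of a dashboard filter. It assumes each metric comes with a known normal range. The `Metric` class, its `low`/`high` bounds, and the `deviation` and `noteworthy` functions are hypothetical names for illustration, not part of any particular dashboard product. Metrics inside their range are dropped, and the rest are ranked by how far outside the range they fall.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    low: float   # lower bound of the assumed "normal" range
    high: float  # upper bound of the assumed "normal" range (must be > low)


def deviation(m: Metric) -> float:
    """How far outside its normal range a metric is, as a fraction of the range width."""
    width = m.high - m.low
    if m.value < m.low:
        return (m.low - m.value) / width
    if m.value > m.high:
        return (m.value - m.high) / width
    return 0.0  # normal, and therefore not worth dashboard space


def noteworthy(metrics: list[Metric]) -> list[Metric]:
    """Keep only abnormal metrics, most extreme first; normal ones are hidden."""
    flagged = [m for m in metrics if deviation(m) > 0]
    return sorted(flagged, key=deviation, reverse=True)


if __name__ == "__main__":
    # Hypothetical status readings.
    status = [
        Metric("checkout latency (ms)", 180, 100, 250),
        Metric("error rate (%)", 4.2, 0, 1),
        Metric("orders per hour", 310, 400, 900),
    ]
    for m in noteworthy(status):
        print(f"{m.name}: {m.value} (normal {m.low}-{m.high})")
```

With these sample readings, the latency metric is within range and never reaches the screen; only the error rate and the order volume are shown. Normalizing by range width lets metrics with different units be ranked together. A production dashboard might use statistical thresholds instead, which this sketch does not attempt.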

Error 2: Displaying the wrong data
This error is as dangerous as the first one. Showing a subset of the data is fine, as long as the relationships it reveals are relevant. If you care about sales, for example, you may also care about sales per region or sales over time. Consider how the data will be used to make decisions.
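As an illustration of picking related subsets, the sketch below rolls hypothetical sales records up by region and by month, the two views mentioned in the sales example. The `Sale` record, its fields, and the `sales_by` function are assumptions made for this example only.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    region: str
    month: str     # "YYYY-MM", so string order is chronological order
    amount: float


def sales_by(sales: list[Sale], field: str) -> dict[str, float]:
    """Total sale amounts grouped by one field, such as "region" or "month"."""
    totals: dict[str, float] = defaultdict(float)
    for sale in sales:
        totals[getattr(sale, field)] += sale.amount
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical records.
    sales = [
        Sale("East", "2013-11", 100.0),
        Sale("West", "2013-11", 80.0),
        Sale("East", "2013-12", 50.0),
        Sale("West", "2013-12", 120.0),
    ]
    print(sales_by(sales, "region"))                 # one small graph: sales per region
    print(sorted(sales_by(sales, "month").items()))  # another: sales over time
```

Each result maps naturally onto its own simple graph (a bar graph for regions, a line graph for months), rather than forcing both relationships into one chart.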

Showing several closely related graphs can be a nice compromise between showing too much in one graph and not showing enough overall. A few clean, clear graphs are usually better than a single complicated data visualization.

Error 3: Representing data poorly
Even when you're graphing the right data, you can still get it wrong. Exotic graph types are seldom seen for a reason: most of them don't work very well. The vast majority of visualization needs are well addressed with bar and line graphs, scatter plots, and (if done well) pie graphs.

Think about the key relationships among data fields, and consider putting those fields on the axes. Group by category, and then order the data by time or magnitude or importance. (Alphabetization is most useful when nothing else matters.) Use color for category, not magnitude; you can use brightness or saturation to illustrate magnitude. Use labels and other marks selectively to call attention without cluttering.
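The ordering and color advice can be sketched in a few lines of Python. This is one possible reading, not a prescribed method. Categories are ordered by their total (largest first) and rows within a category by value. Each category gets its own hue, and magnitude is encoded only through lightness, using the standard library's `colorsys` module. The function names `order_rows` and `bar_color`, the sample rows, and the lightness range of 0.35 to 0.85 are illustrative assumptions.

```python
import colorsys
from collections import defaultdict


def order_rows(rows: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Group (category, label, value) rows by category, then order by magnitude.

    Categories with the largest totals come first; ties fall back to the
    category name so each group stays contiguous.
    """
    totals: dict[str, float] = defaultdict(float)
    for category, _, value in rows:
        totals[category] += value
    return sorted(rows, key=lambda r: (-totals[r[0]], r[0], -r[2]))


def bar_color(category_index: int, n_categories: int, value: float, vmax: float) -> str:
    """Hue identifies the category; lightness encodes magnitude (bigger = darker).

    Assumes 0 <= value <= vmax and vmax > 0.
    """
    hue = category_index / n_categories
    lightness = 0.85 - 0.5 * (value / vmax)  # 0.85 at value 0, down to 0.35 at vmax
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.6)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    # Hypothetical rows: (category, label, value).
    rows = [
        ("Hardware", "Laptops", 40.0),
        ("Software", "Licenses", 90.0),
        ("Hardware", "Monitors", 70.0),
        ("Software", "Support", 30.0),
    ]
    ordered = order_rows(rows)
    categories = list(dict.fromkeys(r[0] for r in ordered))  # categories in display order
    vmax = max(r[2] for r in ordered)
    for category, label, value in ordered:
        color = bar_color(categories.index(category), len(categories), value, vmax)
        print(f"{category:<9} {label:<9} {value:>5} {color}")
```

Because hue never changes within a category, readers can still tell the groups apart at a glance, while the darker shades point to the larger values.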

Good design: Think and plan first
The best way to avoid all these errors is to focus on your goals first. Before considering how your visualizations should look, think about the following questions, in this order.

  1. What actions do you need to enable (or what do we care about)?
  2. What decisions do you need to inform (and what are we going to do about it)?
  3. What questions do you need to ask?
  4. What data do you need to see?
  5. What is the best structure for revealing the important relationships in the data?
  6. What data do you need to highlight?

As you answer these questions, you can begin to design and implement the right visualizations using the right data. It's likely that you'll have to make changes. This is a good thing. Iterate, test, try different approaches, test some more, and iterate again. A deliberate, user-oriented design approach will yield effective, efficient, and useful data visualizations.

Noah Iliinsky is a visualization expert at IBM. He is coauthor of Designing Data Visualizations and technical editor of and a contributor to Beautiful Visualization, both published by O'Reilly Media.


Comments
davidallanvan
User Rank: Apprentice
1/8/2014 | 4:16:18 AM
Re: Keep-It-Simple Principle Rules
Tips are useful when they help people avoid making some common mistakes, and this article includes some good ones. But tips-based approaches take shortcuts that bypass a rock-solid understanding of fundamental principles. In Data Visualization, those principles are clearly and comprehensively explained by Edward Tufte in his book, The Visual Display of Quantitative Information. His grounding assumption is that statisticians lack an understanding of graphics, and graphic artists don't understand statistics. It is unlikely that one can present statistics well without understanding the fundamentals, just as it is unlikely that urban planners can be successful without understanding the work of William H. Whyte.
Laurianne
User Rank: Author
1/3/2014 | 12:43:08 PM
Smart question to ask
I'm really glad the author made this point, asking: "What decisions do you need to inform (and what are we going to do about it)?"

Does the data help people make the decision at hand? And can your data experts grasp the nuances of the decision? This is another example of why having data science/analytics experts from varying backgrounds proves valuable.
D. Henschen
User Rank: Author
1/3/2014 | 10:26:21 AM
Keep-It-Simple Principle Rules
I subscribe to the Stephen Few school in believing that good data visualizations are simple, clear visualizations. Noah has it right that showing too much information and using overly complicated formats is a rookie mistake. Avoid extraneous eye candy, like 3D effects and coloration without meaning. For more on effective visualization, check out our "Top 15 Data Visualization Tips."