Thanks to text mining, visualization software can now depict relationships between concepts found in documents and email.
Numbers communicate facts but lack meaning on their own. It takes textual, tabular, or graphic presentation to lend them context and the ability to tell a story. The same applies to non-numeric information, particularly representations of concept-based and interpersonal relationships. With text mining's emergence, we can now systematically extract concepts from email and textual documents and exploit them to classify, analyze, and automatically process source materials. New classes of relationship-network visualization software provide the same graphic accessibility to this "unstructured" information as charts and dashboards provide for the numeric data that has, until recently, been the near-exclusive basis for enterprise decision-making.
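As a rough illustration of what "extracting concepts" from email can feed into, the sketch below counts how often pairs of concepts appear in the same message. It is a minimal sketch under stated assumptions, not how any product named in this article works: the concept list, the sample messages, and the function names are all hypothetical, and real text-mining software would derive concepts with far more sophisticated linguistic analysis than simple word matching.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary (an assumption for illustration only).
CONCEPTS = {"merger", "audit", "budget", "lawsuit"}

def extract_concepts(text):
    """Return the set of known concepts mentioned in a piece of text."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return words & CONCEPTS

def cooccurrence(documents):
    """Count how often each pair of concepts appears in the same document."""
    pairs = Counter()
    for doc in documents:
        found = sorted(extract_concepts(doc))
        pairs.update(combinations(found, 2))
    return pairs

# Made-up sample messages.
emails = [
    "The audit raised questions about the merger.",
    "Budget review: the merger and the audit are both on hold.",
    "Counsel expects the lawsuit to affect the budget.",
]
links = cooccurrence(emails)
for (a, b), count in sorted(links.items()):
    print(a, b, count)
```

Each counted pair is a candidate link between two concept nodes, and the count is one possible measure of how strongly they are related.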
By the Numbers
For numeric data, we gravitate toward graphics rather than data tables or narrative text because graphics are efficient. Graphics depict facts in abstract but directly accessible forms that emphasize relationships (how values have changed, which contributing factors are most significant) while relocating contextual clutter to a backdrop. It's those relationships, not the context-lacking absolutes, that matter most in corporate decision-making.
Monitoring business processes, measuring performance, and constructing data warehouses, forecasts, and predictive models are worthless unless they impart actionable information. However, most corporate information is unstructured and not easily analyzed, much less depicted in the basic line, bar, pie, and scatter charts that show relative numeric values.
Organization charts are the ancestors of the relationship-network graphics that I'm describing. Org charts can be useful but they're not good at depicting data that's dynamic, high-volume, or doesn't fit a rigid hierarchy. Computer- and communications-network diagrams add the missing elements but were designed to map physical (rather than conceptual) networks. They do effectively use symbols and lines that are varied in type, orientation, size, and coloring to imply how the nodes and connections in the mapped networks may be classified. And they effectively use layout to depict dispersal or distribution of nodes in a space, even if what is mapped is the physical rather than the conceptual arrangement we seek to depict in relationship graphics.
Work by now-deceased artist Mark Lombardi was the first attempt I know of to graphically depict extensive, hierarchical, multidimensional relationship networks. His New York Times obituary explained:
Sometimes measuring as much as 10 feet across, these drawings nonetheless had tremendous visual verve, delicately tracing the convoluted unfoldings of contemporary morality tales like the savings and loan scandal, Whitewater, Iran-Contra and the Vatican bank scandal.
The small circles in his drawings identified the main players (individuals, corporations, and governments) along a time line. The arcing lines showed personal and professional links, conflicts of interest, malfeasance, and fraud.
Solid lines traced influence, dotted lines traced assets, and wavy lines traced frozen assets. Final denouements like court judgment, bankruptcy, and death were noted in red. (Roberta Smith, March 25, 2000)
New classes of network-visualization tools arose out of academic and industrial research in the 1990s. They automate the graphic rendering Lombardi did painstakingly by hand. I've seen these tools used mostly in connection with law enforcement and counterterrorism text mining, but their use is spreading. For example, perhaps in the spirit of Lombardi's work, the Washington Post ran a couple of interesting relationship charts earlier this year, one depicting the business and government connections of controversial former Department of Defense official Richard Perle and the other illustrating President Bush's fundraising network. The print versions are of course static; the online version (see "Spheres of Influence" in Resources) adds interactivity that lets the user explore the diagram by focusing and zooming in on sections and nodes and viewing associated annotations.
The Post's relationship graphs, even if they support interactive exploration, remain static in the sense that there's no active database connection and no ability to alter the diagram layout, arrangement, focus, orientation, or other aspects of the graphs. But then, the Post's aim is to disseminate information, not to provide extensive online analytics. Commercial tools from companies including Advizor Solutions, Inxight, and Tom Sawyer Software offer these capabilities. (Advizor's software is licensed by BI providers including Business Objects and Information Builders.)
Figure 1 displays a form of data constellation generated by Advizor's software, and Figure 2 is a social-networking implementation of the TouchGraph open-source visualization system. Visit the Web sites of the software providers listed in Resources for dozens of intriguing relationship-network representations.
FIGURE 1 Data constellation from Advizor Solutions
FIGURE 2 The Spoke system provides interactive social-network visualization via SpokeMap, based on the TouchGraph open-source system.
Extraction and Display
Relationship-network visualization is of course not an end in itself. Ramana Rao, CTO and cofounder of Inxight, which provides tools for both visualization and text mining, explained in an interview that "there are two halves to these problems: an extraction half and a display half.... The diagrams can be only as good as the data." The aim is to enable knowledge discovery in the data, and, in these cases, the visualization tools provide the best means of understanding how the inherent relationship networks are organized: how nodes are distributed and clustered, what paths exist between nodes, what those paths cost (indicating the proximity of the internode relationships), and ultimately how the networks may be optimized and exploited.
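To make the ideas of clusters, paths, and path costs concrete, here is a minimal sketch that treats a relationship network as a weighted graph. Everything in it is an assumption for illustration: the names, the edge costs (lower cost meaning a closer relationship), and the function names are hypothetical, and Dijkstra's algorithm is simply one standard way to find the cheapest path, not a description of what any vendor's tool does internally.

```python
import heapq

# Hypothetical network: (node, node, cost); a lower cost means a closer tie.
EDGES = [
    ("Ann", "Bob", 1.0),
    ("Bob", "Cara", 2.0),
    ("Ann", "Cara", 5.0),
    ("Dev", "Eli", 1.0),
]

def build_graph(edges):
    """Build an undirected adjacency map: node -> {neighbor: cost}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total cost, path), or (inf, []) if none."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, step in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return float("inf"), []

def clusters(graph):
    """Group nodes into connected components (nodes linked by any path)."""
    remaining = set(graph)
    groups = []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            for neighbor in graph[stack.pop()]:
                if neighbor not in group:
                    group.add(neighbor)
                    stack.append(neighbor)
        remaining -= group
        groups.append(group)
    return groups

graph = build_graph(EDGES)
print(cheapest_path(graph, "Ann", "Cara"))  # the indirect route via Bob costs less
print(clusters(graph))
```

In this made-up network, the direct Ann-Cara link costs more than the route through Bob, which is the kind of finding a layout that places closely related nodes near one another is meant to surface visually.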