There are a number of vendors vying to help businesses manage and analyze the 80-85 percent of all corporate data that's stored in unstructured and semi-structured formats -- and some of their names might surprise you.
Organizations use business intelligence applications primarily with structured data sources -- databases, data warehouses, and database archives. These tend to be number-intensive data sources. But BI services firm the Atre Group estimates that 80-85 percent of data within organizations resides in semi-structured forms (text block fields, documents, attached notes, e-mails, reports, etc.) or unstructured forms (paper and micrographic archives, raw files and backup disks or tapes, books, manuals and so forth). BI is starting to move into this unstructured data territory by applying its analytic, statistical, and classifying technologies. BI is getting into the business of "information-making" by helping IT shops organize and unlock semi-structured and unstructured data.
That inevitably means BI is also starting to tread into a lot of enterprise application integration (EAI) territory already occupied by some very well-placed players, such as Google, IBM, Autonomy, Inxight and others. As BI becomes more deeply involved in the business of information-making -- enabling the searching, structuring, analysis and movement of data beyond familiar numeric constraints -- it collides with vendors in the search, content management and information integration arenas.
But BI has long been expanding its data and information nets to draw upon semi-structured data such as text sources and location and mapping data. Perhaps the biggest new pool is the often unstructured data lying in otherwise structured databases -- in Binary Large Object (BLOB) fields that hold e-mails, instant messages, and other commentary. Organizations are discovering that many valuable nuggets of unstructured data reside side-by-side with very refined structured information.
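To make the BLOB point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name, columns, sample rows, and the helper names build_demo_db and orders_mentioning are all hypothetical, invented for illustration; a real system would use proper text search rather than simple substring matching.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database mixing structured columns with a BLOB note."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, note BLOB)"
    )
    # Hypothetical sample rows: amounts are structured, notes are free text.
    rows = [
        (1, 120.0, b"Customer asked about late delivery"),
        (2, 75.5, b"No issues reported"),
        (3, 310.0, b"Late again; customer may cancel"),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

def orders_mentioning(conn, keyword):
    """Return (id, amount) for orders whose BLOB note contains the keyword."""
    hits = []
    query = "SELECT id, amount, note FROM orders ORDER BY id"
    for order_id, amount, note in conn.execute(query):
        # Decode the raw bytes so the note can be searched as text.
        text = bytes(note).decode("utf-8", errors="replace")
        if keyword.lower() in text.lower():
            hits.append((order_id, amount))
    return hits

conn = build_demo_db()
late_orders = orders_mentioning(conn, "late")
print(late_orders)
```

The structured amount column and the free-text note sit in the same row, so a finding from the text (a delivery complaint) can be tied straight back to a number (the order value).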
Structuring data is about making it accessible for decision-making. Data exists on a continuum from unstructured (no container for storage, retrieval, backup, and secured access) through semi-structured (stored, but not yet fully cleansed, sorted, interlinked and made searchable) to structured information (linked and correlated data that's attached to analytic models or processing APIs).
This is the process of information-making -- transforming data first into accessible and then actionable objects in the programming sense. In effect, data becomes information through three steps. First, it's made safely accessible. Next, it's correlated and interlinked in models that explain how the data is interrelated. Last, it's linked to dashboards and processes that allow decision-makers to observe and act on the information.
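The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Sale record, the raw input format, the per-region average used as the "model", and the function names load_sales, model_by_region and dashboard_alerts are all hypothetical, not taken from any BI product.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sale:
    region: str
    units: int

def load_sales(raw_lines):
    """Step 1: make raw text safely accessible as typed records."""
    sales = []
    for line in raw_lines:
        region, units = line.split(",")
        sales.append(Sale(region.strip(), int(units)))
    return sales

def model_by_region(sales):
    """Step 2: correlate records in a simple model (average units per region)."""
    grouped = {}
    for sale in sales:
        grouped.setdefault(sale.region, []).append(sale.units)
    return {region: mean(units) for region, units in grouped.items()}

def dashboard_alerts(model, threshold):
    """Step 3: surface the regions a decision-maker should act on."""
    return sorted(region for region, avg in model.items() if avg < threshold)

# Hypothetical raw input, as it might arrive in a text export.
raw = ["east, 40", "west, 10", "east, 60", "west, 20"]
model = model_by_region(load_sales(raw))
alerts = dashboard_alerts(model, threshold=30)
print(model, alerts)
```

Each function maps to one step, so the model in the middle can be swapped out or refined without touching how the data is loaded or how alerts are shown.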
These models of behavior can remain fairly static, or may change over time. For example, inventory processing in a warehouse follows a relatively static, stable model. In contrast, the way stocks are valued in the market presents an ever-changing model. And of course, some portions of data may be applied to multiple models. Think of a databank with hundreds of economic data points that are used in many profoundly different explanatory and predictive models. So BI practitioners need not only to set up and operate their decision-making tools, but also to constantly refine their models and the underlying data brought into them.