
From Text Analytics to Data Warehousing

IBM recently posted a quite nice page on extracting business value from "unstructured" data. The premise is that because much valuable business information originates in "unstructured" form, you need to look at text analytics as a technology that can unlock value. And naturally, if you already have a BI program and a data warehouse, you'll want to explore integrating text-sourced information into your existing data-analysis infrastructure.
IBM recently posted a quite nice page on extracting business value from "unstructured" data. The page describes the use of IBM's own products and formats, to be sure, but it is potentially helpful for anyone who wishes to learn about information extraction from textual sources for data warehousing.

IBM's page starts with a brief text-analytics overview. It then dives into implementation with the OmniFind Analytics Edition for DB2 and its pureXML capabilities. It describes a process flow that includes XML tagging of document features and the alternatives of mapping the XML schema to relational database structures or using the XML structures directly for analyses. This text-analytics workflow, and the choices involved in dealing with text-sourced information, are not specific to IBM's tools, however. So while IBM provides diagrams, code listings, and an analysis of the alternative approaches that relate to its own products, the lessons apply much more generally.

The premise is that because much valuable business information originates in "unstructured" form (e-mail, Web pages, news and blog articles, corporate reports, etc.), you need to look at text analytics as a technology that can unlock value. And naturally, if you already have a BI program and a data warehouse, you'll want to explore integrating text-sourced information into your existing data-analysis infrastructure. You'll want to explore unified analytics.
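To make the two alternatives concrete (mapping XML annotations to relational structures versus analyzing the XML directly), here is a minimal, tool-neutral sketch in Python. It is not IBM's code and does not reproduce OmniFind's output or pureXML's query syntax. The `<document>` and `<entity>` tags, their attributes, the `doc_entity` table, and the function names are all hypothetical, invented only for illustration, with SQLite standing in for a data warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical annotated document. The tag and attribute names are
# assumptions for illustration, not the format of any real product.
ANNOTATED_XML = """
<document id="d1" source="email">
  <entity type="product" start="10" end="16">Widget</entity>
  <entity type="sentiment" start="20" end="32">disappointed</entity>
  <entity type="company" start="40" end="44">Acme</entity>
</document>
"""


def entity_rows(xml_text):
    """Alternative 1: flatten XML annotations into relational rows."""
    root = ET.fromstring(xml_text.strip())
    doc_id = root.get("id")
    return [
        (doc_id, e.get("type"), e.text, int(e.get("start")), int(e.get("end")))
        for e in root.iter("entity")
    ]


def load_rows(conn, rows):
    """Store the flattened rows in a (hypothetical) doc_entity table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_entity ("
        "doc_id TEXT, entity_type TEXT, entity_text TEXT, "
        "start_pos INTEGER, end_pos INTEGER)"
    )
    conn.executemany("INSERT INTO doc_entity VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def entities_of_type_xml(xml_text, entity_type):
    """Alternative 2: query the XML structure directly with a path expression."""
    root = ET.fromstring(xml_text.strip())
    return [e.text for e in root.findall(f"./entity[@type='{entity_type}']")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_rows(conn, entity_rows(ANNOTATED_XML))

    # Relational route: text-sourced facts can now be aggregated with SQL.
    query = (
        "SELECT entity_type, COUNT(*) FROM doc_entity "
        "GROUP BY entity_type ORDER BY entity_type"
    )
    for row in conn.execute(query):
        print(row)

    # XML route: the same information, read straight from the annotations.
    print(entities_of_type_xml(ANNOTATED_XML, "product"))
```

The relational route lets text-sourced facts be joined and aggregated alongside existing warehouse tables, while the XML route keeps the annotation structure intact and avoids committing to a fixed schema. Which is preferable depends on the analyses you intend to run.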

Information extraction to databases enables unified analytics. I cover approaches in my own text-analytics courses and presentations — I use open-source GATE (General Architecture for Text Engineering) software for illustrations and examples in order to remain independent of any product — but IBM's is the first clear, freely available, and practical technical exposition that I have seen on this topic. If you want to learn more about unified analytics, do visit IBM's From Text Analytics to Data Warehousing page.

Disclosure: IBM is a sponsor of an editorially independent text-analytics report I am writing, which is unrelated to my Intelligent Enterprise writing.