Methods of Integrating Full-Text and Fielded Data Search

Jul 01, 2007

This article addresses methods of integrating fielded data with full-text indexed search, with the aim of providing more relevant search results. The discussion relies on the dtSearch® Text Retrieval Engine for its specific examples, although the general concepts have broader applicability.

Document Metadata: The simplest option for integrating fielded data and full-text searching is to use the metadata fields that documents already contain. MS Office, OpenOffice, PDF, HTML, and other document formats all support such fields. Using the fields inside these documents has the advantage of making each document its own self-contained data unit. The diversity of document types and the size of a document collection can, however, make adding fields to each document prohibitively time-consuming. The fielded data itself may also require a more complex table or hierarchical data structure than the documents' built-in metadata fields support.
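As a rough illustration of what "fields inside the document" means, the following Python sketch pulls the title and meta tags out of an HTML document using only the standard library. It is not dtSearch code; the class name MetaFieldParser, the function extract_fields, and the sample document are all hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta name/content> pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            # Skip meta tags that lack a usable name/content pair.
            if name and content is not None:
                self.fields[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def extract_fields(html_text):
    """Return a dict of field name -> value found in the document head."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Invented sample document, for illustration only.
SAMPLE = """<html><head>
<title>Q3 Sales Report</title>
<meta name="author" content="J. Smith">
<meta name="keywords" content="sales, quarterly">
</head><body><p>Revenue rose in the third quarter.</p></body></html>"""

fields = extract_fields(SAMPLE)
print(fields)
```

Each document carries its own fields here, which is the self-contained property described above. The flat name/value shape of the result also shows the limitation: there is no natural place for a table or hierarchy.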

Database Metadata: Another alternative is to store the fielded data for each document in a separate database, such as a SQL or XML database. The documents themselves can either remain outside the database, with only a filename or other identifier stored in the database, or reside inside a BLOB field in the database. Because a structured database holds the fields, the database approach supports a more complex relational metadata structure.
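The database approach might be sketched as follows, using Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the dtSearch API: the table names, column names, sample rows, and the filter_hits function are hypothetical, and the hits list merely stands in for filenames returned by a full-text search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per document. "content" is NULL when the file stays outside the
# database and only its filename is recorded; otherwise it holds the BLOB.
conn.execute(
    """CREATE TABLE documents (
           doc_id   INTEGER PRIMARY KEY,
           filename TEXT NOT NULL,
           content  BLOB,
           author   TEXT,
           created  TEXT
       )"""
)
# A second table gives each document any number of tags, a relational
# structure that flat in-document fields cannot easily express.
conn.execute(
    """CREATE TABLE document_tags (
           doc_id INTEGER REFERENCES documents(doc_id),
           tag    TEXT NOT NULL
       )"""
)

insert_doc = "INSERT INTO documents (filename, content, author, created) VALUES (?, ?, ?, ?)"
# Document kept outside the database: identifier only (gets doc_id 1).
conn.execute(insert_doc, ("reports/q3.pdf", None, "J. Smith", "2007-06-15"))
# Document stored inside the database as a BLOB (gets doc_id 2).
conn.execute(insert_doc, ("memo.txt", b"Revenue rose in the third quarter.", "A. Jones", "2007-05-02"))
conn.executemany(
    "INSERT INTO document_tags (doc_id, tag) VALUES (?, ?)",
    [(1, "finance"), (1, "quarterly"), (2, "finance")],
)


def filter_hits(conn, hit_filenames, tag, author):
    """Keep only the full-text hits whose database fields match tag and author."""
    if not hit_filenames:
        return []
    placeholders = ",".join("?" * len(hit_filenames))
    sql = f"""SELECT d.filename
              FROM documents d
              JOIN document_tags t ON t.doc_id = d.doc_id
              WHERE d.filename IN ({placeholders})
                AND t.tag = ? AND d.author = ?
              ORDER BY d.filename"""
    return [row[0] for row in conn.execute(sql, (*hit_filenames, tag, author))]


# Stand-in for the filenames a full-text search would return (assumption).
hits = ["reports/q3.pdf", "memo.txt"]
matches = filter_hits(conn, hits, "finance", "J. Smith")
print(matches)
```

The join across two tables is the point of the sketch: the fielded criteria live in a relational structure, and the full-text results are narrowed against it by a shared identifier, here the filename.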