
CambridgeDocs Extracts Data from PDF Financial Reports

New xDoc Structured Finance Toolkit enables financial services companies to automatically extract financial data from tables within structured finance reports.
CambridgeDocs, LLC, a leader in the field of unstructured data management and enterprise publishing, last month released the xDoc Structured Finance Toolkit, a new application that automates data extraction from PDF-formatted financial tables. With this new capability, financial services companies such as banks, brokerage firms, insurance companies and other financial institutions are now able to eliminate the otherwise tedious and error-prone process of cutting-and-pasting critically sensitive financial data from tables within structured finance reports like CDO and ABS remittance reports.

Award-winning CambridgeDocs has long been recognized as a leader in creating solutions for re-using information from PDF files. With the release of the xDoc Structured Finance Toolkit, CambridgeDocs leverages its technology for extracting tabular data to deliver an easily customized data normalization solution optimized for PDF financial documents.

Rather than trying to identify random table information within a PDF file, the xDoc Structured Finance Toolkit identifies definable sets of tables and extracts detailed information from the rows and columns within them, more accurately than re-keying or cutting and pasting.

For instance, it can extract CUSIP, ISIN, Ratings, Rates, Balances, Maturity Dates and Spreads for each Security and, according to the company, even works with notoriously difficult table types that span multiple PDF pages. Templates can be reused from month to month and can account for variations in report formats, newly added data, and shifts in report layout.

The xDoc Structured Finance Toolkit is menu-driven and highly configurable, both in terms of the data it looks for within a PDF file and the ways in which it can deliver that data. The extracted data is managed as XML and can be exported to another schema or DTD, a database or analysis tool, CSV, MS Excel, or even custom HTML or MS Word, making it simple, for example, to create condensed summary reports of a deal's performance.
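As a rough illustration of the XML-to-CSV export path described above, the following Python sketch flattens per-security records into CSV rows using only the standard library. It is not the toolkit's own code or output format: the element names (`remittanceReport`, `security`, `cusip`, `rating`, `rate`, `balance`), the `xml_to_csv` helper and the sample values are all assumptions made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML for two securities. The element names and values are
# illustrative placeholders, not the toolkit's actual schema or real data.
SAMPLE_XML = """\
<remittanceReport>
  <security>
    <cusip>000000AA0</cusip>
    <rating>AAA</rating>
    <rate>5.25</rate>
    <balance>1000000.00</balance>
  </security>
  <security>
    <cusip>000000AB8</cusip>
    <rating>AA</rating>
    <rate>5.75</rate>
    <balance>250000.00</balance>
  </security>
</remittanceReport>
"""

# Columns to export, in order (assumed field names).
FIELDS = ["cusip", "rating", "rate", "balance"]


def xml_to_csv(xml_text: str) -> str:
    """Flatten each <security> element into one CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for security in root.iter("security"):
        # A missing child element becomes an empty cell rather than an error.
        writer.writerow(
            {name: (security.findtext(name) or "").strip() for name in FIELDS}
        )
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML), end="")
```

Keeping the column list in one place (`FIELDS`) mirrors the idea of a reusable template: if a report gains a field, only that list needs to change.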

"Our approach to recognizing table data is unique," says Kedron Wolcott, VP of Engineering at CambridgeDocs. "Rather than applying a 'guessing algorithm' to a PDF, which is problematic with this type of sensitive data, the xDoc Structured Finance Toolkit is configured to look for specific types of tables. Tables can span multiple pages, occur on different pages each month, come in different sequences or occur on the same page with other tables. And on occasion when there are significant changes to a report structure, our visual interface makes it easy for a business user to update the configuration files."

The xDoc Structured Finance Toolkit is available as an add-on module to CambridgeDocs xDoc Converter and can be downloaded from: