Enterprise search systems have been available for many years and have performed valuable functions in text and content searches. More recently, enterprise users have had the choice of powerful open source systems, many based on Apache Lucene, that can handle broader tasks.
Goldman Sachs has adopted one of them, Elastic's Elasticsearch, and put it to use in innovative ways. Elasticsearch reaches into text sources, but Goldman software engineers are building applications that make use of its data retrieval powers as well as its large capacity for unstructured data.
"Elastic has been one of the most interesting open source products that we've seen in the last couple years," said Don Duet, global co-head of the Goldman Sachs technology division, in an interview with InformationWeek. "What's impressive about it is how much value it can create in organizations."
Elasticsearch is written in Java and behaves as a core Java system, and it ships alongside two co-products -- Logstash, Elastic's server log data retrieval system, and Kibana, a dashboard reporting system. This gives it an edge with enterprise developers, who quickly recognize how to integrate it into applications. Logstash has plug-ins that draw data from the log files of 165 different information systems. It works natively with Elasticsearch and Kibana to feed them data for downstream analytics, said Elastic's Jeff Yoshimura, global marketing leader.
The Goldman Sachs technology division has put Elasticsearch to several innovative uses with minimal staff time invested. Examples include applications to help the legal department with contract searches, to enable executives and clients to track trades, and to assist engineering teams in locating and eliminating software bugs.
In the past, when Goldman wanted to check all its legal contracts for a particular clause or wording, the task could have required hiring platoons of lawyers to manually go over thousands of paper documents. Instead, a software engineer in Duet's organization built a system that first digitized each contract, using Apache Tika content analysis and optical character recognition software.
Tika was able to recognize more than 1,000 file formats and extracted metadata useful for generating search engine indexes. All the contract documents were then fed to Elasticsearch. If an Elasticsearch search of a contract didn't find the required terminology, that contract was flagged for revision by company lawyers.
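The clause check described above can be expressed as a single Elasticsearch query. The following is a minimal sketch, not Goldman's implementation. It assumes the extracted contract text is stored in a field named content (a hypothetical name) and uses a bool query with a must_not match_phrase clause, so the hits are the contracts that lack the required wording. The sketch only builds the request body; sending it to an index's _search endpoint is left out.

```python
import json


def missing_clause_query(required_phrase, field="content"):
    """Build a search body matching documents that do NOT contain a phrase.

    `field` is a hypothetical name for the field holding the extracted text.
    """
    return {
        "query": {
            "bool": {
                # must_not drops every document containing the exact phrase,
                # leaving only the contracts that need a lawyer's attention.
                "must_not": [{"match_phrase": {field: required_phrase}}]
            }
        }
    }


# "governing law" is an illustrative phrase, not one taken from the article.
body = missing_clause_query("governing law")
print(json.dumps(body, indent=2))
```

Because the query returns only the documents missing the phrase, the result list doubles as the review queue for the legal team.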
Duet said a single technology division engineer could create such a system because Elasticsearch has a RESTful API and functions much like a typical Java application. Some enterprise search offerings wouldn't fit in as easily because they use their own programming paradigms and conventions, which IT staff must learn.
Duet's organization built another Elasticsearch application for tracking trades throughout their lifecycle. "There are many different applications and server logs involved in the process of executing trades," he noted. Goldman's trade tracker application functioned something like a UPS package tracking system and was able to report to executives or to clients on the status of a given trade.
The trade tracker system could get data from different systems, consolidate it in Elasticsearch's key-value store, and then search it for meaningful data. That meant different technical teams didn't need to be convened to extract data and figure out how to integrate the information from different systems.
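As a rough illustration of that consolidate-then-search pattern, the sketch below formats events from two source systems as a payload for Elasticsearch's bulk API (one action line followed by one document line per event) and builds a query for the history of a single trade. The index name, field names, and sample events are all made up for the example, and the term query assumes trade_id is mapped as an exact-match keyword field.

```python
import json


def to_bulk_ndjson(events, index="trade-events"):
    """Format events as newline-delimited JSON for the bulk API.

    `index` is a hypothetical index name.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                         # document line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


def trade_history_query(trade_id):
    """Build a search body returning one trade's events, oldest first."""
    return {
        "query": {"term": {"trade_id": trade_id}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


# Invented sample events from two hypothetical source systems.
events = [
    {"trade_id": "T-1001", "system": "order-entry",
     "status": "received", "timestamp": "2015-06-01T14:00:00Z"},
    {"trade_id": "T-1001", "system": "settlement",
     "status": "settled", "timestamp": "2015-06-03T09:30:00Z"},
]

payload = to_bulk_ndjson(events)
query = trade_history_query("T-1001")
print(payload)
print(json.dumps(query, indent=2))
```

Once every system's events share one index and one trade identifier, reporting the status of a trade is a single query rather than a meeting between teams.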
Goldman has incorporated Elasticsearch into how its software developers work, with more than 700 of them having access to a search-based code management system. When a bug is found in one version of a piece of software, Elasticsearch can comb through the code library and find all instances of the bug.
Here, Elasticsearch works with Kibana, which builds dashboard reports on the status of projects and code that developers are working with. The system captures source code changes, compares the "before" and "after" versions of code, and can search for a snippet of code wherever it occurs. Code comments, reference designs, and documentation can all be pulled together through the power of the search engine.
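The two operations mentioned here, comparing versions and finding a snippet wherever it occurs, can be illustrated with a small in-memory stand-in. This is plain Python using the standard difflib module, not Elasticsearch and not Goldman's system; the file paths and code are invented for the example.

```python
import difflib


def changed_lines(before, after):
    """Return the removed ("-") and added ("+") lines between two versions."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0))
    # The first two entries are the "---" and "+++" file headers; skip them,
    # then keep only real changes (hunk markers start with "@@").
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def find_snippet(files, snippet):
    """Return the paths of every file whose source contains the snippet."""
    return sorted(path for path, source in files.items() if snippet in source)


before = "def average(items):\n    return sum(items) / len(items)\n"
after = ("def average(items):\n"
         "    return sum(items) / len(items) if items else 0\n")

# Hypothetical code library: the same unguarded division appears twice.
library = {
    "pricing/stats.py": before,
    "risk/metrics.py": "def mean(xs):\n    return sum(xs) / len(xs)\n",
    "reports/summary.py": "def mean(items):\n    return sum(items) / len(items)\n",
}

print(changed_lines(before, after))
print(find_snippet(library, "sum(items) / len(items)"))
```

A search engine adds what this sketch lacks: an index, so the lookup stays fast across a large code library rather than scanning every file.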
Duet said technology division managers were able to spot Elasticsearch when it first started appearing in the company's software asset management system. The new open source code became a frequent topic in emails and in chats on the programmers' social networks. Usage jumped from a few copies to 50 copies to 200 copies, and the technology division decided to make it widely available as authorized software throughout the company. The division also contributed to the Elasticsearch project, engaged with Elastic engineers, and obtained a technical support contract for the search engine.
In addition to the technology division's software engineers and developers, Elasticsearch is sometimes used as part of a system by the financial engineering group doing deep financial modeling, Duet said. Operations staff can also use the Kibana dashboards to quickly build reports that used to require painstaking manual builds.
Goldman Sachs has 9,000 employees actively using technology and "several thousand" of them are now using Elasticsearch, Duet said, either as developers of new applications or as users of existing ones. That's a broader role for search than before, when enterprise search was limited to conventional keyword searches on text and content.
With Elasticsearch, Goldman is showing how versatile and useful search can be as a general purpose service inside the company and as a service built into many different types of applications.