PostgreSQL Adds Proximity Search

The open source project has produced a general purpose proximity search for location-based queries and it should generate new applications.

Charles Babcock, Editor at Large, Cloud

May 5, 2011

6 Min Read

Wouldn't you like to ask your laptop or tablet for the 10 coffee shops closest to you? The device would be able to tell you, based on your current location, if it could consult a pre-sorted index of spatial data.

The 9.1 beta release of the PostgreSQL relational database is now available with a feature called the "nearest neighbor." It's likely to spur a new round of location-based end user applications.

PostgreSQL open source developers say both their project and Microsoft's SQL Server development team are in a race to see which can bring the capability to market first.

"We are in kind of a race to release it before they do. Whoever gets there first, it will come down to a matter of weeks," predicted Bruce Momjian, core team coordinator of the project. Microsoft SQL Server spokesmen couldn't be reached immediately for comment.

The nearest neighbor feature or, more scientifically, the K-Nearest Neighbor algorithm, is the ability of the system to use spatial data associated with sets of related objects, such as pizza parlors, shoe repair shops, or coffee shops, to calculate distances between them (in a given city) and save that information in an index.

There are already mobile applications that do this for particular data sets, such as mobile applications that locate shops in the Starbucks chain or locations of Pizza Huts. But the hunt for a variable set of target destinations, also known as a "proximity search," could mean new, quick response mobile applications will soon be available. The application wouldn't have to be specialized to a particular chain. Rather, it could produce results across brand names in response to spur of the moment queries.

That would make PostgreSQL a much more attractive system to developers of mobile applications, Momjian said in an interview. In addition to responsibilities for PostgreSQL development, he doubles as a database architect for EntepriseDB, a firm that produces a commercial product, Postgres Plus, based on the open source code.

Momjian claimed the nearest neighbor feature illustrates that the PostgreSQL project's pace of development is now "unmatched by any other database software, either open source or proprietary." The PostgreSQL team didn't make that type of statement when MySQL was also an independent, open source project. MySQL was acquired by Sun Microsystems in 2008 and is now part of Oracle.

Final implementation of the proximity search feature will come in two to four months, along with the general availability release of PostgreSQL 9.1. A version of the capability eventually will become available in a cloud developer platform, possibly later this year. The EnterpriseDB commercial version, Postgres Plus, tends to quickly match the features found in new PostgreSQL releases. EnterpriseDB is one of the database systems being offered by Red Hat in its new OpenShift platform-as-a-service for cloud developers. OpenShift was announced Wednesday.

Momjian said PostgreSQL for several years has played catch up to commercial systems, achieving standard SQL compliance, replication, and backup and recovery services. The nearest neighbor feature is an example of "how we're pushing into area that nobody has." That still depends on who wins the race to offer the nearest neighbor feature first, Microsoft or PostgreSQL.

Several other additions give PostgreSQL 9.1 a more equal footing with commercial systems. Synchronous replication lets the system safely write two data sets simultaneously, a capability that would be the equivalent of Oracle Data Guard, said Josh Berkus, a core team coordinator. The feature gives a system both high availability and the ability to execute a disaster recovery. The PostgreSQL feature goes beyond commercial synchronous replication, Berkus said in an interview. It allows the function on the basis of individual transactions. If a transaction is covered by synchronous replication, the write to the second database must be committed before the primary database write is completed. That way, if anything interfered with the transaction completing on the secondary database, the entire system would still have data integrity as both copies of the transaction were rolled back and run again.

In addition, PostrgeSQL 9.1 will support Mandatory Access Control in the highly secure SE version of Linux. Security enhanced or SE-Postgres will be used with SE-Linux by the National Security Agency. "The NSA told us, 'If you can do this, we can use PostgreSQL,'" said Berkus. The database now enforces the special identity credentials required by the operating system, and is the first system to do so.

Support for Mandatory Access Control was developed by KaiGai Kohei, a contributor in Japan, said Momjian.

Another contributor, Kevin Grittner, a database administrator for the Wisconsin state court system, produced a unique feature, serializeable snapshot isolation. The feature was needed by the court system because in Wisconsin, the judge of the court in each county needs by law to be in control of the court's local information.

That meant DBAs in two counties might be trying to update the same court case at the same time, but a master DBA in the state capital couldn't lock the system, to allow the changes to occur one at a time without bringing all activity to a halt. Without one of the county actions being locked out, however, the result would be erroneous data, with one or the other updates prevailing. Serializeable snapshot isolation allows local updates to continue in sequence, without locks.

Grittner as a newcomer last year sent email to the core PostgreSQL committers trying to get the function, saying the state court system had a problem. His claim that was refuted repeatedly by PostgreSQL developers, who thought the existing system was capable of managing it. "We went around for about a week, where everybody shot him down," recalled Momjian. "I found myself thinking, this is a smart guy. If he says there's a problem, there may be a problem."

The result was an innovation contributed by Grittner, after he found a paper on how to deal with the issue by an Australian author. Serializeable snapshot isolation "makes sure the transaction will do the right thing … in any mix of transactions," wrote Grittner in an email message. Also with SSI, the transaction can be rolled back and executed again if something has gone wrong with the commits.

"Before, the DBAs were faced with the unpalatable choice between risking data accuracy and explicit locks that would bog down the system," he wrote.

Said Momjian: "People keep coming out of nowhere with particular needs for PostgreSQL," and that will continue to push its development.

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.

Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like

More Insights