Can Data Algebra Make Big Data Faster And Cheaper?

Data algebra is a new approach for managing, integrating, and searching data faster and more efficiently. Here's why developers and IT departments may want to consider adding it to their toolsets.

A couplet is the most basic data algebra object, representing a single unit of data. A relation is a set of couplets that can represent a simple one-record file or a single record in a database. A clan is a set of relations that can represent a multi-record file, a table in a relational database, a list, or an XML file with repeating structures. A horde is a set of clans that can represent a database, a computer file system, data sets in databases, or some XML files.
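
The hierarchy can be sketched with plain Python sets. This is a minimal illustration under stated assumptions, not the API of the Algebraix libraries: a couplet is modeled as an ordered (left, right) tuple, and each higher level is an immutable frozenset of the level below. The helper names and sample data are hypothetical.

```python
# Hypothetical sketch: data algebra objects modeled with built-in Python types.
# Not the Algebraix library API; helper names and data are illustrative only.

def couplet(left, right):
    """A couplet: an ordered pair, modeled here as a (left, right) tuple."""
    return (left, right)

def relation(**fields):
    """A relation: a set of couplets; here one record mapping field -> value."""
    return frozenset(couplet(k, v) for k, v in fields.items())

def clan(*relations):
    """A clan: a set of relations, e.g. the rows of one table."""
    return frozenset(relations)

def horde(*clans):
    """A horde: a set of clans, e.g. several tables in one database."""
    return frozenset(clans)

# A two-row "customers" table and a one-row "orders" table.
customers = clan(
    relation(id=1, name="Ada"),
    relation(id=2, name="Grace"),
)
orders = clan(relation(order=10, customer=1))
db = horde(customers, orders)

print(len(db))         # 2 clans (tables)
print(len(customers))  # 2 relations (rows)
print(couplet("name", "Ada") in relation(id=1, name="Ada"))  # True
```

Because every level is a set, two identical rows would collapse into one in this sketch; keeping duplicate records would require something extra, such as a row identifier.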

"If you can describe every different kind of data coming from every imaginable data source as some specific organization of couplets and sets -- data algebra objects -- now you're understanding it's more uniform and systematic," said Rogers. "We've been blinded by the apparent differences of things -- like SharePoint is different than the Dow Jones newsfeeds -- and not looking at how much it's the same thing. That lets me build software that's a lot more uniform, systematic, and simpler."
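
Rogers' point about uniformity can be illustrated with a small, hypothetical example: two differently formatted sources, a CSV file and a JSON document, reduce to the same set of sets once field order and row order are ignored. The to_clan helper and the sample data are assumptions made for this sketch, which uses only the Python standard library.

```python
import csv
import io
import json

def to_clan(records):
    """Hypothetical helper: turn dict-like records into a clan, i.e. a
    frozenset of relations, each a frozenset of (field, value) couplets."""
    return frozenset(frozenset(rec.items()) for rec in records)

csv_text = "id,name\n1,Ada\n2,Grace\n"
# Values are strings here because CSV carries no type information.
json_text = '[{"name": "Grace", "id": "2"}, {"id": "1", "name": "Ada"}]'

from_csv = to_clan(csv.DictReader(io.StringIO(csv_text)))
from_json = to_clan(json.loads(json_text))

# Field order and row order differ between the sources, but as sets they match.
print(from_csv == from_json)  # True
```

A real integration would also have to normalize types and field names before comparing sources; the sketch sidesteps that by quoting the JSON values.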

When the company started out, Algebraix applied data algebra to relational and graph databases. The company later built an analytics platform. In the last few months, Algebraix released a set of Python libraries and published the book The Algebra of Data to catalyze broader acceptance of data algebra.

It's not light reading. The book describes what data algebra is, how it can be used, and the details of its underlying set theory, which is based on Zermelo-Fraenkel set theory with the axiom of choice (ZFC). The Python libraries are open source and available on GitHub and PyPI. These libraries allow developers to represent data algebraically using the structures presented in the book.

"The idea you can take a given state and define it in mathematical terms and then manipulate it mathematically is very powerful. Throughout history, people have done that with extraordinary effects," said Bloor. "The whole universe of data is out there and it's not mathematically defined. There are likely to be some great victories when you define it mathematically."
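
As a hedged illustration of manipulating data "mathematically", here is a relational natural join written purely with set operations: every pair of rows from two clans is merged by set union, and a merged relation is kept only if no field ends up with two different values. This is our own sketch, not the book's formal definition, and the function names are hypothetical. It assumes each input row is itself functional (one value per field).

```python
from itertools import product

def is_functional(rel):
    """True if no field name (left component) appears with two different values."""
    lefts = [left for left, _ in rel]
    return len(lefts) == len(set(lefts))

def join(clan_a, clan_b):
    """Hypothetical natural join of two clans: union each pair of relations,
    keeping only unions that remain functional (i.e. shared fields agree)."""
    return frozenset(
        a | b for a, b in product(clan_a, clan_b) if is_functional(a | b)
    )

def rel(**fields):
    """Build a relation (frozenset of (field, value) couplets) from keywords."""
    return frozenset(fields.items())

customers = frozenset({rel(cust=1, name="Ada"), rel(cust=2, name="Grace")})
orders = frozenset({rel(order=10, cust=1), rel(order=11, cust=1)})

result = join(customers, orders)
for row in sorted(result, key=lambda r: dict(r)["order"]):
    print(sorted(row))
# [('cust', 1), ('name', 'Ada'), ('order', 10)]
# [('cust', 1), ('name', 'Ada'), ('order', 11)]
```

Grace has no orders, so her row drops out: every union with an order row would give the field cust two values and fail the functional check.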

Yes, but will it be accepted?

Data algebra has a lot of potential, but the realization of that potential depends on Algebraix and third parties. At the present time, Algebraix is focusing on data management and data integration, but data algebra has more potential than any one company can fully exploit.

Python programmers have access to the libraries, but the libraries are not meant for production programming. Their purpose is to help developers understand how to represent data algebraically. What IP licensees will get and what they will do with the IP remains to be seen.

Those who want to understand more about data algebra should read The Algebra of Data. The book describes the math and how it can be applied. More books and resources are planned for the future, which is a good thing, because the application of the math needs to be described in greater detail.

Spencer Greenberg, a mathematician and founder of decision-making tool provider Clearer Thinking, agrees. "Data algebra seems potentially very useful. One of the questions I have looking over the materials is what sort of applications do they think this is going to solve versus what exists. What will be the practical advantages when implemented? It may be too early to tell. I can't tell, but sometimes you get huge benefits from systematizing things, from creating the mathematical formulas."

Another obstacle is that some individuals will assume there is nothing new about data algebra and dismiss it out of hand. However, five years of research and development, $40 million of funding, nine patents, the book, and the Python libraries suggest otherwise. Whether it's a boom or a bust is an open question.