Can Data Algebra Make Big Data Faster And Cheaper?
Data algebra is a new approach for managing, integrating, and searching data faster and more efficiently. Here's why developers and IT departments may want to consider adding it to their toolsets.
Today's organizations want to manage, process, analyze, and search all kinds of data more efficiently and cost-effectively. To accomplish those goals, they need to reduce unnecessary overhead and find ways to optimize data-related tasks. Data algebra is an option that can help.
Data analytics platform provider Algebraix Data says that data algebra applies mathematical set theory to data analytics tasks. The result is an approach you can use to perform a range of data tasks, whether you are optimizing the performance of Hadoop systems or making database queries.
Two main benefits of data algebra are reuse and optimization, both of which can save time and resources. Here's how it works: To speed database queries, the Algebraix platform resolves a request for data, and then stores the request along with the algebraic expressions, the algebraic transformation, the intermediate results it used to arrive at a result, and the result in an algebraic catalog. That way all of these can be reused.
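The catalog idea can be illustrated with a minimal Python sketch. It is not Algebraix's implementation: the `Catalog` class, the tuple encoding of expressions, and the sample `ORDERS` data are all assumptions made for illustration. It shows only the general pattern of storing each expression with its result so that later requests, including intermediate steps, can be served without recomputation.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

Row = Tuple[str, str, int]  # hypothetical layout: (customer, region, amount)

# Hypothetical sample data, modeled as a set of rows.
ORDERS: FrozenSet[Row] = frozenset({
    ("ann", "EU", 120),
    ("bob", "US", 80),
    ("cy", "EU", 45),
})


class Catalog:
    """Illustrative catalog: maps a hashable expression to its stored result."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, FrozenSet] = {}
        self.computed = 0  # number of expressions actually evaluated

    def evaluate(self, expr: Hashable, compute: Callable[[], FrozenSet]) -> FrozenSet:
        # Evaluate only if this exact expression has not been seen before.
        if expr not in self._store:
            self._store[expr] = compute()
            self.computed += 1
        return self._store[expr]


catalog = Catalog()


def eu_orders() -> FrozenSet[Row]:
    # Intermediate result: restrict the orders set to region == "EU".
    expr = ("restrict", "orders", ("region", "EU"))
    return catalog.evaluate(expr, lambda: frozenset(r for r in ORDERS if r[1] == "EU"))


def eu_customers() -> FrozenSet[str]:
    # Final result: project the restricted set onto the customer field.
    expr = ("project", ("restrict", "orders", ("region", "EU")), "customer")
    return catalog.evaluate(expr, lambda: frozenset(r[0] for r in eu_orders()))


first = eu_customers()   # evaluates the restriction and the projection
second = eu_customers()  # served from the catalog, nothing is re-evaluated
```

After both calls, `catalog.computed` is 2: the restriction and the projection were each evaluated once, and a later call to `eu_orders()` on its own would also be answered from the catalog.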
"Databases calculate various query results, deliver [the results] to a user and then throw it away. Some of that stuff can be reused," Robin Bloor said in an interview. But "it can only be reused if you define it in an algebraic manner." Bloor is chief analyst and cofounder of The Bloor Group and a co-author of the book The Algebra of Data. The book was recently published by Algebraix Data and is available for a free download from the company's website.
Over time, the reuse capabilities can dramatically accelerate query results.
"You can do far more sophisticated optimization when you're using algebraic techniques than you can when you're just using high-level procedural techniques," said Bill Rogers, a senior engineer at IBM and former VP of engineering at Algebraix.
The point is to avoid recomputing what has already been computed, because doing so wastes time and resources. For example, if a person ran a query on terabytes of data and later added 100 new rows of data, it would not be necessary to execute the entire query again to get a correct final result. It would only be necessary to run a second query on the 100 new rows, because all of the information about the original query has been stored.
The results of the two queries would then be combined to yield a final result. Instead of taking, say, five hours to run the original query and another five hours to rerun it over the enlarged data set, the final result could be achieved in about half the total time. The original query would still take five hours, but the query on the 100 new rows could be executed in microseconds.
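The incremental pattern described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the Algebraix platform's method: the `total_by_region` function and the sample rows are hypothetical, and the approach shown works only for queries, such as per-group sums, whose result over a combined data set equals the combination of the results over its parts.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, int]  # hypothetical layout: (region, amount)


def total_by_region(rows: Iterable[Row]) -> Counter:
    """Sum the amounts per region: the 'query' in this sketch."""
    totals: Counter = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


# The expensive run over the original data; its result is kept.
original_rows = [("EU", 120), ("US", 80), ("EU", 45)]
stored = total_by_region(original_rows)

# Later, new rows arrive. Only these are scanned.
new_rows = [("US", 20), ("APAC", 10)]
delta = total_by_region(new_rows)

# Combine the stored result with the result for the new rows.
combined = stored.copy()
combined.update(delta)  # Counter.update adds counts rather than replacing them

# The combined result matches a full recomputation over all rows.
full = total_by_region(original_rows + new_rows)
```

Here `combined` and `full` are equal, but producing `combined` required scanning only the two new rows. Queries that do not decompose this way, such as a median, would need a different strategy.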
Practical Uses of Data Algebra
Here's why this approach can be so powerful. All data can be described in algebraic terms. Data algebra can unify data management across different data structures. It can also improve computing performance and capacity. What else can it do? Some of the possibilities described in the book include spreadsheets that can pull in atypical types of data, better performing Hadoop systems, faster data analytics-related processes, and more efficient search capabilities.
"We've been talking about gaming. All software applications -- data management applications, the Internet of Things, defense, security, every aspect of IT -- we could potentially play a role in, but that's too broad, which is why we have an IP strategy. We want to keep the math open source," Algebraix CEO Charlie Silver said in an interview.
The company plans to license its IP. Algebraix holds nine patents. The Algebraix platform is both a proof-of-concept and a commercial product. Algebraix is also planning to build a universal optimizer for Hadoop.
"Applying [data algebra] has changed the way I look at software development and design. Now I think about what's going on mathematically, I understand that, and I understand how I'm going to do that physically. It's made me look at what I'm doing in a more rigorous and precise way," said Rogers.
Working with data algebra has also shown Rogers that things that appear to be different are more similar than they seem. Although the details of the algebra described in the book are more complicated than what's presented here, fundamentally data algebra describes data using hierarchical sets in which the smaller set is included in the larger set: Specifically, a couplet represents a fundamental
Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit.