BI on Content Feeds, a.k.a. Continuous (Twitter) Transformation

The rapid pace and high volume of twitter messaging has upped the stakes for BI on content feeds. BI on content feeds: that would be stuff like monitoring and mining sentiment from social media for reputation and brand management, which you can do with text analytics on RSS and Atom feeds and Web pages. One approach to making sense of the flow is the CEPish application of continuous transformations that the folks behind SQLstream recently showed me.

Seth Grimes, Contributor

December 8, 2008

The rapid pace and high volume of twitter messaging has upped the stakes for BI on content feeds. BI on content feeds: that would be stuff like monitoring and mining sentiment from social media for reputation and brand management, which you can do with text analytics on RSS and Atom feeds and Web pages. I wrote in September on a leading-edge implementation at Thomson Reuters. But twitter messaging is both faster and, given social-network mediation, more focused: instant messaging text gone public. One approach to making sense of the flow is the CEPish application of continuous transformations that the folks behind SQLstream recently showed me.

CEP is complex-event processing: in-memory analysis of data and event streams via continuous queries. CEP is an uncomfortable category for some vendors, who would prefer to focus on applications or distinctive capabilities. SQLstream's forte is continuous data integration and transformation; heavy-duty analytics would be done against a data warehouse. SQLstream's open-source underpinnings, close links with BI vendor Pentaho, and integration with the open-source Mondrian relational OLAP tool are other distinguishing elements. (I'll write more on these points below, after the screenshots.) The Mondrian integration in particular allows SQLstream to support real-time OLAP by feeding stream-computed aggregates to a backend data warehouse and invalidating the Mondrian cache as needed. Lastly, the company has worked to stay true to standards such as SQL:2003, SQL/MED for access to external databases, and XMI, and to use freely available tools such as the Eclipse platform where possible.
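To make the real-time OLAP idea concrete, here is a minimal Python sketch. It assumes a plain dict stands in for the warehouse table, and a small hypothetical cache class stands in for an OLAP cell cache. It is not SQLstream's or Mondrian's actual code. It only illustrates the pattern:

- aggregates are updated as each event arrives;
- each updated aggregate is written downstream;
- only the affected cached cell is invalidated.

```python
from collections import defaultdict


class AggregateCache:
    """Hypothetical stand-in for an OLAP cell cache (the role Mondrian's cache plays)."""

    def __init__(self):
        self._cells = {}

    def get(self, key, compute):
        # On a miss, compute the cell from the warehouse and keep it until invalidated.
        if key not in self._cells:
            self._cells[key] = compute(key)
        return self._cells[key]

    def invalidate(self, key):
        # Drop one cell; other cached cells stay warm.
        self._cells.pop(key, None)


class ContinuousAggregator:
    """Keeps running per-key totals as events arrive and pushes them downstream."""

    def __init__(self, warehouse, cache):
        self.totals = defaultdict(int)
        self.warehouse = warehouse  # a dict standing in for a warehouse table
        self.cache = cache

    def on_event(self, key, value):
        # Incremental update instead of a batch recomputation.
        self.totals[key] += value
        self.warehouse[key] = self.totals[key]
        # Invalidate only the cell whose underlying aggregate changed.
        self.cache.invalidate(key)


warehouse = {}
cache = AggregateCache()
agg = ContinuousAggregator(warehouse, cache)
agg.on_event("brandX", 3)
print(cache.get("brandX", warehouse.get))  # 3
agg.on_event("brandX", 2)
print(cache.get("brandX", warehouse.get))  # 5, not a stale 3
```

Without the invalidate call, the second lookup would return the stale cached 3 even though the warehouse already holds 5. That staleness is the problem as-needed cache invalidation addresses.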

Continuous transformation is about replacing batch ETL processing with on-the-fly data acquisition, aggregation/processing, and action/DW-loading. Continuous transformation reduces latency, the lag between data arrival and availability for analysis. Business logic is programmed in SQL: SQLstream uses views to construct a processing pipeline and INSERT INTO, extended for streams, to export data. Custom code written in C or Java supplies adapters for data input and output, as well as user-defined functions (UDFs) and transformations (UDXes). And that's where twitter content acquisition comes into play.
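The acquire/process/load pipeline can be approximated with chained Python generators. Each stage consumes the previous one row by row, much as SQLstream chains views and finishes with an INSERT INTO. This is a sketch under several assumptions:

- Atom parsing stands in for an input adapter.
- A per-author running count stands in for the aggregation step.
- A list stands in for the warehouse table.

None of the function names come from SQLstream.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.w3.org/2005/Atom"}  # Atom 1.0 namespace


def acquire(atom_documents):
    """Input-adapter stage: yield (author, title) rows from each Atom document as it arrives."""
    for doc in atom_documents:
        root = ET.fromstring(doc)
        for entry in root.findall("a:entry", NS):
            author = entry.findtext("a:author/a:name", default="", namespaces=NS)
            title = entry.findtext("a:title", default="", namespaces=NS)
            yield author, title


def running_counts(rows):
    """Processing stage, like a view over the input: emit an updated count per author on every row."""
    counts = {}
    for author, _title in rows:
        counts[author] = counts.get(author, 0) + 1
        yield author, counts[author]


def load(rows, table):
    """Output stage, analogous to INSERT INTO ... SELECT: append each row as soon as it is produced."""
    for row in rows:
        table.append(row)


feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>alice</name></author><title>first</title></entry>
  <entry><author><name>bob</name></author><title>second</title></entry>
  <entry><author><name>alice</name></author><title>third</title></entry>
</feed>"""

table = []
load(running_counts(acquire([feed])), table)
print(table)  # [('alice', 1), ('bob', 1), ('alice', 2)]
```

Because every stage is a generator, nothing waits for a batch to complete. In a live system, acquire() would keep reading from the feed source indefinitely, and rows would trickle through to the table one at a time.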

SQLstream certainly isn't unique in the ability to harvest RSS and Atom feeds, but so far it's the only tool I've seen that consumes twitter messages, via an API. (I wrote a bit on twitter-BI recently.) Here's a screenshot that shows it in action, followed by one that shows the SQL table-like definition of a twitter feed and one with the SQL that works from it.

Three content feeds in a SQLstream studio interface (built on Eclipse):

A stream defined like a table, using SQL/MED for twitter-data definition:

A continuous-transformation query, filtering on a regular-expression/keyword match:
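The screenshots are not reproduced here, but the two ideas they illustrate can be sketched in Python: a stream with a table-like schema, and a continuous regular-expression filter over it. The Tweet columns, the keyword_filter function, and the sample messages are all hypothetical; they are not the SQL/MED definition or the query SQLstream uses.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Tweet:
    """One row of a hypothetical twitter stream; the column names are illustrative only."""
    posted_at: str
    author: str
    text: str


def keyword_filter(stream: Iterable[Tweet], pattern: str) -> Iterator[Tweet]:
    """Continuous filter: yields each matching row as it arrives rather than after the stream ends."""
    regex = re.compile(pattern, re.IGNORECASE)
    for tweet in stream:
        if regex.search(tweet.text):
            yield tweet


tweets = [
    Tweet("2008-12-08T09:00:00Z", "a", "Trying out SQLstream today"),
    Tweet("2008-12-08T09:01:00Z", "b", "Coffee first"),
    Tweet("2008-12-08T09:02:00Z", "c", "sqlstream + mondrian = real-time OLAP?"),
]
print([t.author for t in keyword_filter(tweets, r"\bsql\s*stream\b")])  # ['a', 'c']
```

In SQL terms, this corresponds to a view with a WHERE clause over the stream. The difference from an ordinary query is that results are emitted for as long as rows keep arriving.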

SQLstream is only one example of a CEP(ish) tool that consumes content feeds. I asked folks on the CEP-Interest list to tell me about other examples. Siva Kumar Tangudu let me know that "Gnip makes it easy to consume content feeds. It supports twitter, digg, delicious, etc."

Marco Seiriö replied, "A year ago or so I did a demo on RuleML 2007 where we showed ruleCore processing a feed from flickr. We had this rule which triggered when there were a large number of photos posted in any area. Then we lit up a marker on a Google map to show that there is currently high posting activity in that area. The idea was to show off our location aware event processing and show how we could use location from a feed of geotagged images to trigger rules."
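The rule Seiriö describes amounts to a sliding-window count keyed by location. Below is a minimal Python sketch, not ruleCore's rule language or implementation. It makes several assumptions:

- events arrive in time order;
- a coarse latitude/longitude grid serves as the notion of "area";
- the window length, threshold, and cell size are illustrative values.

```python
from collections import defaultdict, deque


class HotspotDetector:
    """Sliding-window count of geotagged posts per grid cell; fires when a cell gets busy.

    window (seconds), threshold (posts) and cell_size (degrees) are hypothetical parameters.
    """

    def __init__(self, window=600.0, threshold=3, cell_size=0.1):
        self.window = window
        self.threshold = threshold
        self.cell_size = cell_size
        self.events = defaultdict(deque)  # cell -> timestamps still inside the window

    def cell(self, lat, lon):
        # Snap coordinates to a coarse grid so nearby photos share an "area".
        return (int(lat // self.cell_size), int(lon // self.cell_size))

    def on_photo(self, ts, lat, lon):
        """Return the cell if this photo pushes its area to the threshold, else None."""
        key = self.cell(lat, lon)
        q = self.events[key]
        q.append(ts)
        # Evict timestamps that have slid out of the window (assumes time-ordered input).
        while q and q[0] <= ts - self.window:
            q.popleft()
        return key if len(q) >= self.threshold else None


d = HotspotDetector(window=600, threshold=3, cell_size=0.1)
for ts, lat, lon in [(0, 59.33, 18.06), (60, 59.34, 18.07), (120, 59.335, 18.065)]:
    hit = d.on_photo(ts, lat, lon)
print(hit)  # (593, 180): three photos in one cell within ten minutes
```

The returned cell is what a front end would use to light up a map marker. As written, the detector keeps firing for every further photo while the area stays busy. A real rule engine would typically add suppression or de-duplication of repeated alerts.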

And Alexandre Vasseur wrote me about use of the Esper open-source CEP engine by DataComplex "to power their SaaS based offering on the Amazon EC2 cloud; I think [they] now have a twitter feed support," which I have not been able to verify.

But back to SQLstream and its Mondrian and Pentaho connections. One of the company leads is Julian Hyde, who created Mondrian. I first "met" him four years ago when I was researching an article on open-source BI. Julian subsequently signed on with Pentaho, but as a part-timer. He also helped create Eigenbase, "an extensible open-source platform for building specialized data management systems in a wide variety of application spaces." SQLstream is built on Eigenbase, as is LucidDB, the open-source DBMS that back-ends the LucidEra SaaS BI platform. There's a lot to like and admire in this work.

SQLstream and other, similar products are turning content feeds into BI. Those feeds are now one more type of source that enterprises can and should consider in the quest for competitive advantage.

About the Author(s)

Seth Grimes

Contributor

Seth Grimes is an analytics strategy consultant with Alta Plana and organizes the Sentiment Analysis Symposium. Follow him on Twitter at @sethgrimes
