Commentary
12/8/2008
11:13 PM
Seth Grimes

BI on Content Feeds, a.k.a. Continuous (Twitter) Transformation


The rapid pace and high volume of twitter messaging have upped the stakes for BI on content feeds. BI on content feeds: that would be stuff like monitoring and mining sentiment from social media for reputation and brand management, which you can do with text analytics on RSS and Atom feeds and Web pages. I wrote in September on a leading-edge implementation at Thomson Reuters. But twitter messaging is both faster and, given social-network mediation, more focused: instant messaging text gone public. One approach to making sense of the flow is the CEPish application of continuous transformations that the folks behind SQLstream recently showed me.

CEP is complex-event processing, in-memory analysis of data and event streams via continuous queries. CEP is an uncomfortable category for some vendors, who would prefer to focus on applications or distinctive capabilities. SQLstream's forte is continuous data integration and transformation; heavy-duty analytics would be done against a data warehouse. SQLstream's open-source underpinnings, close links with BI vendor Pentaho, and integration with the open-source Mondrian relational OLAP tool are other distinguishing elements. (I'll write more on these points below, after the screenshots.) The Mondrian integration in particular allows SQLstream to support real-time OLAP by feeding stream-computed aggregates to a backend data warehouse and invalidating the Mondrian cache as needed. Lastly, the company has worked to stay true to standards such as SQL:2003, SQL/MED for access to external databases, and XMI, and to use freely available tools such as the Eclipse platform where possible.
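To make the real-time OLAP idea concrete, here is a minimal Python sketch, assuming a toy in-memory fact table and query-result cache. The names (`Warehouse`, `load_aggregate`, `total_for`) are hypothetical; this is not Mondrian's or SQLstream's API, only an illustration of loading stream-computed aggregates and invalidating just the cached answers they make stale.

```python
from collections import defaultdict


class Warehouse:
    """Hypothetical stand-in for a fact table plus an OLAP result cache.

    Not Mondrian's API: it only shows as-needed invalidation, i.e. dropping
    cached answers that a newly loaded aggregate would make stale.
    """

    def __init__(self) -> None:
        self.facts = defaultdict(int)  # (brand, hour) -> message count
        self.cache = {}                # brand -> cached total over all hours

    def load_aggregate(self, brand: str, hour: int, count: int) -> None:
        # Load one stream-computed aggregate row into the fact table...
        self.facts[(brand, hour)] += count
        # ...then invalidate only the cached result that depends on it.
        self.cache.pop(brand, None)

    def total_for(self, brand: str) -> int:
        # Answer from the cache if present; otherwise recompute and cache.
        if brand not in self.cache:
            self.cache[brand] = sum(
                n for (b, _hour), n in self.facts.items() if b == brand
            )
        return self.cache[brand]


wh = Warehouse()
wh.load_aggregate("acme", 9, 120)
print(wh.total_for("acme"))  # 120, computed and cached
wh.load_aggregate("acme", 10, 30)
print(wh.total_for("acme"))  # 150, because the stale entry was dropped
```

Invalidating only the affected entries, rather than flushing the whole cache, lets queries keep hitting the cache while fresh aggregates stream in; a production OLAP server would track those dependencies far more finely than a single brand key.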

Continuous transformation is about replacing batch ETL processing with on-the-fly data acquisition, aggregation/processing, and action/DW-loading. It reduces latency, the lag between data arrival and availability for analysis. Business logic is programmed in SQL: SQLstream uses views to construct a processing pipeline and INSERT INTO, extended for streams, to export data. Adapters coded in C or Java handle data input and output, alongside user-defined functions (UDFs) and transformations (UDXes). And that's where twitter content acquisition comes into play.
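The acquisition, processing, and loading stages can be sketched with Python generators, where each record flows through the pipeline as soon as it arrives instead of waiting for a batch run. This is an analogy, not SQLstream's SQL: the generator functions stand in for an input adapter, a view, a windowed aggregate, and an INSERT INTO target, and all names here (`acquire`, `mentions`, `per_minute_counts`, `load`) are hypothetical. The window logic assumes messages arrive in timestamp order.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Message:
    ts: int    # arrival time, seconds since the feed started
    user: str
    text: str


def acquire(source: Iterable[Message]) -> Iterator[Message]:
    """Input adapter: hand records downstream one at a time as they arrive."""
    for msg in source:
        yield msg


def mentions(stream: Iterator[Message], keyword: str) -> Iterator[Message]:
    """View-like stage: a filter applied continuously to the stream."""
    kw = keyword.lower()
    for msg in stream:
        if kw in msg.text.lower():
            yield msg


def per_minute_counts(stream: Iterator[Message]) -> Iterator[Tuple[int, int]]:
    """Tumbling one-minute window: emit (minute, count) when a minute closes.

    Assumes messages arrive in timestamp order.
    """
    current, count = None, 0
    for msg in stream:
        minute = msg.ts // 60
        if current is not None and minute != current:
            yield current, count
            count = 0
        current = minute
        count += 1
    if current is not None:
        yield current, count  # flush the final window when the input ends


def load(rows: Iterator[Tuple[int, int]], target: List[Tuple[int, int]]) -> None:
    """Output adapter: the analogue of INSERT INTO a warehouse table."""
    for row in rows:
        target.append(row)


feed = [
    Message(5, "a", "Loving my new Acme phone"),
    Message(30, "b", "lunch time"),
    Message(70, "c", "acme support was slow today"),
    Message(95, "d", "ACME again!"),
]
warehouse: List[Tuple[int, int]] = []
load(per_minute_counts(mentions(acquire(feed), "acme")), warehouse)
print(warehouse)  # [(0, 1), (1, 2)]
```

Chaining generators loosely mirrors stacking views: each stage pulls from the one before, so a record's latency depends on the work done in the stages rather than on a batch schedule.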

SQLstream certainly isn't unique in the ability to harvest RSS and Atom feeds, but so far it's the only tool I've seen that consumes twitter messages, via an API. (I wrote a bit on twitter-BI recently.) Here's a screenshot that shows it in action, followed by two that show the SQL table-like definition of a twitter feed and the SQL to work from it.

Screenshot: three content feeds in a SQLstream studio interface (built on Eclipse).
Screenshot: a stream defined like a table, using SQL/MED for twitter-data definition.
Screenshot: a continuous-transformation query, filtering on a regular-expression/keyword match.
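A keyword filter like the one in the last screenshot might look roughly like this in Python. The helper names are hypothetical, and the case-insensitive, word-boundary pattern is an assumption about how one would want brand mentions matched, not a reproduction of the query shown.

```python
import re
from typing import Iterable, Iterator, List


def keyword_pattern(keywords: List[str]) -> re.Pattern:
    """Compile one case-insensitive alternation with word boundaries,
    so "acme" matches "Acme rockets" but not "Acmeville"."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def continuous_filter(stream: Iterable[str], pattern: re.Pattern) -> Iterator[str]:
    """Pass through only the messages the pattern matches, as they arrive."""
    for text in stream:
        if pattern.search(text):
            yield text


pattern = keyword_pattern(["acme", "road runner"])
feed = [
    "Acme rockets misfired again",
    "Visiting Acmeville this weekend",
    "Beep beep, Road Runner!",
    "nothing to see here",
]
print(list(continuous_filter(feed, pattern)))
# ['Acme rockets misfired again', 'Beep beep, Road Runner!']
```

The `\b` anchors assume each keyword begins and ends with a word character; hashtags or @-handles would need a different boundary rule.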

SQLstream is only one example of a CEP(ish) tool that consumes content feeds. I asked folks on the CEP-Interest list to tell me about other examples. Siva Kumar Tangudu let me know that "Gnip makes it easy to consume content feeds. It supports twitter, digg, delicious, etc."

Marco Seiriö replied, "A year ago or so I did a demo on RuleML 2007 where we showed ruleCore processing a feed from flickr. We had this rule which triggered when there were a large number of photos posted in any area. Then we lit up a marker on a Google map to show that there is currently high posting activity in that area. The idea was to show off our location aware event processing and show how we could use location from a feed of geotagged images to trigger rules."
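A rule like the one Seiriö describes amounts to a sliding-window count per geographic area with a threshold. The sketch below assumes areas are fixed latitude/longitude grid cells and that photos arrive in time order; `Photo`, `HotspotRule`, and its parameters are hypothetical and are not ruleCore's rule language.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class Photo:
    ts: float   # posting time in seconds
    lat: float  # latitude from the geotag
    lon: float  # longitude from the geotag


class HotspotRule:
    """Fire when at least `threshold` photos fall in one grid cell within `window` seconds.

    Hypothetical sketch: areas are fixed lat/lon grid cells, and photos are
    assumed to arrive in timestamp order.
    """

    def __init__(self, threshold: int, window: float, cell_deg: float = 0.1) -> None:
        self.threshold = threshold
        self.window = window
        self.cell_deg = cell_deg
        self.recent: Dict[Cell, Deque[float]] = defaultdict(deque)

    def cell(self, p: Photo) -> Cell:
        # Snap the geotag to a grid cell roughly cell_deg degrees on a side.
        return (int(p.lat // self.cell_deg), int(p.lon // self.cell_deg))

    def on_event(self, p: Photo) -> Optional[Cell]:
        """Record the photo; return its cell if that area is now busy, else None."""
        c = self.cell(p)
        times = self.recent[c]
        times.append(p.ts)
        # Evict photos that have slid out of the time window.
        while times and times[0] <= p.ts - self.window:
            times.popleft()
        return c if len(times) >= self.threshold else None


rule = HotspotRule(threshold=3, window=60.0)
stream = [
    Photo(0, 48.85, 2.35),
    Photo(10, 48.86, 2.34),
    Photo(20, 40.71, -74.00),
    Photo(30, 48.87, 2.35),   # third photo in the same cell within 60 s
    Photo(100, 48.85, 2.35),  # earlier photos have expired
]
for photo in stream:
    hot = rule.on_event(photo)
    if hot is not None:
        print("light up map marker for cell", hot)
```

A fixed grid keeps the per-event work small; a real location-aware engine could use other area definitions, such as named regions or distance from a point.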

And Alexandre Vasseur wrote me about use of the Esper open-source CEP engine by DataComplex "to power their SaaS based offering on the Amazon EC2 cloud; I think [they] now have a twitter feed support," which I have not been able to verify.

But back to SQLstream and its Mondrian and Pentaho connections. One of the company leads is Julian Hyde, who created Mondrian. I first "met" him four years ago when I was researching an article on open-source BI. Julian subsequently signed on with Pentaho, but as a part-timer. He also helped create Eigenbase, "an extensible open-source platform for building specialized data management systems in a wide variety of application spaces." SQLstream is built on Eigenbase, as is LucidDB, the open-source DBMS that back-ends the LucidEra SaaS BI platform. There's a lot to like and admire in this work.

SQLstream and other, similar products are turning content feeds into BI. Those feeds are now one more type of source that enterprises can and should consider in the quest for competitive advantage.
