Commentary
2/18/2010
10:54 AM
Seth Grimes

The GATE Way to Open Source Text Analytics

Hamish Cunningham is benevolent dictator of the GATE team, "researching human language computation." Their work is realized in a highly capable, open-source, text-analysis platform, the General Architecture for Text Engineering. Hamish's replies to my questions regarding Text Analytics Opportunities and Challenges for 2010 hold many insights about text analytics and open source...

As part of a recent solution-provider survey, I posed the question, "What do you see as the 3 (or fewer) most important text-analytics technology, solution, or market challenges [or opportunities] in 2010?" to Hamish Cunningham, Research Professor of Internet Computing at the University of Sheffield (UK). Hamish is benevolent dictator of the GATE team, "researching human language computation." Their work is realized in a highly capable, open-source, text-analysis platform, the General Architecture for Text Engineering. I've used it myself!

Hamish is effectively GATE's CEO. While GATE work is funded in part by a number of sponsors and partners, Hamish is not beholden to VCs or shareholders and is decidedly uncorporate: he'll tell you what he really thinks and what he's up to, openly. Check out his blog, Computing Text. One thing he's doing is shifting the GATE team to focus on users and support, by nurturing the GATE community and via a number of carefully conceived industry alliances. (Disclosure: I am a paid consultant to a GATE partner, Matrixware, a Vienna-based services firm that is working with the Univ. of Sheffield and Bulgarian semantic-technologies developer Ontotext to build GATE into a commercially friendly product suite. My enthusiasm for GATE led to the consulting assignment, not the reverse.)

Hamish's reply to my Text Analytics Opportunities and Challenges for 2010 question didn't comfortably fit the model I had in mind for my article, but it's full of insight all the same. Here's what Hamish had to say on GATE and text-analytics futures:

A decade ago, at the start of the naughties, the majority of text-analysis systems came from research labs. At the start of the tennies, we can look back on an explosive growth of startups in the area, followed by acquisitions and consolidation, and latterly the arrival of a healthy market supporting a variety of commercial offerings. The drivers of this expansion included:
  • The Web. (Yawn! I routinely skip the first paragraph of the papers in my field these days, as they all start by making this point.) More recently and more interestingly, social networking.
  • Cost cutting. Replacing costly market research departments with cheap(er) text-mining departments has become the basis of a whole family of text analysis products that mine the "voice of the customer."

So, what prospects for the tennies? More growth, partly because the drivers that arose in the naughties are still with us, but also because of two new factors:

  • Maturity of open source text mining. Replacing costly proprietary software licenses with open source is a trend which we've seen in many other areas. A big sticking point for text analytics hitherto has been the lack of a Red Hat or a Canonical to provide enterprise-level support and training, but that's changing now as more companies sign up to support open source, better training programs become available and so on.
  • Fear and trembling at the data suppliers. The pressure from Google in this area is relentless ("We'll give away the data that you sell to drive use of our tools"). The data suppliers can see that they have to offer a higher level of service in order to hold onto their customer base, and text mining is pretty much the only game in town. So, for example, Thomson buys ClearForest, sinks large amounts of resource into "Open" Calais, etc. etc.

In GATE's case we're also now seeing faster growth in demand that we attribute to our repositioning in 2009. We have new commercial partners who have funded a raft of new features, we have new products to complement the long-standing developer-oriented offering, and a new training and certification program. The tennies look like being a busy decade.

I asked Hamish if he could quantify the increase in interest that he mentioned. His response:

It seems to be something like 3 new commercial-walkins per week instead of the previous rate of 1 a month (though today [January 26] is only Tuesday and we've had several already this week...) A week ago I did this summary (confidential) of the week's new contacts:
  • (top-3 insurance corporation), looking for GATE support with SLA
  • (major US IT contractor), an existing ClearForest customer, looking for GATE training
  • (SME), a startup doing sentiment stuff for marketing, looking for GATE Teamware
  • (big IT corporation), leading CAD supplier, looking for terminology extraction for translation

The week after was similarly productive so the new message seems to be working.

Hamish added final thoughts yesterday:

The interest rate shows no signs of slowing BTW; and various other positive indicators have come my way. It really seems like text analysis and semantics are taking off! Strange. I feel like asking people what's wrong with them... but then I've been waiting for this for 15 years.

Good things come to those who wait (and work to make them happen)!
