Re: Always Just Beyond The Horizon
I know he's just being provocative to draw comments, but Seth is usually more insightful than this. I think he's using the ad-hoc social analytics hammer and seeing everything as a nail. And his research on the "semantic web," as he so narrowly defines it, seems to have stopped in 2006. This is no better than Shirky's old rant on filter failure.
The semweb philosophy has evolved well beyond linked data, and many of its practitioners are the same proprietary vendors and services Seth cites. They're just offering opaque SaaS products or APIs, with the semantic technology under the covers. So some of the standards haven't panned out, and the dreamy open-source, open-data vision hasn't been fulfilled. That doesn't mean the tech isn't being used.
Seth seems to be focused on perishable data because that's where the most growth and chaos is, and that's fine. But the data that's not so perishable merits a lot more care. Why did people go through the pain of XBRL? To get beyond the limits of provincial data and into more global reusability that's truly reliable. Similarly, people who have studied semweb methods well enough to make the best use of them, and who have endured some pain along the way, are seeing scalability benefits in areas such as automated curation. See http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html. It takes time and expertise to build systems like this one.
Are you guys looking at digital asset management (DAM) at all? There are use cases like the Magnum Photos case described in a sidebar here: http://www.pwc.com/en_US/us/technology-forecast/2012/issue3/features/feature-gaming-redesigning-business.jhtml It involves a clever use of crowdsourced image tagging in a gamified environment. I'm not sure how you'd scale decent search and discovery in huge online photo repositories without a method like this.
Another person in this thread cites a dozen case studies that go beyond the vague decks you're describing.
The text analytics methods Seth points to are valid, but they don't go far enough. They need to be used in conjunction with other methods when it comes to non-perishable data and content. What's blooming now is a heterogeneous approach to schema: fixed on the RDBMS side; dynamic, multiple, and schema-on-read on the NoSQL side; and optional, shared, and collaboratively built à la schema.org. I think the bottom line is that the tools and methods are finally starting to fit the jobs, and there's a widening selection of them.
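To make the contrast between the first two approaches concrete, here's a minimal Python sketch. It's a toy, not how any particular database implements it: the SQLite table enforces its declared columns when a row is written, while the "NoSQL" side just keeps raw JSON and applies whichever schema a query asks for at read time. The field names, sample records, and the `read_with_schema` helper are all hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the RDBMS table declares its columns up front,
# and rows that violate them are rejected at insert time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, headline TEXT NOT NULL)")
conn.execute("INSERT INTO articles (id, headline) VALUES (?, ?)", (1, "Match report"))
try:
    conn.execute("INSERT INTO articles (id, headline) VALUES (?, ?)", (2, None))
except sqlite3.IntegrityError as err:
    print("rejected at write time:", err)

# Schema-on-read: raw documents are stored as-is, whatever fields they carry.
raw_store = [
    json.dumps({"id": 1, "headline": "Match report", "team": "Home side"}),
    json.dumps({"id": 2, "title": "Transfer rumours"}),  # different field name
]

def read_with_schema(raw_docs, fields, aliases=None):
    """Hypothetical helper: apply a schema at query time by picking the
    requested fields, trying alternate names from `aliases`, and filling
    anything still missing with None."""
    aliases = aliases or {}
    rows = []
    for raw in raw_docs:
        doc = json.loads(raw)
        row = {}
        for field in fields:
            value = doc.get(field)
            if value is None:
                # Fall back to alternate field names, if any were given.
                for alt in aliases.get(field, []):
                    if alt in doc:
                        value = doc[alt]
                        break
            row[field] = value
        rows.append(row)
    return rows

# Two different "schemas" read from the same raw data.
print(read_with_schema(raw_store, ["id", "headline"], aliases={"headline": ["title"]}))
print(read_with_schema(raw_store, ["id", "team"]))
```

The design point is where the cost lands: schema-on-write pays it once, up front, and guarantees every row conforms; schema-on-read defers it to each query, which is more flexible but pushes the reconciliation work (here, the alias mapping) onto whoever reads the data.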
You have posted helpful pieces on NoSQL data use cases in the past. I just think you should look at what's happening on the content processing side once in a while. People there are building on top of what's possible with NLP engines, for example. There's more back and forth between the NoSQL and semweb communities than there used to be, and the result could be standards such as JSON-LD that are more aligned with the way developers work.
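As a rough illustration of why JSON-LD fits the way developers already work, here's a small Python sketch. It assumes a schema.org vocabulary declared inline via `@vocab`, and the photo record is made up. The point is that an ordinary JSON object becomes linked data just by adding `@context` and `@type`. The `expand_keys` function is a hypothetical, heavily simplified stand-in for JSON-LD expansion; a real processor also wraps values in arrays and value objects, handles nested contexts, and much more.

```python
import json

# A plain JSON record of the kind developers already produce
# (the values are hypothetical).
photo = {
    "name": "Street scene",
    "creator": "Jane Doe",
    "keywords": ["street", "night"],
}

# Adding @context and @type turns it into JSON-LD: each plain key now
# resolves to a schema.org term, while the JSON shape stays the same.
photo_ld = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Photograph",
    **photo,
}

def expand_keys(doc):
    """Simplified, illustrative expansion: resolve plain keys and @type
    against @vocab. Not a conforming JSON-LD processor."""
    vocab = doc["@context"]["@vocab"]
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed, not carried over
        if key == "@type":
            out["@type"] = vocab + value
        elif key.startswith("@"):
            out[key] = value  # other keywords pass through unchanged
        else:
            out[vocab + key] = value
    return out

print(json.dumps(photo_ld, indent=2))
print(expand_keys(photo_ld))
```

A developer can keep reading and writing `photo_ld` as ordinary JSON, while a semweb-aware consumer can interpret every key as a globally defined term.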