Commentary
1/7/2014
09:06 AM
Seth Grimes

Semantic Web Business: Going Nowhere Slowly

The semantic web vision persists, but the tools and processes don't stand up to today's data chaos.

I've been a semantic web skeptic for years. SemWeb is a narrowly purposed replica of a subset of the World Wide Web. It's useful for information enrichment in certain domains, via a circumscribed set of tools. However, the SemWeb offers a vanishingly small benefit to the vast majority of businesses. The vision persists but is unachievable; the business reality of SemWeb is going pretty much nowhere.

The SemWeb dream centers on sharing linked data via the W3C's Resource Description Framework (RDF) data model. There is no question that SemWeb aspires to a worthy goal, but its tools and processes are no match for the reality of never-diminishing online, social, and enterprise data chaos. SemWeb can't keep up with the flow, even on the limited portion of the data universe that is published on the World Wide Web. We will never achieve its ideal universe of neatly marked up data, published by content producers in accordance with the prescriptive W3C standards.

More achievable is an ad-hoc, semanticized web of after-the-fact, situational markup (annotation) by content consumers and data intermediaries, including the leading Internet search engines and data brokers. The reality isn't a linked data web of interconnected resources. More real is a set of linkable data -- marked up or stored in some queryable format, selectively findable and accessible -- via tools and methods that may or may not be standardized. This has been achieved, and it is rapidly advancing, in the hands of companies that range from AlchemyAPI to ClearStory to IBM and hundreds (or perhaps thousands) of other analytics and big data startups and established firms.

I've been promoting this alternative vision for years through my Sentiment Analysis Symposium, which covers natural language processing and information-extraction and analytical technologies. (See also my 2011 whitepaper for OpenText, "12 Things the Semantic Web Should Know about Content Analytics.")

As I told Jenny Zaino for her SemanticWeb.com post Good-Bye 2013:

Adoption of Linked Data and expansion of the Semantic Web [has been] far outpaced by the development of private knowledge graphs and focused search and query systems (often affording external access) from the likes of Facebook, Google, Wolfram Research, and Apple (Siri). A set of solution providers, as varied as NetBase, Digital Reasoning, and DataSift, are bringing similar capabilities, based on data mined from online, social, and enterprise sources, to government and corporate users.

The heart of an IBM Watson instance, whether applied to play Jeopardy or for medical diagnosis or customer intelligence, is a big, fat knowledge base. (Disclosure: IBM's jStart innovation program is a sponsor of the 2014 Sentiment Analysis Symposium, and Digital Reasoning is sponsoring my 2014 Text Analytics Market Study.)

A semantics-infused example of social graph connectivity on Facebook. (Source: Wikipedia)

This article was prompted by a note from the consultant David Siegel, who not only shares but also has lived my view regarding the lack of semantic web business interest. He wrote to me that his four years in "Semantic Web stuff" didn't pay off. He has now switched to management consulting. With David's permission, I'll relay his explanation.

"My goal was to be the bridge between business decision-makers and SemTech. There's still a huge gap there," he wrote. "Management seems to be lurching toward [semantic technology in] ways like via social and mobile and Google integration," but not via the semantic web. "I really thought I'd get a ton of consulting out of it, but instead I worked for four years and got two keynote speeches, nothing else. I got a TON of interest, but no paying clients, so I'm moving on."

David has shifted his focus to business agility consulting. "Agile" describes what the semantic web is not. It can't keep up with the fast rate of data production (per big data's velocity characteristic), or with the variety (another big data "V") of types, linkages, and usages (many unforeseen and unaccommodated by the data provider's chosen markup approach) of modern-world data.

The semantic web is more than 12 years old and still puttering along. From a business perspective, it is going nowhere slowly.

Seth Grimes is the leading industry analyst covering text analytics and sentiment analysis. He founded Alta Plana Corporation, a Washington IT strategy consultancy, in 1997. Follow him on Twitter at @SethGrimes.

Comments
chrisglasier
User Rank: Apprentice
3/28/2014 | 12:58:26 AM
The vital role of ordinary people
A prime concern should be how to give ordinary people an incentive to put their data on the web. The answer must lie in making their work easier e.g. with linkable data on the web rather than dead data in spreadsheets. Talk of RDFs, URIs just puts off people who have this type of data, and is not needed here.

Here is an open email to Phil Archer posted on W3C forum that outlines the rationale: http://www.w3.org/community/forum/2014/03/25/open-email-to-phil-archer-data-activity-lead/

With a significant part of the semantic web designed jointly by practitioners and technologists, practitioners get the chance to benefit from automated tasks; technologists get the chance to apply their skills for tasks they otherwise would not know exist.
rbenjamins
User Rank: Apprentice
1/23/2014 | 1:01:51 PM
Re: The point is not that there's nothing to SemWeb...
14 years ago I was a semantic web researcher and evangelist. 7 years ago I started to become more sceptical and to wonder where the industrial uptake was. Now I am neutral and looking for evidence for both views. I am co-chairing the industrial track at the next International Semantic Web Conference, so I am curious what comes out (http://iswc2014.semanticweb.org/call-industry-track-abstracts). I still believe in the grand vision of the Semantic Web, but I am unclear on what technologies or communities will finally make it happen.

My observations so far are:
- The SemWeb community has not done a very good job of explaining it to business so that businesses clearly understand the implications, such as benefits versus TCO. Ask any CIO or CTO of a large corporation about SemWeb or SOA and how much money s/he has invested in them (successfully or not, that is another issue).
- SemWeb seems to be out there (see Frank's list), but the fact that only a few people know such extended lists also says something. To me it says that SemWeb is still with early adopters (BBC), not in the mainstream market (when you wouldn't even think of generating a list). I suspect that several on the list refer more to trials of the technology in industry than to operational systems.
- There is a "categorisation" issue. All agree that schema.org is out there, but the sceptics say it's not SW while the advocates say it is. Same for Google Graph, FB Graph, etc. I would say the real question is whether those major uptakes of SemWeb-like technology are thanks to the SemWeb community and dedicated (government) investments, or whether they come from alternative efforts with a pragmatic business perspective. I think the latter, but actually I don't know.
- Semantic Web has been on the Gartner Hype Cycle curve since at least 2006. And the estimate every year, including 2013, is that it will take more than 10 years to reach productivity.
DanielG198
User Rank: Apprentice
1/11/2014 | 10:10:12 AM
Narrow views of the Semantic Web
I think that I too would feel that the Semantic Web was a poor solution if I saw it only as an "alternative vision" to sentiment analysis. The problem as I see it is that the Semantic Web is so much more. I have not personally worked with it in connection with Big Data, but regardless of Grimes's views, Gartner has just placed Big Data as the driver of Semantic Web and identified the pair as transformative technologies within the not so distant future. (Transformative is the highest rated impact of the technologies analyzed.) Gartner seems to have until recently seen Semantic Web only in terms of the vision of a Global Graph, which it saw as unattainable or at least too far into the future to be of interest. Its realization that it has practical here-and-now utility has jumped it from ascending towards the peak of inflated expectations and skipped it right over to the trough of disillusionment. David Siegel's story is sad, but what does it really tell us? I have had similar difficulties. I nevertheless still feel that there are tremendous benefits to be reaped from its use. What would be helpful would be to see more discussion about how to convey these benefits.
AlanMorrison
User Rank: Apprentice
1/8/2014 | 9:09:57 PM
Re: Always Just Beyond The Horizon
I know he's just being provocative to evoke comments, but Seth is usually more insightful than this. I think he's using the ad-hoc social analytics hammer and seeing everything as a nail. And his research on "semantic web" as he so narrowly defines it seems to have stopped in 2006. This is no better than Shirky's old rant on filter failure.

The semweb philosophy has evolved well beyond linked data, and many of the practitioners are the same proprietary vendors and services Seth cites. They're just offering opaque SaaSes or APIs, and the stuff is under the covers. So some of the standards haven't panned out, and the dreamy open source, open data vision hasn't been fulfilled--that doesn't mean the tech isn't being used.

Seth seems to be focused on perishable data because that's where the most growth and chaos is, and that's fine. But the data that's not so perishable merits a lot more care. Why did people go through the pain of XBRL? To get beyond the limits of provincial data and into more global reusability that's truly reliable. Similarly, people who've studied semweb methods enough to make best use of them and have endured some pain are seeing scalability benefits when it comes to auto curation, for example. See http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html. It takes time and expertise to build systems like this one.

Are you guys looking at DAM at all? There are use cases like the Magnum Photo case described in a sidebar here: http://www.pwc.com/en_US/us/technology-forecast/2012/issue3/features/feature-gaming-redesigning-business.jhtml -- it involves a clever use of crowdsourced image tagging in a gamified environment. I'm not sure how you'd scale decent search and discovery in huge online photo repositories if you didn't use a method like this.

Another person in this thread cites a dozen case studies that go beyond the vague decks you're describing.

The text analytics methods Seth points to are valid, but they don't go far enough. They need to be used in conjunction with other methods when it comes to non-perishable data/content. What's blooming now is a heterogeneous approach to schema--fixed on the RDBMS side, dynamic, multiple, and schema-on-read on the NoSQL side, and optional, shared and collaboratively built a la schema.org. I think the bottom line is that the tools and methods are finally starting to fit the jobs, and there's a widening selection of them.

You have posted helpful pieces on NoSQL data use cases in the past. Just think you should look at what's happening on the content processing side once in a while. They're building on top of what's possible with NLP engines, for example. There's more back and forth between the NoSQL and semweb communities than there used to be, and the result could be standards such as JSON-LD that are more aligned with the way developers work.
danbri
User Rank: Apprentice
1/8/2014 | 3:15:56 PM
Re: The point is not that there's nothing to SemWeb...
At Google we're seeing schema.org markup (an RDF vocabulary) on 5+ million domains - from big names to the long tail. Pizza shops, museums, TV shows, opening hours, research papers, job listings, company logos, govt datasets... across a broad range of areas, every walk of life.

In my view it's a mistake to contrast some perceived "classical", stilted and fragile notion of Semantic Web against the strawman rival of newer flexible/pragmatic approaches. All communities have internally some such axis against which their efforts can be plotted, and variations on the neat/scruffy distinction. The most successful RDF work has always had a pragmatic, hacker side to it, and embraced tooling (NLP, machine learning, databases, ...) from a variety of fields. It's not a competition - use all the tools that make sense to get a job done!
prototypo
User Rank: Apprentice
1/8/2014 | 11:48:40 AM
Re: The point is not that there's nothing to SemWeb...
Hi Seth,

I'm afraid I must take strong exception to your position.

I have heard people redefine the Semantic Web for years in some narrow way so they can say it has failed.  The most fallacious redefinition is that it requires "perfect" (top-down) data definition.  This is exactly contrary to the definition of the Resource Description Framework.  RDF expects people to lie and to be mistaken. It expects and allows for dirty data.  Any other presumption would simply not match the real world.  It specifically allows and encourages bottom-up creation of distributed data.

Quoting the RDF 1.1 Primer, "RDF is intended for situations in which information on the Web needs to be processed by applications, rather than being only displayed to people."  It is currently being widely deployed and is quite successful, as pointed out by Frank and Amit (and others, if you ask Google).

Also, it was the Semantic Web that gave us SPARQL, the only standard query language for distributed data.  SPARQL is being widely adopted by vendors for the simple reason that SQL will never allow cross-implementation queries.

I'd love to see you absorb Frank's and Amit's comments and write a more balanced article.  Would you consider doing so?
anon8070485150
User Rank: Apprentice
1/8/2014 | 9:49:10 AM
Re: Always Just Beyond The Horizon
Google Knowledge Graph is an example.  It's a graph knowledge base that is isomorphic to RDF. In fact, Google now provides the download files of the open source version, Freebase, as RDF.
FrankVanHarmele
User Rank: Apprentice
1/8/2014 | 9:35:47 AM
Re: The point is not that there's nothing to SemWeb...
Ironic to see schema.org quoted as a signal that Semantic Web technology will never work....  It is semantic web technology. "No systems using the information from Schema.org"? Bing, Yahoo and Google harvesting doesn't count? Hmm... 

And what about Google's Knowledge Graph? It's a harvested/copied-and-then-curated RDF graph, now used to power Google's front-page. See e.g. righthand-side of https://www.google.com/search?q=barack+obama

Reading "one of these slideshows to confirm a suspicion" that "there are no names of actual companies putting them into practice"? "Did anyone really back that vision in a serious financial way?" "Only useful for information enrichment in certain domains"? I've apparently been too slow in updating my "Semantic Web Good News Show" (at http://www.slideshare.net/Frank.van.Harmelen/semantic-web-good-news ), so here goes with a bunch of recent updates. In no particular order:

Who's using the GoodRelations product vocabulary and markup?
  • Google
  • Yahoo!
  • BestBuy
  • sears.com (15 Million items)
  • kmart.com (250,000 items)

... and 10,000 more (http://www.heppnetz.de/projects/goodrelations/ )

Both Oracle's DB and IBM's DB2 implement semantic web datamodels and protocols

Pretty much every webpage on the BBC website (heard of them?) now hits an RDF triple-store. Examples: http://www.bbc.co.uk/programmes/developers,  http://www.bbc.co.uk/nature/feedsanddata, 60.000(!) BBC news items annotated with RDF (http://www.ontoba.com/blog/bbc-news-labs), etc.

NXP is one of the world's biggest makers of microprocessors (4.3b$). On data.nxp.com they have data on 26.000(!) products, with an internal triplestore (Dydra, 250K entities, 2.5m triples) to drive a website; this is externally available, to make it part of a broader ecosystem.

Renault publishes configuration options for its cars in RDF http://www.slideshare.net/fpservant/ldow2013

Électricité de France generates 300.000 personalised energy bills using SemWeb technology: http://data.semanticweb.org/conference/iswc/2013/proceedings-2/paper-04/html

New York Times publishes Linked Open Data http://open.blogs.nytimes.com/2010/06/24/more-subject-headings-published-as-linked-open-data/?_r=0

Ad.ly (ads from celebrities) goes RDF http://www.slideshare.net/testac/how-hollywood-learned-to-love-the-semantic-web

Monster Board goes RDF:  http://semanticweb.com/monster-offers-more-semantic-enabled-help-to-job-seekers-and-hr-staffers_b19673

Bill and Melinda Gates Foundation goes RDF: http://priyankmohan.blogspot.com/2010/02/bill-and-melinda-gates-foundation_26.html

Let me know if you want more, I can supply dozens more of these. But of course no number may ever be enough if "you've been a semantic web skeptic for years"....
SethGrimes
User Rank: Apprentice
1/7/2014 | 4:40:45 PM
The point is not that there's nothing to SemWeb...
My point is not that there's nothing to SemWeb. I'll repeat a couple of sentences from the opening paragraph: "SemWeb is a narrowly purposed replica of a subset of the World Wide Web. It's useful for information enrichment in certain domains, via a circumscribed set of tools." My point is that there's much, much more available, providing similar capabilities, via more dynamic technologies and methods.
Laurianne
User Rank: Author
1/7/2014 | 2:58:15 PM
Re: Always Just Beyond The Horizon
"We will never achieve its ideal universe of neatly marked up data..." Did anyone really back that vision in a serious financial way?