I've been puzzling out a technique, used by sites that machine-aggregate content, that may boost pages' Google rankings. The aggregators stuff their pages with (Google) ads and contextually similar - albeit just similar enough - content. All that pseudo-content surely moves them up the Google rankings. How else to explain the success of the bottom-feeders who exploit others' content in order to sell ads?
I doubt there's earth-shattering news in my discovery, but it's new to me and perhaps to you.
Think about it: the presence of contextual ads and related links, even if of marginal quality, increases the incidence and density of searched-on keywords and themes. Density is the ratio of a given term's or theme's occurrences to the full set of terms and themes that the indexer treats as page content. It is reportedly a factor in Google page ranking. Contextual ads and a set of related links can therefore boost a page's term-related search-index weighting and hence its search-results ranking.
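To make the density argument concrete, here is a minimal sketch of one simple way to compute it. The function name `term_density`, the tokenizing rule, and the sample strings are my own illustrative assumptions. Google's actual weighting is not public, so treat this as a toy model of the idea, not a description of the real indexer.

```python
import re
from collections import Counter


def term_density(text: str, term: str) -> float:
    """Fraction of the page's word tokens that equal `term` (case-insensitive).

    A toy definition of density; real indexers are assumed to be far more elaborate.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


# Hypothetical page content and hypothetical contextual ad text.
article = "semantics matter because semantics guide search"
ads = "semantics tools semantics software"

base_density = term_density(article, "semantics")
boosted_density = term_density(article + " " + ads, "semantics")

print(f"article alone:    {base_density:.2f}")
print(f"article plus ads: {boosted_density:.2f}")
```

In this made-up case the ad text repeats the keyword more often than the article does, so appending it raises the keyword's share of the page. If the indexer counts ad text as page content, that is the feedback effect described above.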
The aggregator-advertisers are certainly aware of this feedback effect. Myself, I'm not concerned enough with search-engine marketing to have researched their practices. I'll leave that investigation to readers. But I will explain that my foray into search-engine forensics was motivated by a bit of vanity, by a desire to see who might be linking to a blog entry I posted last month. I googled its title, "Roads to Semantics: Tim Berners-Lee and Bill Inmon."
Search on that title for yourself if you wish. The hit list you get should be similar to the one I got. I consider the machine-generated pages that appear in the list to be nothing but noise, drowning out any intentional links there may be from other blogs.
The topix.com site that produces hits number 3 and 4, trailing only the Intelligent Enterprise links, appears to be a content leech. Their About Us page says they "aggregate news from thousands of sources, create thousands of topically driven news web pages and populate each of those pages with only news about that particular topic." That latter bit is false. The page that harvests my blog entry is chock full of advertising, some related to my article, some not. The site is essentially a meta-search engine that adds ads but only grade-D content value. And I wonder if all those ads don't inadvertently help them achieve those Google top slots.
I also found search results that struck me at first as a bit strange. I thought I'd puzzle them through.
Take one hit that came up for me on the 6th page, a link titled "Figuring Interest Rate [sic]." It's to a site that aggregates news items, pretty clearly for the sole purpose of displaying ads. I detest this particular variety of Web hucksterism. It's a small step above spam.
That page displays Ads by Google. Anyone can host Google ads; these low-quality machine aggregators exploit that. But how in the world did my blog entry, which had nothing to do with interest rates, land us on that page?
The short page description Google displays contains the text:
Roads to Semantics: Tim Berners-Lee and Bill Inmon.
Intelligent Enterprise - This, combined with the government's renewed
interest in everything, ...
So the aggregator keyed on the word "interest" -- and I note that Google's indexer did not disambiguate two different meanings of that word. In fact, the indexer created an association despite a semantic mismatch. The aggregator wouldn't care: non-disambiguation means more hits.
That text about "interest" is not in the current version of the page but it is still in Google's cached version. The sentence continues on, "interest in everything, seems like a perfect medium for the control freaks." I didn't write that! I found, however, that one of my Intelligent Enterprise blogger colleagues, the inimitable Neil Raden, did.
The link recorded by the aggregator is to an IE page that lists a bunch of blog entries. I'd infer that the aggregator pulled in the page when my entry was newest, at the top of the page. It associated my title with text farther down the page, from a different blog entry, that carried the keyword of interest.
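The mix-up I'm inferring can be sketched in a few lines. This is a guess at the kind of logic involved, not the aggregator's actual code: the function `page_level_match`, the data layout, and the placeholder strings are all hypothetical. The sketch treats a blog listing page as a single document, takes the topmost title as the page's headline, and matches the keyword as a bare string anywhere on the page.

```python
from typing import Dict, List, Optional


def page_level_match(entries: List[Dict[str, str]], keyword: str) -> Optional[Dict[str, str]]:
    """Hypothetical aggregator logic: one headline per page, keyword matched anywhere.

    The match is a plain substring test, so "interest" as curiosity and
    "interest" as a loan rate are indistinguishable.
    """
    headline = entries[0]["title"]  # newest entry sits at the top of the listing
    for entry in entries:
        if keyword.lower() in entry["body"].lower():
            return {
                "title": headline,               # taken from the top entry
                "snippet": entry["body"],        # taken from whichever entry matched
                "snippet_from": entry["title"],
            }
    return None


listing = [
    {
        "title": "Roads to Semantics: Tim Berners-Lee and Bill Inmon",
        "body": "Placeholder text for the newest entry.",
    },
    {
        "title": "An older entry by another blogger (placeholder title)",
        "body": "This, combined with the government's renewed interest in everything, "
                "seems like a perfect medium for the control freaks.",
    },
]

hit = page_level_match(listing, "interest")
print(hit["title"])
print(hit["snippet_from"])
```

The returned record pairs my title with a sentence from a different entry, which is the pattern I saw in the search result. Matching entry by entry, rather than page by page, would avoid the false pairing, though not the confusion over the two senses of "interest."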
I tried a few more of the links listed in response to a Google search on my earlier blog title. Based on the URLs, and looking at the content, I found many more machine-aggregated pages. All had Google ads. No surprise; these pages were designed for Google-search findability, which was in turn boosted by the presence of the ads at indexing time.
Do your own tests if you wish to confirm (or disprove) the feedback effect. I'll leave you with the thought that my experience supports the views I expressed in my Roads to Semantics blog entry.
Semantic capabilities matter, but given how easy it would be to throw a bunch of RDF tags into machine-generated ad-platform pages, Semantic Web mark-up standards will not produce the Web of intent we'd like to see. The aggregators and their successors will trick your agent-bots as easily as they have exploited the search engines. It's analytics that can tell ads from the base content of a page and right the mis-ranking of spam-like aggregation pages. It's analytics, not pie-in-the-sky expectations that we'll all start publishing with RDF and OWL, that will eventually produce a Semantic Web that's worth using.
Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation and chairs the Text Analytics Summit.