Commentary
8/24/2007
11:26 PM
Seth Grimes

Host Google Ads, Boost Your Page Rank

I've been puzzling out a technique, used by sites that machine-aggregate content, that may boost pages' Google rankings. The aggregators stuff their pages with (Google) ads and contextually similar - albeit just similar enough - content. All that pseudo-content surely moves them up the Google rankings. How else to explain the success of the bottom-feeders who exploit others' content in order to sell ads?

I doubt there's earth-shattering news in my discovery, but it's new to me and perhaps to you.

Think about it: the presence of contextual ads and related links, even if of marginal quality, increases the incidence and density of searched-on keywords and themes. Density is the ratio of occurrences of a given term or theme to all the terms and themes that the indexer treats as page content. It is reportedly a factor in Google page ranking. Contextual ads and a set of related links can therefore boost term-related search-index weighting and hence the search-results ranking of a page that displays them.
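
To make the density argument concrete, here is a minimal sketch in Python. It assumes density is a plain ratio of term occurrences to total word tokens, which is my simplification and not Google's actual, unpublished weighting. The function name `keyword_density` and both text fragments are hypothetical, invented for illustration.

```python
import re
from collections import Counter


def keyword_density(text: str, term: str) -> float:
    """Share of word tokens in `text` that equal `term`, ignoring case.

    Assumption: the indexer counts every token on the page, ads included.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


# Hypothetical page fragments, not taken from any real site.
article = "A short note on semantics and data warehouses for analysts."
ads_and_links = "Semantics tools. Semantics training. Learn semantics today."

# Density of the term in the article alone, then with ad text appended.
base = keyword_density(article, "semantics")
boosted = keyword_density(article + " " + ads_and_links, "semantics")

print(f"article only: {base:.3f}, with ads and links: {boosted:.3f}")
```

Under these assumptions the ad and link text raises the term's share of the page, which is the feedback effect described above. An indexer that excluded ad blocks before counting would see only the first figure.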

The aggregator-advertisers are certainly aware of this feedback effect. Myself, I'm not concerned enough with search-engine marketing to have researched them. I'll leave that investigation to readers. But I will explain that my foray into search-engine forensics was motivated by a bit of vanity, by a desire to see who might be linking to a blog entry I posted last month. I googled its title, "Roads to Semantics: Tim Berners-Lee and Bill Inmon."

Search on that title for yourself if you wish. The hit list you get should be similar to the one I got. I consider the machine-generated pages that appear in the list to be nothing but noise, drowning out any intentional links there may be from other blogs.

That topix.com site that produces hits number 3 and 4, trailing only the Intelligent Enterprise links, appears to be a content leech. Their About Us page says they "aggregate news from thousands of sources, create thousands of topically driven news web pages and populate each of those pages with only news about that particular topic." That latter bit is false. The page that harvests my blog entry is chock full of advertising, some related to my article, some not. The site is essentially a meta-search engine that adds ads but only grade-D content value. And I wonder if all those ads don't inadvertently help them achieve those Google top slots.

I also found search results that struck me at first as a bit strange. I thought I'd puzzle them through.

Take one hit that came up for me on the 6th page, a link titled "Figuring Interest Rate [sic]." It's to a site that aggregates news items, pretty clearly for the sole purpose of displaying ads. I detest this particular variety of Web hucksterism. It's a small step above spam.

That page displays Ads by Google. Anyone can host Google ads; these low-quality machine aggregators exploit that. But how in the world did my blog entry, which had nothing to do with interest rates, land us on that page?

The short page description Google displays contains the text:

Roads to Semantics: Tim Berners-Lee and Bill Inmon. Intelligent Enterprise - This, combined with the government's renewed interest in everything, ...

So the aggregator keyed on the word "interest," and I note that Google's indexer did not disambiguate two different meanings of that word. In fact, the indexer created an association despite a semantic mismatch. The aggregator wouldn't care: non-disambiguation means more hits.
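
The mismatch is easy to reproduce with a matcher that compares surface word forms only. The sketch below is an assumption about the kind of matching involved, not the aggregator's or Google's actual logic; the helper name `mentions` is hypothetical. The two strings are the snippet and the link title quoted above.

```python
import re


def mentions(term: str, text: str) -> bool:
    """True if `term` occurs as a whole word in `text`, ignoring case.

    The check looks only at the word form, never at its sense.
    """
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Snippet shown by Google and the title of the aggregator's page.
snippet = "This, combined with the government's renewed interest in everything, ..."
page_title = "Figuring Interest Rate"

# Both texts contain the word, so a form-only matcher links them,
# even though one means curiosity and the other a financial charge.
matched = mentions("interest", snippet) and mentions("interest", page_title)
print(matched)
```

Telling the two senses apart would take some analysis of the surrounding words, which a form-only matcher like this one skips.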

That text about "interest" is not in the current version of the page but it is still in Google's cached version. The sentence continues on, "interest in everything, seems like a perfect medium for the control freaks." I didn't write that! I found, however, that one of my Intelligent Enterprise blogger colleagues, the inimitable Neil Raden, did.

The link recorded by the aggregator is to an IE page that lists a bunch of blog entries. I'd infer that the aggregator pulled in the page when my entry was newest, at the top of the page. It associated my title with text farther down the page, from a different blog entry, that carried the keyword of interest.

I tried a few more of the links listed in response to a Google search on my earlier blog title. Based on the URLs, and looking at the content, I found many more machine-aggregated pages. All had Google ads. No surprise; these pages were designed for Google-search findability, which was in turn boosted by the presence of the ads at indexing time.

Do your own tests if you wish to confirm (or disprove) the feedback effect. I'll leave you with the thought that my experience supports the views I expressed in my Roads to Semantics blog entry.

Semantic capabilities matter, but given how easy it would be to throw a bunch of RDF tags into machine-generated ad-platform pages, Semantic Web mark-up standards will not produce the Web of intent we'd like to see. The aggregators and their successors will trick your agent-bots as easily as they have exploited the search engines. It's analytics that can tell ads from the base content of a page and right the mis-ranking of spam-like aggregation pages. It's analytics, not pie-in-the-sky expectations that we'll all start publishing with RDF and OWL, that will eventually produce a Semantic Web that's worth using.


Seth Grimes is an analytics strategist with Washington DC based Alta Plana Corporation and chairs the Text Analytics Summit.
