Advertisers Corrupt Search, Microsoft Research Finds
A study with help from UC Davis suggests advertisers should closely scrutinize syndicators and traffic affiliates that are profiting from spam traffic.
The very ad dollars that support so many free sites on the Internet pay to make your search experience worse.
In a research paper to be presented at the 16th International World Wide Web Conference in Banff, Alberta, in May, researchers from Microsoft and the University of California, Davis conclude that "it is advertisers' money that is funding the search spam industry, which is increasingly cluttering the Web with low-quality content and reducing Web users' productivity."
Search spamming is the practice of using hidden text, doorway pages, or other means of deception like comment spam to make low-quality Web content rank high in a search results list.
The paper's authors, Yi-Min Wang and Ming Ma of Microsoft Research and Yuan Niu and Hao Chen of UC Davis, hope "to educate users not to click spam links and spam ads, and to encourage advertisers to scrutinize those syndicators and traffic affiliates who are profiting from spam traffic at the expense of the long-term health of the Web."
The paper, "Spam Double-Funnel: Connecting Web Spammers With Advertisers," details the tactics and methods of search spammers. The authors' findings reveal just how prevalent search spamming has become.
The 10 search keywords most frequently targeted by spammers, according to the study, are drugs, adult, gambling, ring tones, money, accessories, travel, cars, music, and furniture.
The two most spam-rich categories of content are Web pages that have to do with drugs (30.8% spam) and ring tones (27.5% spam). The average amount of unwanted content across the categories defined by the 10 most spammed keywords is 11.6%.
The report notes that a few syndicators serve as middlemen, connecting the majority of Web spammers to advertisers. The top three -- Findwhat.com, Looksmart.com, and 7search.com -- were involved in 59% to 68% of the click-through redirection chains for the spam ads sampled.
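For readers who want a sense of how such a share could be measured, here is a minimal Python sketch. It is not the researchers' method: the function name `syndicator_share`, the sample chains, and the `.example` domains are all hypothetical, and it assumes each click-through has already been recorded as an ordered list of URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def syndicator_share(chains, syndicators):
    """Return, per syndicator domain, the fraction of chains passing through it.

    Hypothetical helper: `chains` is a list of redirection chains, each a
    list of URLs in the order the browser followed them.
    """
    counts = Counter()
    for chain in chains:
        # Collect the distinct hostnames seen anywhere in this chain.
        hosts = {urlparse(url).hostname or "" for url in chain}
        for domain in syndicators:
            # Match the domain itself or any of its subdomains.
            if any(h == domain or h.endswith("." + domain) for h in hosts):
                counts[domain] += 1
    total = len(chains)
    return {d: (counts[d] / total if total else 0.0) for d in syndicators}


# Made-up sample data, for illustration only.
chains = [
    ["http://spam-page.example/", "http://ads.syndicator-a.example/click?id=1",
     "http://advertiser.example/"],
    ["http://other-spam.example/", "http://syndicator-b.example/r",
     "http://shop.example/"],
    ["http://third-spam.example/", "http://syndicator-a.example/c",
     "http://store.example/"],
    ["http://fourth-spam.example/", "http://advertiser.example/"],
]
shares = syndicator_share(chains, ["syndicator-a.example", "syndicator-b.example"])
print(shares)
```

A chain is counted once per syndicator no matter how many hops touch that domain, which matches the idea of a syndicator being "involved in" a chain.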
Based on the data analyzed, the report's authors conclude, "[T]hese syndicators appear to be involved in the search spam industry both broadly and deeply."
The study also found that 68% of the unique URLs in the .info domain were spam. The prevalence of spam pages in other domains is as follows: 4.1% in .com; 11% in .org; 12% in .net; and 53% in .biz.
Free blog-hosting sites also contribute to the problem. For example, about one in every four spam pages that appeared in the search results was hosted by Blogspot.com, according to the study.
The paper offers some hope, however. It seems two large blocks of IP addresses -- 126.96.36.199 to 188.8.131.52 and 184.108.40.206 to 220.127.116.11 -- account for a substantial portion of spam ad click-through traffic. This "bottleneck," the paper notes, "may be the best layer for attacking the search spam problem."
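To illustrate the "bottleneck" idea, the Python sketch below estimates how much click-through traffic lands inside a set of IP address blocks. It is an assumption-laden illustration, not the paper's analysis: the helper names are hypothetical, and the ranges and sample addresses are reserved documentation addresses used as placeholders, not the blocks cited in the study.

```python
import ipaddress


def in_block(ip, first, last):
    """Return True if `ip` lies between `first` and `last`, inclusive."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)


def share_in_blocks(ips, blocks):
    """Return the fraction of `ips` that fall inside any (first, last) block."""
    if not ips:
        return 0.0
    hits = sum(any(in_block(ip, lo, hi) for lo, hi in blocks) for ip in ips)
    return hits / len(ips)


# Placeholder blocks (documentation ranges), NOT the ones named in the paper.
BLOCKS = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.0", "198.51.100.127"),
]

# Made-up redirection-target addresses, for illustration only.
sample_ips = ["192.0.2.10", "198.51.100.5", "203.0.113.7", "198.51.100.200"]
share = share_in_blocks(sample_ips, BLOCKS)
print(f"{share:.0%} of sampled click-throughs hit the listed blocks")
```

If a small number of blocks covers a large share of the traffic, blocking or auditing those blocks would be one way to act on the bottleneck the paper describes.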