Spam blogs, known as splogs, are invading the Web by the millions. Blog search engines are trying to stamp them out, but more work needs to be done.
A study in December by the eBiquity Research Group at the University of Maryland, Baltimore County, found that an even higher portion of pings are really spings (spam pings)--nearly 75%. The eBiquity group also found that more than half of the blogs pinging Weblogs.com's server are spam.
Finin, who helps run eBiquity at the university, says Technorati is as good as any search engine at picking out splogs, but that about one out of every five new blog posts Technorati indexes is fake.
One of sploggers' newer strategies is to plagiarize material from other online sources. Then they insert a generic sentence that points to the site they're promoting. "It's not easy for even a human to tell" if the blog is real, Finin says. "It takes a minute or two."
Finin recently noticed, through Technorati, that a blog had copied content he had written about OWL, the Web Ontology Language. At the site, he found links to other stories that had to do with owls--not only the winged creatures, but the Temple University basketball team, a bar in Baltimore, and a street in Houston.
"The person who set this up also set up hundreds of others, focused on different keywords or phrases," he says. Finin believes the site is a "splog farm" that may look legitimate now but eventually will carry ads and links to target sites.
Part of the problem is that blog search is still in its infancy and the companies doing it are small, unlike the huge companies that dominate Web search and have teams of people dedicated to researching data quality.
On top of that, Web search engines rank results by relevance, which means splogs generally don't show up on the first few pages of results. Blog search engines rank results differently. "On blog search, what people are interested in seeing is not the most relevant but the most recent," Glance says. "If there's a spam attack on a particular topic on a given day, it will be on the first page unless we filter them out."
As with e-mail spam, splogs will likely never be eliminated. The hope is that they can be suppressed to the point that they won't ruin the Web experience.