Best Of Listening Post - InformationWeek



Best Of Listening Post

Filter Favorite
Earlier this year, one of our online columnists, Fred Langa, wrote about an experiment he'd conducted to see how much E-mail is lost due to spam and spam filters. Read his original, thought-provoking column "Langa Letter: E-Mail--Hideously Unreliable," Jan. 12, 2004. There followed much discussion about whether the experiment itself was flawed. I'll leave it to others to argue that.

A perhaps more relevant thread deals with one category of anti-spam tool called a Bayesian filter. Such a filter picks out the words and numbers in E-mail text and compares how often each one shows up in good mail versus spam. Using those ratios, the filter calculates the probability that a new E-mail is spam.
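For readers who want to see the ratio idea in concrete terms, here is a minimal sketch, not the code of any particular product. It assumes a naive Bayes approach with equal prior odds for spam and good mail, add-one smoothing, and a simple word-and-number tokenizer; the function names and the tiny training sets are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Per-token spam probability from the spam/good-mail ratio.

    Add-one smoothing keeps unseen tokens from producing 0 or 1.
    """
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token probabilities, treating tokens as independent."""
    log_odds = 0.0
    for tok in set(tokenize(text)):
        p = token_spam_prob(tok, spam_counts, ham_counts, n_spam, n_ham)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training mail, made up for this example.
spam = ["cheap pills buy now", "buy cheap watches now"]
ham = ["meeting agenda for monday", "lunch on monday?"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_probability("buy cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_probability("monday meeting", spam_counts, ham_counts, len(spam), len(ham)))
```

The first message should score above 0.5 and the second below it. Real filters differ in how they tokenize, smooth, and combine the per-token numbers.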

In a thread titled "Bayesian the Best?," "James Becker" took issue with Langa's assertion that Bayesian filters are the best choice available. "They're only as good as the messages you give them. Pick the wrong messages [to analyze] and you create a false pattern. Suppose that on the morning of Feb. 1, I find 10 pieces of spam. All my good E-mail at that point had been delivered over the previous month."

Becker says a Bayesian filter might very well incorrectly deduce that E-mail with February somewhere in it is spam.
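Becker's February scenario is easy to put numbers on. The sketch below assumes the same smoothed ratio described earlier, with equal prior odds for spam and good mail. The 10 pieces of spam come from his example; the counts of good messages (40, then 80) are assumptions made up for illustration.

```python
def token_spam_probability(spam_with_token, n_spam, ham_with_token, n_ham):
    """Smoothed estimate of how strongly a single token points to spam."""
    p_spam = (spam_with_token + 1) / (n_spam + 2)
    p_ham = (ham_with_token + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)

# Morning of Feb. 1: all 10 spam messages mention "february",
# and none of the 40 (assumed) good messages from January do.
february_day_one = token_spam_probability(10, 10, 0, 40)

# Later in the month: 40 more (assumed) good messages arrive,
# all mentioning "february", and the filter is trained on them too.
february_later = token_spam_probability(10, 10, 40, 80)

print(february_day_one, february_later)
```

On day one the token looks like a near-certain spam marker. Once good February mail is fed in, its weight drops sharply, which is the false pattern washing out as the sample becomes more representative.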

And "Bayesian filters [don't] do well with small samples," he says. "I have colleagues who complain when they get multiple pieces of spam within the same week. A Bayesian filter won't spot a strong or true pattern when the sample size is small." Actually, that sounds more like a human-patience problem to me.

So let's assume that you've given your filter the optimal variety and number of E-mail samples to analyze. You're done, right? No. There's the "evolutionary flaw" scenario.

"Spammers ... try to devise messages that don't quite look like previous messages because they know the filters are out there. Therefore, once you've given your Bayesian filter lots of examples, it has a strong idea of what spam looked like a few months ago. If recent spam content has evolved from old spam content, your Bayesian filter is behind the times. It might even have some unlearning to do before it starts to learn the new stuff."

Good advice from Becker, but he ends with the oldest cop-out in the book--that technology can't solve all of our problems. Filters just guess, so they'll always be inaccurate. "Unwelcomeness is an individual, subjective judgment on the part of the recipient." I was with you until you blasphemed, Becker. Your 15 minutes are up.

I was worried about "Cindy Harris" when she wrote, "The human brain is still the best pattern-recognition tool we know."

But she goes on to say Becker sells Bayesian filters short. "The whole point of such a filter is to feed it all of the stuff that regularly comes through your box. You feed it everything and correct every error. The more diligent you are, the more accurate your filter. At the beginning, the filter will tend to make wrong guesses, and you'll have to reclassify a lot of mail, but a Bayesian filter learns."
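The feed-it-everything-and-correct-every-error routine Harris describes might look something like the following. This is a hedged sketch under the same naive Bayes assumptions as before; the class name `FeedbackFilter` and its methods are hypothetical and do not reflect how any shipping mail client works.

```python
import math
import re
from collections import Counter

class FeedbackFilter:
    """Toy Bayesian filter that learns from every message and every correction."""

    def __init__(self):
        # Per-class token counts and message totals; True means spam.
        self.counts = {True: Counter(), False: Counter()}
        self.totals = {True: 0, False: 0}

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        self.counts[is_spam].update(self._tokens(text))
        self.totals[is_spam] += 1

    def unlearn(self, text, is_spam):
        self.counts[is_spam].subtract(self._tokens(text))
        self.totals[is_spam] -= 1

    def correct(self, text, was_spam):
        """The user reclassifies a message that was filed the wrong way."""
        self.unlearn(text, was_spam)
        self.learn(text, not was_spam)

    def score(self, text):
        """Estimated probability that the text is spam (equal priors assumed)."""
        log_odds = 0.0
        for tok in self._tokens(text):
            p_spam = (self.counts[True][tok] + 1) / (self.totals[True] + 2)
            p_ham = (self.counts[False][tok] + 1) / (self.totals[False] + 2)
            log_odds += math.log(p_spam) - math.log(p_ham)
        return 1 / (1 + math.exp(-log_odds))

f = FeedbackFilter()
f.learn("cheap pills buy now", True)
f.learn("meeting agenda monday", False)

# A new style of spam slips through and is filed as good mail.
missed = "refinance your mortgage today"
f.learn(missed, False)
before = f.score("mortgage refinance offer")

# The user marks it as junk, and the filter moves it to the spam side.
f.correct(missed, was_spam=False)
after = f.score("mortgage refinance offer")

print(before, after)
```

Before the correction, a similar message scores below 0.5; after it, the score rises above 0.5. The diligence Harris asks for is what moves the numbers.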

Harris says that even evolutionary spam changes largely can be addressed by Bayesian filters. "A new strategy will slip through the filter at first, but the more widespread it becomes, the more quickly a Bayesian filter will learn to recognize it and dump similar messages."

And, finally, a taunt from "Avo Five."

"Spam? What's that? Oh, right, that's the unwanted E-mail I used to get before I switched to a Mac running Panther and MacMail. Apple's filter is damn near perfect (I get well under 1% false positives or false negatives) after a couple of weeks. Click on 'junk' or 'not junk,' and its Bayesian filter is updated."

Wow, Mac folks are a plucky (if foul-mouthed) group, huh? Be plucky, but be civil in the Listening Post.
