Earlier this year, one of our online columnists, Fred Langa, wrote about an experiment he'd conducted to see how much E-mail is lost due to spam and spam filters. Read his original, thought-provoking column "Langa Letter: E-Mail--Hideously Unreliable," Jan. 12, 2004. There followed much discussion about whether the experiment itself was flawed. I'll leave it to others to argue that.
A perhaps more relevant thread deals with one category of anti-spam tool called a Bayesian filter. Such a filter selects words and numbers from E-mail text and compares how often each one appears in good mail versus spam. Using those ratios, the filter calculates the probability that a new E-mail is spam.
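For readers who want to see the idea in miniature, here is a rough sketch of that calculation in Python. It is not the code of any particular product: the class name `TinyBayesFilter`, the tokenizer, the add-one smoothing, and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Pull lowercase words and numbers out of a message."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Hypothetical, minimal naive Bayes spam filter (illustration only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    def train(self, text, label):
        """Record one message the user has labeled 'spam' or 'good'."""
        self.messages[label] += 1
        # Count each token once per message, however often it repeats.
        self.counts[label].update(set(tokenize(text)))

    def token_ratio(self, token):
        """How much more often a token shows up in spam than in good mail."""
        # Add-one smoothing (an assumption) keeps unseen tokens from
        # producing a ratio of zero or a division by zero.
        p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        p_good = (self.counts["good"][token] + 1) / (self.messages["good"] + 2)
        return p_spam / p_good

    def spam_probability(self, text):
        """Combine the per-token ratios into one probability for the message."""
        # Start from the prior odds of spam, then add each token's evidence.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["good"] + 1))
        for token in set(tokenize(text)):
            log_odds += math.log(self.token_ratio(token))
        # Convert log-odds to a probability without overflowing exp().
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        odds = math.exp(log_odds)
        return odds / (1 + odds)


# Made-up training mail, two messages of each kind.
mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("meeting agenda for monday", "good")
mail_filter.train("lunch on monday?", "good")

print(mail_filter.spam_probability("buy cheap pills"))   # well above 0.5
print(mail_filter.spam_probability("monday meeting"))    # well below 0.5
```

Real filters differ in how they pick tokens and combine the evidence, but the ratio-then-probability shape is the same.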
In a thread titled "Bayesian the Best?," "James Becker" took issue with Langa's assertion that Bayesian filters are the best choice available. "They're only as good as the messages you give them. Pick the wrong messages [to analyze] and you create a false pattern. Suppose that on the morning of Feb. 1, I find 10 pieces of spam. All my good E-mail at that point had been delivered over the previous month."
Becker says a Bayesian filter might very well incorrectly deduce that E-mail with February somewhere in it is spam.
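Becker's February scenario is easy to reproduce with made-up numbers. The sketch below is only an illustration: the function `token_spam_probability`, the equal weighting of spam and good mail, the add-one smoothing, and the sample messages are assumptions, not anything Becker or a filter vendor supplied.

```python
def token_spam_probability(token, spam_msgs, good_msgs):
    """Estimate how strongly one token points to spam (0 to 1).

    Assumes spam and good mail are weighted equally and uses add-one
    smoothing so an unseen token never yields exactly 0 or 1.
    """
    spam_hits = sum(token in msg.lower().split() for msg in spam_msgs)
    good_hits = sum(token in msg.lower().split() for msg in good_msgs)
    p_token_in_spam = (spam_hits + 1) / (len(spam_msgs) + 2)
    p_token_in_good = (good_hits + 1) / (len(good_msgs) + 2)
    return p_token_in_spam / (p_token_in_spam + p_token_in_good)


# Hypothetical mailbox: 10 spams arrive on Feb. 1, and all the good mail
# seen so far was delivered during January.
spam = [f"date: 1 february special offer {i}" for i in range(10)]
good = [f"date: {day} january project update" for day in range(2, 32)]

p_feb = token_spam_probability("february", spam, good)
print(round(p_feb, 2))  # roughly 0.97: "february" looks like a spam word

# Once some good mail dated February has been classified, the false
# pattern weakens, though it does not vanish right away.
good_later = good + [f"date: {day} february project update" for day in range(2, 12)]
p_later = token_spam_probability("february", spam, good_later)
print(round(p_later, 2))  # roughly 0.78
```

The false pattern comes from the sample, not the arithmetic, which is Becker's point about choosing training messages carefully.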
And "Bayesian filters [don't] do well with small samples," he says. "I have colleagues who complain when they get multiple pieces of spam within the same week. A Bayesian filter won't spot a strong or true pattern when the sample size is small." Actually, that sounds more like a human-patience problem to me.
So let's assume that you've given your filter the optimal variety and number of E-mail samples to analyze. You're done, right? No. There's the "evolutionary flaw" scenario.
"Spammers ... try to devise messages that don't quite look like previous messages because they know the filters are out there. Therefore, once you've given your Bayesian filter lots of examples, it has a strong idea of what spam looked like a few months ago. If recent spam content has evolved from old spam content, your Bayesian filter is behind the times. It might even have some unlearning to do before it starts to learn the new stuff."
Good advice from Becker, but he ends with the oldest cop-out in the book--that technology can't solve all of our problems. Filters just guess, so they'll always be inaccurate. "Unwelcomeness is an individual, subjective judgment on the part of the recipient." I was with you until you blasphemed, Becker. Your 15 minutes are up.
I was worried about "Cindy Harris" when she wrote, "The human brain is still the best pattern-recognition tool we know."
But she goes on to say Becker sells Bayesian filters short. "The whole point of such a filter is to feed it all of the stuff that regularly comes through your box. You feed it everything and correct every error. The more diligent you are, the more accurate your filter. At the beginning, the filter will tend to make wrong guesses, and you'll have to reclassify a lot of mail, but a Bayesian filter learns."
Harris says that even evolutionary changes in spam can largely be addressed by Bayesian filters. "A new strategy will slip through the filter at first, but the more widespread it becomes, the more quickly a Bayesian filter will learn to recognize it and dump similar messages."
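Harris's feed-it-and-correct-it loop can be sketched the same way. Everything here is hypothetical: the class `LearningFilter`, its `learn` and `is_spam` methods, the equal starting odds, and the sample messages are assumptions chosen to show one new spam pitch slipping through and then being caught after a single correction.

```python
import math
import re
from collections import Counter


class LearningFilter:
    """Hypothetical filter that learns only from the user's verdicts."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "good": Counter()}
        self.total = {"spam": 0, "good": 0}

    @staticmethod
    def _tokens(text):
        # Unique lowercase words and numbers in the message.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def learn(self, text, label):
        """Called when the user clicks 'junk' ('spam') or 'not junk' ('good')."""
        self.total[label] += 1
        self.tokens[label].update(self._tokens(text))

    def is_spam(self, text):
        """Guess spam when the summed token evidence favors spam."""
        log_odds = 0.0  # assumes spam and good mail are equally likely up front
        for token in self._tokens(text):
            p_spam = (self.tokens["spam"][token] + 1) / (self.total["spam"] + 2)
            p_good = (self.tokens["good"][token] + 1) / (self.total["good"] + 2)
            log_odds += math.log(p_spam / p_good)
        return log_odds > 0


mail_filter = LearningFilter()
# What the filter already knows: last season's spam and some good mail.
mail_filter.learn("cheap pills online", "spam")
mail_filter.learn("cheap watches online", "spam")
mail_filter.learn("project meeting notes", "good")
mail_filter.learn("lunch meeting today", "good")

# A new pitch arrives; nothing in it resembles the old spam.
new_pitch = "refinance your mortgage today"
slipped_through = not mail_filter.is_spam(new_pitch)

# The user corrects the error, and a similar message shows up later.
mail_filter.learn(new_pitch, "spam")
caught_after = mail_filter.is_spam("refinance your mortgage now")
still_good = not mail_filter.is_spam("meeting today")

print(slipped_through, caught_after, still_good)
```

With this toy data the first pitch gets past the filter and the follow-up does not, while ordinary mail still passes. How many corrections a real filter needs depends on its mail volume and its scoring details.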
"Spam? What's that? Oh, right, that's the unwanted E-mail I used to get before I switched to a Mac running Panther and MacMail. Apple's filter is damn near perfect (I get well under 1% false positives or false negatives) after a couple of weeks. Click on 'junk' or 'not junk,' and its Bayesian filter is updated."
Wow, Mac folks are a plucky (if foul-mouthed) group, huh? Be plucky, but be civil in the Listening Post.