Earlier this year, one of our online columnists, Fred Langa, wrote about an experiment he'd conducted to see how much E-mail is lost due to spam and spam filters. Read his original, thought-provoking column "Langa Letter: E-Mail--Hideously Unreliable," Jan. 12, 2004. There followed much discussion about whether the experiment itself was flawed. I'll leave it to others to argue that.
A perhaps more relevant thread deals with one category of anti-spam tool called a Bayesian filter. Such a filter pulls words and numbers from E-mail text and compares how often each one shows up in good mail versus spam. Using those ratios, the filter calculates the probability that a new E-mail is spam.
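For readers who want to see the idea in miniature, here is a rough Python sketch of that calculation. It's a simplified naive Bayes classifier, not the code inside any particular product; the class name, the tokenizing rule, the add-one smoothing, and the four training messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Illustrative naive Bayes spam scorer (hypothetical, heavily simplified)."""

    def __init__(self):
        # For each label, how many training messages contained each token.
        self.counts = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    def train(self, text, label):
        """Record one message the user has labeled 'spam' or 'good'."""
        self.messages[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def _token_rate(self, token, label):
        # Add-one smoothing (an assumption) keeps unseen tokens from zeroing the score.
        return (self.counts[label][token] + 1) / (self.messages[label] + 2)

    def spam_probability(self, text):
        """Combine per-token rates into one probability, working in log space."""
        total = self.messages["spam"] + self.messages["good"]
        log_spam = math.log((self.messages["spam"] + 1) / (total + 2))
        log_good = math.log((self.messages["good"] + 1) / (total + 2))
        for token in set(tokenize(text)):
            log_spam += math.log(self._token_rate(token, "spam"))
            log_good += math.log(self._token_rate(token, "good"))
        # Clamp the difference so math.exp cannot overflow on long messages.
        diff = max(min(log_good - log_spam, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Made-up training mail, purely for demonstration.
mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("meeting agenda for monday", "good")
mail_filter.train("lunch on monday", "good")

spam_score = mail_filter.spam_probability("cheap pills")
good_score = mail_filter.spam_probability("monday meeting")
```

With only four training messages, the scores lean the way you'd expect, but as the thread goes on to argue, a sample that small shouldn't be trusted.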
In a thread titled "Bayesian the Best?," "James Becker" took issue with Langa's assertion that Bayesian filters are the best choice available. "They're only as good as the messages you give them. Pick the wrong messages [to analyze] and you create a false pattern. Suppose that on the morning of Feb. 1, I find 10 pieces of spam. All my good E-mail at that point had been delivered over the previous month."
Becker says a Bayesian filter might very well incorrectly deduce that E-mail with February somewhere in it is spam.
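Becker's scenario is easy to reproduce with a toy calculation. The Python sketch below assumes, purely for illustration, that every message carries its month name in the text, and it uses a hypothetical helper, token_spam_ratio, with add-one smoothing; the sample messages are invented.

```python
def token_spam_ratio(token, spam_messages, good_messages):
    """Estimate how strongly one token points to spam (0 = good, 1 = spam).

    Hypothetical helper: compares the smoothed share of spam messages
    containing the token with the smoothed share of good messages containing it.
    """
    spam_hits = sum(token in m.lower().split() for m in spam_messages)
    good_hits = sum(token in m.lower().split() for m in good_messages)
    spam_rate = (spam_hits + 1) / (len(spam_messages) + 2)
    good_rate = (good_hits + 1) / (len(good_messages) + 2)
    return spam_rate / (spam_rate + good_rate)


# Ten spam messages arrive on Feb. 1; all good mail dates from January.
spam = [f"Sent 1 February offer number {i}" for i in range(10)]
good = [f"Sent {day} January status report" for day in range(1, 31)]

february_score = token_spam_ratio("february", spam, good)  # close to 1
sent_score = token_spam_ratio("sent", spam, good)          # close to 0.5
```

Because "february" appears only in the spam pile, the filter treats it as a strong spam signal, while a word common to both piles, such as "sent," stays roughly neutral. The skew comes from the sample, not from anything inherently spammy about the month.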
And "Bayesian filters [don't] do well with small samples," he says. "I have colleagues who complain when they get multiple pieces of spam within the same week. A Bayesian filter won't spot a strong or true pattern when the sample size is small." Actually, that sounds more like a human-patience problem to me.
So let's assume that you've given your filter the optimal variety and number of E-mail samples to analyze. You're done, right? No. There's the "evolutionary flaw" scenario.
"Spammers ... try to devise messages that don't quite look like previous messages because they know the filters are out there. Therefore, once you've given your Bayesian filter lots of examples, it has a strong idea of what spam looked like a few months ago. If recent spam content has evolved from old spam content, your Bayesian filter is behind the times. It might even have some unlearning to do before it starts to learn the new stuff."
Good advice from Becker, but he ends with the oldest cop-out in the book--that technology can't solve all of our problems. Filters just guess, so they'll always be inaccurate. "Unwelcomeness is an individual, subjective judgment on the part of the recipient." I was with you until you blasphemed, Becker. Your 15 minutes are up.
I was worried about "Cindy Harris" when she wrote, "The human brain is still the best pattern-recognition tool we know."
But she goes on to say Becker sells Bayesian filters short. "The whole point of such a filter is to feed it all of the stuff that regularly comes through your box. You feed it everything and correct every error. The more diligent you are, the more accurate your filter. At the beginning, the filter will tend to make wrong guesses, and you'll have to reclassify a lot of mail, but a Bayesian filter learns."
Harris says that even evolutionary changes in spam can largely be addressed by Bayesian filters. "A new strategy will slip through the filter at first, but the more widespread it becomes, the more quickly a Bayesian filter will learn to recognize it and dump similar messages."
"Spam? What's that? Oh, right, that's the unwanted E-mail I used to get before I switched to a Mac running Panther and MacMail. Apple's filter is damn near perfect (I get well under 1% false positives or false negatives) after a couple of weeks. Click on 'junk' or 'not junk,' and its Bayesian filter is updated."
Wow, Mac folks are a plucky (if foul-mouthed) group, huh? Be plucky, but be civil in the Listening Post.