
Langa Letter: Real-Life Spam Solutions





(Page 3 of 4)

Stopgap Measures
The first applications of Bayesian filtering are just now starting to appear (we'll discuss specific examples later in this article), but they're too new to be relied upon in full-scale production environments. In the meantime, here are some simpler tools that are known to work pretty well:

SpamAssassin, like Bayesian analysis, tries to assess both the good and bad elements of an E-mail to arrive at a more holistic view of the content, but instead of using probabilistic analysis, SpamAssassin simply assigns a "score" to the E-mail. Certain spamlike words, phrases, and features of an E-mail increase that E-mail's score; corresponding nonspam elements can decrease the score. In this way, the bad and good elements of any given E-mail exert a kind of tug-of-war on the E-mail's final score; if, and only if, the final score exceeds some user-definable threshold will the E-mail be tagged as spam. This helps to reduce false positives, and allows innocent E-mails that happen to contain a few spamlike elements to pass through.
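To make the tug-of-war concrete, here is a minimal Python sketch of additive scoring against a threshold. It is not SpamAssassin's code: the rule patterns, the weights, the threshold of 3.0, and the function names are all invented for illustration.

```python
import re

# Hypothetical rule table: these patterns and weights are made up for this
# sketch and are NOT SpamAssassin's actual rules or scores.
RULES = [
    (re.compile(r"free money", re.IGNORECASE), 2.5),         # spamlike phrase
    (re.compile(r"click here", re.IGNORECASE), 1.5),         # spamlike phrase
    (re.compile(r"!!!"), 1.0),                               # spamlike punctuation
    (re.compile(r"meeting agenda", re.IGNORECASE), -2.0),    # nonspam phrase
    (re.compile(r"per our conversation", re.IGNORECASE), -1.5),
]


def score_message(text):
    """Add up the weights of every rule that matches the message."""
    return sum(weight for pattern, weight in RULES if pattern.search(text))


def is_spam(text, threshold=3.0):
    """Tag the message as spam only if its total score exceeds the threshold."""
    return score_message(text) > threshold


if __name__ == "__main__":
    for msg in (
        "FREE MONEY!!! Click here now",
        "Click here for the meeting agenda!!!",
    ):
        print(f"{score_message(msg):5.1f}  spam={is_spam(msg)}  {msg}")
```

In this sketch, the second message trips two spamlike rules but is pulled back under the threshold by a nonspam rule, which is the false-positive protection described above.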

SpamAssassin works at either the server or client level--I use a SpamAssassin-based plug-in on my personal copy of Eudora, for example. But SpamAssassin's scoring isn't self-adaptive; it is instead based on rigid, arcanely coded "rules" that can be quite difficult to interpret and alter. For example, to look for a word in all caps repeated within one line of text--an attention-getting device used in many spam mails--SpamAssassin uses the code: "/\b([A-Z]{3,})\b.{0,50}\b\1\b/". If you have a little programming experience, you can probably puzzle out what's going on in that code snippet (a word of three or more capital letters, followed within 50 characters by the same word again), but it's far from obvious or user-friendly. In my case, I found I had to manually add, and then fiddle with, a number of custom rules and scoring variables before SpamAssassin gave me acceptable results.
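For readers who want to see what that pattern actually catches, the sketch below runs it through Python's standard `re` module rather than through SpamAssassin itself. The pattern is the one quoted above with the surrounding slashes removed; the helper name `repeated_caps_word` is an assumption made for this example.

```python
import re

# The pattern quoted in the article, without the enclosing slashes:
#   \b([A-Z]{3,})\b   a whole word of three or more capital letters (captured)
#   .{0,50}           up to 50 characters on the same line
#   \b\1\b            the same captured word again, as a whole word
REPEATED_CAPS = re.compile(r"\b([A-Z]{3,})\b.{0,50}\b\1\b")


def repeated_caps_word(line):
    """Return the all-caps word that repeats on this line, or None."""
    match = REPEATED_CAPS.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(repeated_caps_word("FREE offer, totally FREE today"))  # FREE
    print(repeated_caps_word("NASA and the FBI issued a notice"))  # None
```

Because `.` does not match a newline by default, the repeat has to occur on the same line, and ordinary acronyms that appear only once do not trigger the rule.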

SpamNet from Cloudmark tries a different approach, using the collective intelligence of its users through Napster-like peer-to-peer networking. But instead of sharing music files, users share information about what is--and what is not--spam. If most users agree that something in their inbox is spam, it will be flagged as such. But if most users agree that something is NOT spam, it won't be listed. In other words, false spam reports can be "outvoted" by positive reports from other users. (I'm oversimplifying, but you get the idea.) It's a noteworthy approach that places control over what is and is not defined as spam in the hands of the total body of recipients instead of a small group of self-appointed blacklisters and censors. Nonspammers won't get blocked, but spammers definitely will. In fact, even the most subtle, clever spammers will be caught because, as the saying goes, "you can't fool all of the people all of the time."
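The "outvoting" idea can be sketched as a simple tally. Cloudmark's real algorithm is proprietary and, as noted, more involved than this; the plain majority rule, the `min_reports` cutoff, and the function name below are assumptions chosen only to illustrate the principle.

```python
from collections import Counter


def community_verdict(reports, min_reports=3):
    """Decide a message's status from user reports.

    `reports` is an iterable of the strings "spam" or "not_spam", one per
    user. Hypothetical rule: with too few reports the message is left
    alone; otherwise a simple majority wins, and ties favor "not_spam".
    """
    tally = Counter(reports)
    total = tally["spam"] + tally["not_spam"]
    if total < min_reports:
        return "unknown"
    return "spam" if tally["spam"] > tally["not_spam"] else "not_spam"


if __name__ == "__main__":
    # Two false spam reports are outvoted by five users who want the mail.
    print(community_verdict(["spam"] * 2 + ["not_spam"] * 5))
    # Most recipients agree the message is spam.
    print(community_verdict(["spam"] * 8 + ["not_spam"] * 1))
```

Breaking ties in favor of "not_spam" is a deliberate choice in this sketch: it errs toward delivering mail rather than blocking it.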

The problem with Cloudmark is that people are lazy: Users may flag nonspam items as spam, simply because it is easy to do so. For example, Cloudmark has recently had a problem with some users tagging Symantec's by-subscription security bulletins as spam, even though the users had at one time asked to receive these. When enough users take the lazy way out--tagging a by-subscription item as spam instead of unsubscribing--Cloudmark then blocks that E-mail for all users, including those who know that it's not spam and who do want to see it. (This flaw actually plagues all user-driven anti-spam tools, such as the SpamCop blacklist: A small number of lazy users may enter false spam reports that block valid E-mails for the majority of users.) Eventually, these kinds of perturbations get sorted out, but it's a time-wasting hassle. Unfortunately, because there are a lot of lazy people in the world who would rather hit a "block" button than take the time to unsubscribe from a nonspam E-mail service they once specifically requested, this is a problem that won't go away.

Brightmail, mentioned earlier, has a good track record with its server-level tools. Its "mailwall" techniques are proprietary and explained only in general terms, but the company's description indicates that it's more than simple rule-based filtering: "[It's] an analysis and filtering system, analogous to a firewall, that protects the integrity and security of electronic mail systems, and protects individual users from E-mail-borne threats. These threats can include virus invasions, spam attacks, and other content-related risks. A mailwall solution uses filters that are based on human and/or machine analysis to determine if E-mail messages should be routed normally, sidelined, or modified." Alas, Brightmail isn't cheap, and there is no client-side version available.



