Spamming in Other Languages Could Avoid Detection

Why would a commercial spammer want to send you messages in a language you can't read? You can be sure it's not to see whether you sprechen Sie Deutsch.
Now it seems you need an international dictionary handy to determine if your e-mail should be classified as spam. Last month, a group of neo-Nazis peppered the Web with the so-called "German spam flux," an e-mail worm that may have inadvertently set the stage for future attacks from other spammers.

Here's how it worked: An Internet worm spewed German-language e-mails across the Web, most of them spouting neo-Nazi diatribes and pointing to similarly themed Web sites--and all that data sailed right past many of our carefully crafted English-language spam filters.

Spam is typically filtered by source or content. One of the most popular content-based strategies is Bayesian filtering, in which the antispam software "learns" which words flag potential spam in a specific organization. For example, the word "mortgage" may signal spam in most enterprises, but not at a mortgage brokerage. If you've used a Bayesian filter, such as SpamBayes, you probably spent a week or two telling it which messages were spam, while it assigned statistical weights to the words in those messages.
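
To make the "learning" step concrete, here is a minimal sketch of how per-word weights might be derived from hand-labeled mail. It is an illustration only, not how SpamBayes or any particular product works: the class name WordWeights, the tokenizer and the sample messages are all assumptions made for the example.

```python
from collections import Counter

PUNCTUATION = ".,!?:;\"'()"


def tokenize(text):
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


class WordWeights:
    """Hypothetical trainer: counts how many spam and non-spam messages contain each word."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        words = set(tokenize(text))  # count each word once per message
        if is_spam:
            self.spam_counts.update(words)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(words)
            self.ham_msgs += 1

    def spam_weight(self, word):
        """Return a value in [0, 1]; 0.5 means the word carries no evidence."""
        s = self.spam_counts[word] / max(self.spam_msgs, 1)
        h = self.ham_counts[word] / max(self.ham_msgs, 1)
        if s + h == 0:
            return 0.5  # never seen in training
        return s / (s + h)


# A mortgage brokerage sees "mortgage" in legitimate mail...
brokerage = WordWeights()
brokerage.train("Your mortgage application was approved", is_spam=False)
brokerage.train("Mortgage rates meeting at noon", is_spam=False)
brokerage.train("Cheap pills online now", is_spam=True)

# ...while a typical office sees it only in spam.
office = WordWeights()
office.train("Lowest mortgage rates, act now", is_spam=True)
office.train("Lunch meeting at noon", is_spam=False)

brokerage_weight = brokerage.spam_weight("mortgage")
office_weight = office.spam_weight("mortgage")
```

With this toy training data, the same word ends up with a low spam weight at the brokerage and a high one at the office, which is the organization-specific behavior described above.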

But if the antispam software hasn't seen a word yet, the incoming message can't be weighted properly. That's why spammers insert nonsense words in their messages, hoping the filter will misclassify them. This seldom works, since the English-language content is usually enough to trigger the filter.

In the case of the German spam flux, nothing in the messages had been classified by English Bayesian filters, so the messages went right through. And such filters aren't capable of simply trashing all non-English messages, as a human user could. The lesson for spammers: Bayesian filters can be easily beaten by using any language that's not generally used in the target geography.
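
The effect can be sketched in a few lines. The example below assumes a filter that gives unseen words a neutral weight of 0.5 and combines word weights with a simple naive-Bayes log-odds sum; real products use more elaborate combining rules. The word table, the threshold and the sample messages are invented for illustration.

```python
import math

# Hypothetical weights learned from English-language mail only.
LEARNED = {
    "free": 0.90,
    "mortgage": 0.95,
    "meeting": 0.05,
    "agenda": 0.03,
}
UNKNOWN_WEIGHT = 0.5   # assumption: unseen words count as neutral evidence
SPAM_THRESHOLD = 0.9   # assumption: scores above this are treated as spam


def spam_score(text):
    """Combine per-word weights into one probability via summed log-odds."""
    log_odds = 0.0
    for word in text.lower().split():
        p = LEARNED.get(word, UNKNOWN_WEIGHT)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Nonsense padding does not help: the known English words still dominate.
padded_english = spam_score("free mortgage xqzt")

# A message made entirely of unseen words gives the filter nothing to work with.
german = spam_score("besuchen sie unsere seite")

print(padded_english > SPAM_THRESHOLD)  # caught
print(german > SPAM_THRESHOLD)          # passes through
```

Under these assumptions the padded English message still scores well above the threshold, while the German one sits at exactly the neutral 0.5 and is delivered.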

Why would a commercial spammer want to send messages to users in a language they can't read? Some spammers might use this approach to send out "tagged" messages that validate e-mail addresses. Lots of spam messages contain pointers that retrieve a graphic from a Web site--often just one pixel that users never see--to verify that the message actually reached someone. Other spammers might use the language ruse to distribute messages that contain an automatic link to a Web site where the real ad is displayed. Non-English messages could also be used to deliver other links or attachments that contain worms, viruses or Trojan horses.
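
As a rough illustration of the address-validation trick, the sketch below scans an HTML message body for images fetched from a remote server and marks those declared as one pixel or smaller. It is a heuristic example rather than a complete defense; the helper names and the tracker.example URL are hypothetical.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collects <img> tags whose source is fetched over HTTP or HTTPS."""

    def __init__(self):
        super().__init__()
        self.remote_images = []  # list of (src, looks_like_tracking_pixel)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if src.lower().startswith(("http://", "https://")):
            tiny = (attributes.get("width") in ("0", "1")
                    and attributes.get("height") in ("0", "1"))
            self.remote_images.append((src, tiny))


def find_remote_images(html_body):
    """Return every remotely loaded image found in an HTML message body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images


# Hypothetical message body; the recipient ID in the URL is what confirms the address.
sample = ('<p>Hallo!</p>'
          '<img src="http://tracker.example/p.gif?id=user123" width="1" height="1">')
found = find_remote_images(sample)
```

A mail client or gateway that refuses to load remote images by default defeats this kind of tagging regardless of the language the message is written in.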

Make sure your users understand that messages they can't read may be even more dangerous than those they can. Keep your Bayesian filters up to date, and batten down the hatches: Last month's German messages may be in French--or Tagalog--tomorrow.
