Top 10 Words Used In Spam
Analysis of 54,202 spam e-mails yields this list of frequently used words, only some of which can actually be pronounced.
Roaring Penguin, a company best known for its spam-fighting server software, has recently decided to publicize the most "popular" words found in the spam messages its software has trapped. The list is to be published monthly, and is being offered to interested sites, including this one. We thought publishing it would be a nice way to end the summer season.
In a statement, the company said that, as a side benefit of filtering spam and amassing statistics about it, its analysts get to see exactly what spammers are hawking this month, "and that can be a lot of fun." In July the company analyzed 54,202 spam and 60,968 non-spam messages and picked out words and word pairs.
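The article does not say how Roaring Penguin splits a message into words, so the following Python sketch is only an illustration under assumed rules. The regular expression `TOKEN_RE` and the helpers `tokens_and_pairs` and `count_features` are hypothetical: runs of letters, digits, and hyphens count as one word, and every two adjacent words form a pair.

```python
import re
from collections import Counter

# Hypothetical tokenizer (not Roaring Penguin's actual rules): a word is a run
# of letters, digits, and hyphens, so CSS-like names such as
# "display-variation" stay in one piece.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def tokens_and_pairs(text):
    """Return the words of a message followed by every adjacent word pair."""
    words = TOKEN_RE.findall(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def count_features(messages):
    """Count how many times each word or pair occurs across all messages."""
    counts = Counter()
    for text in messages:
        counts.update(tokens_and_pairs(text))
    return counts


print(tokens_and_pairs("Write review for Corel Draw"))
# ['Write', 'review', 'for', 'Corel', 'Draw',
#  'Write review', 'review for', 'for Corel', 'Corel Draw']
```

Keeping hyphens inside words is a deliberate choice here: it lets HTML class names like the ones on the August list surface as single features.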
After the analysts threw away words or pairs that appeared fewer than three times, they boiled the results down to 1,125,717 words or word pairs, each with counts of how many times it appeared in spam and in non-spam messages.
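Given those spam and non-spam counts, one way to rank features by how strongly they point to spam is sketched below. The cutoff of three occurrences comes from the article; the scoring formula (a feature's occurrences per spam message divided by the sum of its per-message rates in both corpora, a simple Bayesian-style ratio) is an assumption, as are the function names and the toy counts.

```python
def spam_likelihoods(spam_counts, ham_counts, n_spam, n_ham, min_count=3):
    """Score each feature between 0 (only in non-spam) and 1 (only in spam).

    Assumed scheme: drop features seen fewer than min_count times in total,
    then compare occurrence rates normalized by the size of each corpus.
    """
    scores = {}
    for feature in set(spam_counts) | set(ham_counts):
        s = spam_counts.get(feature, 0)
        h = ham_counts.get(feature, 0)
        if s + h < min_count:
            continue  # too rare to trust, like the article's cutoff
        spam_rate = s / n_spam
        ham_rate = h / n_ham
        scores[feature] = spam_rate / (spam_rate + ham_rate)
    return scores


def top_spam_features(scores, spam_counts, k=10):
    """Most spam-like features first; ties broken by raw spam count."""
    return sorted(
        scores,
        key=lambda f: (scores[f], spam_counts.get(f, 0)),
        reverse=True,
    )[:k]


# Toy counts, invented for illustration; corpus sizes are July's from the article.
spam = {"Tadalafil": 40, "review": 30, "rare": 1}
ham = {"review": 60, "rare": 1}
scores = spam_likelihoods(spam, ham, n_spam=54_202, n_ham=60_968)
print(top_spam_features(scores, spam))  # ['Tadalafil', 'review']
```

Normalizing by corpus size keeps the slightly larger non-spam sample from skewing the scores.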
According to Roaring Penguin analysts, the following words, listed in frequency-of-use order, had the dubious distinction of being the most likely to indicate a spam e-mail in July:
Top 10 Spam Words For July
ndsfrwudG
Tadalafil
gation
ruptcy
obli
morta
remov
ffffd5
edit1
display-variation
"This list is quite a surprise!" said a Roaring Penguin's Adam Syme, "we have no idea what ndsfrwudG is. It could be a code added to a URL for tracking purposes, or it could be 'random' noise added to confuse Bayesian engines." He added that if you really want random noise, it's best to make it random and not re-use the same set of characters.
And the following words have the dubious honor of being the most likely to indicate a spam e-mail in August 2005:
Top 10 Spam Words For August
ffffd5
Wavefrt
Cialis
Tadalafil
edit1
go-button-software
display-variation
eyebrow-upper-left-corner
B0000AZJVC
right-topnav-default-2
The first entry, ffffd5, is the HTML hex code for a pale yellow color. It moved up from 8th place in July to 1st place in August. In June, eeeecc was the only HTML color to make the list. It seems that spammers now prefer a somewhat lighter shade of yellow than before.
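The comparison between the two yellows can be checked by converting each hex code into its red, green, and blue components. This sketch uses a plain average of the three channels as a rough lightness measure; that is an assumption for illustration, not a perceptual luminance formula.

```python
def hex_to_rgb(code):
    """Convert a six-digit HTML hex color such as 'ffffd5' to (R, G, B)."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def average_brightness(code):
    """Mean of the three channels: 0 is black, 255 is white."""
    return sum(hex_to_rgb(code)) / 3


print(hex_to_rgb("ffffd5"), average_brightness("ffffd5"))  # (255, 255, 213) 241.0
print(hex_to_rgb("eeeecc"))  # (238, 238, 204)
```

For eeeecc the channels average about 226.7, so ffffd5 is indeed the lighter of the two.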
Wavefrt probably refers to Alias Wavefront and likely appeared in spam selling software. Cialis and Tadalafil are back in August: Cialis made a strong showing in June, Tadalafil knocked it off the top 10 in July, and in August they're fighting it out.
B0000AZJVC is either a "coupon code" added to a URL or some other constant chunk of text that appeared in a large spam run.
All the other words on the top-10 list are probably HTML or CSS style attributes; spammers must be using Web authoring software to make their e-mail pretty.
Roaring Penguin analysts also look at word pairs, which can be a far more powerful and accurate way to separate spam from non-spam. Here are the top ten spammy word pairs from August.
Top 10 Spam Word Pairs For August
Corel Draw
Flash 2004
Soft Tabs
Alias Maya
NEW TITLES
Write review
Available INSTANT
Average Customer
Email Special
Sales Rank
Syme noted that, once again, the software vendors and the sexual-potency vendors are fighting it out on the word-pair list. "I suppose," he said, "that their average customers don't mind being referred to as an Average Customer, as long as they regularly receive their e-mail Specials."