Top 10 Words Used In Spam

Analysis of 54,202 spam e-mails yields this list of frequently used words, only some of which can actually be pronounced.

InformationWeek Staff, Contributor

September 2, 2005

Roaring Penguin, a company best known for its spam-fighting server software, has decided to publicize the most "popular" words found in the spam messages its software has trapped. The list will be published monthly and is being offered to interested sites, including this one. We thought publishing it would be a nice way to end the summer season.

In a statement, the company said that as a side benefit of filtering spam and amassing statistics about it, its analysts get to see exactly what spammers are hawking each month, "And that can be a lot of fun." In July the company analyzed 54,202 spam and 60,968 non-spam messages and picked out words and word pairs.

After discarding words and pairs that appeared fewer than three times, the analysts were left with 1,125,717 words and word pairs, each with a count of how many times it appeared in spam and in non-spam messages.
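The counting step can be sketched in a few lines of Python. Roaring Penguin has not published its tokenizer, so this is only an illustration under stated assumptions. The word pattern and the function names `tokens_and_pairs` and `count_features` are hypothetical. A "word pair" is assumed to mean two adjacent words. Any feature seen fewer than three times across both piles of mail is dropped, mirroring the threshold described above.

```python
import re
from collections import Counter

# Hypothetical word pattern: letters/digits, allowing inner hyphens and
# underscores so tokens like "display-variation" stay whole.
WORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-]*")

def tokens_and_pairs(text):
    """Split a message into words plus adjacent word pairs (assumed tokenizer)."""
    words = WORD_RE.findall(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

def count_features(spam_msgs, ham_msgs, min_count=3):
    """Map each word/pair to (spam occurrences, non-spam occurrences),
    discarding features seen fewer than `min_count` times in total."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(tokens_and_pairs(msg))
    for msg in ham_msgs:
        ham_counts.update(tokens_and_pairs(msg))
    table = {}
    for feature in spam_counts.keys() | ham_counts.keys():
        s, h = spam_counts[feature], ham_counts[feature]
        if s + h >= min_count:
            table[feature] = (s, h)
    return table
```

Run over a real mail archive, the resulting table plays the role of the 1,125,717 counted words and pairs. Whether Roaring Penguin counted total occurrences or the number of messages containing a token is not stated. This sketch counts occurrences.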

According to Roaring Penguin analysts, the following words, which are listed in frequency-of-use order, have the dubious "distinction" of having been the most likely to indicate a spam e-mail in July:

Top 10 Spam Words For July

  1. ndsfrwudG

  2. Tadalafil

  3. gation

  4. ruptcy

  5. obli

  6. morta

  7. remov

  8. ffffd5

  9. edit1

  10. display-variation

"This list is quite a surprise!" said Roaring Penguin's Adam Syme. "We have no idea what ndsfrwudG is. It could be a code added to a URL for tracking purposes, or it could be 'random' noise added to confuse Bayesian engines." He added that spammers who really want random noise should make it truly random rather than reuse the same set of characters.
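Syme's point about reused "random" strings is easier to see with a scoring rule in hand. Roaring Penguin doesn't say exactly how it decides which words are "most likely to indicate" spam, so what follows is an assumption. A simplified per-token estimate of the kind Bayesian filters use compares how often a token shows up per spam message with how often it shows up per non-spam message. The totals are normalized by the 54,202 spam and 60,968 non-spam messages in the July sample. The function names are hypothetical. The input is assumed to be a dict mapping each word or pair to its (spam count, non-spam count).

```python
def spam_likelihood(spam_count, ham_count, n_spam, n_ham):
    """Simplified estimate of how strongly a token indicates spam (0.0 to 1.0).

    Counts are divided by the size of each corpus so the larger
    non-spam sample doesn't skew the result.
    """
    spam_rate = spam_count / n_spam
    ham_rate = ham_count / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen in either corpus: no evidence either way
    return spam_rate / (spam_rate + ham_rate)

def rank_spammiest(table, n_spam, n_ham, top=10):
    """Return the `top` most spam-indicative tokens.

    `table` maps token -> (spam_count, ham_count). Ties (for example,
    several tokens that never appear in non-spam) are broken by spam count.
    """
    scored = [
        (spam_likelihood(s, h, n_spam, n_ham), s, token)
        for token, (s, h) in table.items()
    ]
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [token for _, _, token in scored[:top]]
```

Under this rule, any string that appears in spam but never in legitimate mail scores the maximum of 1.0. That is why a fixed "random" string repeated across a whole spam run ends up flagging the run instead of hiding it.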

And the following words have the dubious honor of being the most likely to indicate a spam e-mail in August 2005:

Top 10 Spam Words For August

  1. ffffd5

  2. Wavefrt

  3. Cialis

  4. Tadalafil

  5. edit1

  6. go-button-software

  7. display-variation

  8. eyebrow-upper-left-corner

  9. B0000AZJVC

  10. right-topnav-default-2

The first entry, ffffd5, is the HTML code for a pale yellow color. It moved up from 8th place in July to 1st place in August. In June, eeeecc was the only HTML color to make it onto the list. It seems that spammers now prefer a somewhat lighter shade of yellow than before.
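An HTML color code is three pairs of hexadecimal digits giving the red, green, and blue levels, each from 0 to 255. Decoding the two codes shows why ffffd5 counts as the lighter yellow. In both, red and green sit well above blue, and ffffd5 has higher values in every channel. The helper name `hex_to_rgb` is hypothetical. Summing the channels is only a rough stand-in for perceived lightness, not a formal measure.

```python
def hex_to_rgb(color):
    """Convert an HTML hex color such as 'ffffd5' or '#ffffd5' to (R, G, B)."""
    color = color.lstrip("#")
    # Each two-digit hex pair is one channel: red, green, blue.
    return tuple(int(color[i:i + 2], 16) for i in range(0, 6, 2))

for code in ("eeeecc", "ffffd5"):
    r, g, b = hex_to_rgb(code)
    # Crude lightness proxy: the sum of the three channels.
    print(code, (r, g, b), "brightness sum:", r + g + b)

# Prints:
# eeeecc (238, 238, 204) brightness sum: 680
# ffffd5 (255, 255, 213) brightness sum: 723
```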

Wavefrt probably refers to Alias Wavefront and most likely appeared in spams selling software. Cialis and Tadalafil, its generic name, are back in August. Cialis made a strong showing in June, Tadalafil knocked it out of the top 10 in July, and in August the two are fighting it out.

B0000AZJVC is either a "coupon code" added to a URL or some other constant chunk of text that appeared in a large spam run.

All the other words on the top-10 list are probably HTML or CSS style attributes; spammers must be using Web authoring software to make their e-mail pretty.

Roaring Penguin analysts also look at word pairs, which can be a far more powerful and accurate way to separate spam from non-spam. Here are the top ten spammy word pairs from August.

Top 10 Spam Word Pairs for August 2005

  1. Corel Draw

  2. Flash 2004

  3. Soft Tabs

  4. Alias Maya

  5. NEW TITLES

  6. Write review

  7. Available INSTANT

  8. Average Customer

  9. Email Special

  10. Sales Rank

Syme noted that, once again, the software vendors and the sexual-potency vendors fight it out on the word-pair list. "I suppose," he said, "that their average customers don't mind being referred to as an Average Customer, as long as they regularly receive their e-mail Specials."