Email Data Cleaning
Overview: This white paper addresses the problem of email data cleaning for text mining. Email is one of the most common means of text-based communication. Several products offer email cleaning features; however, the types of noise they can eliminate are limited. In this paper, email cleaning is formalized as a problem of non-text filtering and text normalization. In this way, email cleaning becomes independent of any specific text mining task. A cascaded approach is proposed that cleans an email in four passes: non-text filtering, paragraph normalization, sentence normalization, and word normalization.
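The four passes form a cascade in which each stage consumes the previous stage's output. The Python sketch below illustrates only that structure. The overview does not describe the paper's actual detectors, so every function name, regular expression, and the small informal-word lexicon (`WORD_FIXES`) here are hypothetical. Simple rule-based heuristics stand in for whatever methods the paper uses.

```python
import re

# Hypothetical rule-based heuristics for illustration only; the paper's
# actual non-text filters and normalizers are not given in the overview.
HEADER_RE = re.compile(r"^(From|To|Cc|Subject|Date):", re.IGNORECASE)
SIGNATURE_DELIMITER = "-- "  # conventional "dash dash space" signature line
WORD_FIXES = {"u": "you", "pls": "please", "thx": "thanks"}  # hypothetical lexicon


def filter_non_text(raw: str) -> list[str]:
    """Pass 1: drop header lines, quoted replies, and the signature block."""
    kept = []
    for line in raw.splitlines():
        if line == SIGNATURE_DELIMITER:
            break  # everything after the delimiter is treated as signature
        if HEADER_RE.match(line) or line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return kept


def normalize_paragraphs(lines: list[str]) -> list[str]:
    """Pass 2: rejoin hard-wrapped lines; blank lines separate paragraphs."""
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def normalize_sentences(paragraph: str) -> list[str]:
    """Pass 3: split after end punctuation, collapse spaces, capitalize."""
    sentences = []
    for part in re.split(r"(?<=[.!?])\s+", paragraph):
        part = re.sub(r"\s+", " ", part).strip()
        if part:
            sentences.append(part[0].upper() + part[1:])
    return sentences


def normalize_words(sentence: str) -> str:
    """Pass 4: expand informal tokens and fix the lowercase pronoun 'i'."""
    out = []
    for token in sentence.split(" "):
        core = token.rstrip(".,!?")
        tail = token[len(core):]  # trailing punctuation, kept as-is
        lowered = core.lower()
        if lowered == "i":
            core = "I"
        elif lowered in WORD_FIXES:
            fixed = WORD_FIXES[lowered]
            core = fixed.capitalize() if core[:1].isupper() else fixed
        out.append(core + tail)
    return " ".join(out)


def clean_email(raw: str) -> list[list[str]]:
    """Run the four passes in order; each pass consumes the previous output."""
    lines = filter_non_text(raw)
    return [
        [normalize_words(s) for s in normalize_sentences(p)]
        for p in normalize_paragraphs(lines)
    ]


raw = (
    "From: alice@example.com\nSubject: hi\n\n"
    "thanks for the file.  i will\nsend it to u tomorrow.\n\n"
    "> old quoted text\n-- \nAlice"
)
print(clean_email(raw))
# [['Thanks for the file.', 'I will send it to you tomorrow.']]
```

Keeping each pass as a separate function mirrors the cascaded design: a stage can be replaced, for example with a learned classifier for non-text filtering, without touching the others.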

