Title Extraction From Bodies of HTML Documents and Its Application to Web Page Retrieval
Overview: This paper addresses the problem of automatically extracting titles from the bodies of HTML documents. Titles are the names of documents and are therefore very useful for document processing. In HTML documents, authors can explicitly specify the title fields marked by

