|Title||Selecting Labels for News Document Clusters|
|Publication Type||Conference Paper|
|Year of Publication||2007|
|Authors||M. Shaik, Krishnaprasad Thirunarayan, Trivikram Immaneni|
|Conference Name||12th International Conference on Applications of Natural Language to Information Systems (NLDB 2007)|
This work addresses the determination of meaningful and terse labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of the documents in a cluster (obtained as the result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps each sentence to a set of significant stems that approximates its semantics, so that sentences can be compared. Finally, the cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-order information lost in the formalization of semantic equivalence.
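The pipeline outlined in the abstract (sentence to stem set, selection of a well-supported headline, extraction of a contiguous phrase) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `crude_stem` suffix stripper stands in for a real stemmer, the stopword list is ad hoc, "support" is taken to mean sharing at least half of a candidate's stems, and the label is taken to be the shortest word window covering the stems found in a majority of headlines.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "by", "after", "in", "on", "to", "and"}

def crude_stem(word):
    # Stand-in for a real stemmer: strip one common suffix, keep >= 3 letters.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(sentence):
    return re.findall(r"[A-Za-z]+", sentence)

def significant_stems(sentence):
    # Map a sentence to its set of non-stopword stems; word order is discarded.
    return {crude_stem(w.lower()) for w in tokens(sentence)
            if w.lower() not in STOPWORDS}

def support(candidate, sentences):
    # Assumed criterion: a sentence supports the candidate if it shares
    # at least half of the candidate's significant stems.
    cand = significant_stems(candidate)
    if not cand:
        return 0
    return sum(1 for s in sentences
               if 2 * len(cand & significant_stems(s)) >= len(cand))

def select_headline(sentences):
    # Best-supported sentence; ties go to the earliest one.
    return max(sentences, key=lambda s: support(s, sentences))

def extract_label(headline, sentences):
    # Core stems: stems of the headline that occur in a majority of sentences.
    stem_sets = [significant_stems(s) for s in sentences]
    core = {st for st in significant_stems(headline)
            if 2 * sum(st in ss for ss in stem_sets) > len(sentences)}
    words = tokens(headline)
    if not core:
        return headline
    stems = [crude_stem(w.lower()) for w in words]
    best = None
    # Shortest contiguous window of the headline that covers every core stem.
    for i in range(len(words)):
        for j in range(i, len(words)):
            if core <= set(stems[i:j + 1]):
                if best is None or (j - i) < (best[1] - best[0]):
                    best = (i, j)
                break
    i, j = best
    return " ".join(words[i:j + 1])

headlines = [
    "Senate passes budget bill after long debate",
    "Budget bill passed by Senate",
    "Senate passing the budget bill draws criticism",
]
label = extract_label(select_headline(headlines), headlines)
print(label)  # Senate passes budget bill
```

Comparing stem sets ignores word order, which is why the final step goes back to the original headline and returns a contiguous span of its words rather than the unordered stems.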
|Full Text|| |
K. Thirunarayan, T. Immaneni, and M. Shaik, Selecting Labels for News Document Clusters, In: Proceedings of the 12th International Conference on Applications of Natural Language to Information Systems (NLDB 2007), LNCS 4592, pp. 119-130, June 2007.