|Title||SCALE: a Scalable Framework for Efficiently Clustering Large Transactional Data|
|Publication Type||Journal Article|
|Year of Publication||2009|
|Authors||Keke Chen, Hua Yan, Ling Liu|
|Journal||Journal of Data Mining and Knowledge Discovery (DMKD)|
This paper presents SCALE, a fully automated transactional clustering framework. The SCALE design highlights three unique features. First, we introduce the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive and it allows the weight of each item in a cluster to be changed dynamically according to the occurrences of items. Second, we develop the weighted coverage density measure based clustering algorithm, a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain-specific clustering evaluation metrics are critical to capture the transactional semantics in clustering analysis. Our SCALE framework combines the weighted coverage density measure for clustering over a sample dataset with self-configuring methods. These self-configuring methods can automatically tune the two important parameters of our clustering algorithms: (1) the candidates of the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to find the best result from the set of clustering results.
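The abstract does not give a formula for weighted coverage density, so the following is only an illustrative sketch under stated assumptions. It assumes that each item in a cluster is weighted by its share of all item occurrences in that cluster, and that the measure is the weighted sum of item occurrences divided by the number of transactions. The function name `weighted_coverage_density` and the list-of-sets input format are hypothetical choices made for this example.

```python
from collections import Counter


def weighted_coverage_density(cluster):
    """Sketch of a weighted coverage density score for one cluster.

    `cluster` is a list of transactions, each an iterable of items.
    ASSUMPTION (not stated in the abstract): the weight of an item is
    occur(item) / total_occurrences, so the score becomes
    sum(occur(item)^2) / (total_occurrences * number_of_transactions).
    """
    n_transactions = len(cluster)
    if n_transactions == 0:
        return 0.0

    # Count in how many transactions each item occurs (duplicates within
    # a single transaction are ignored).
    occurrences = Counter(item for t in cluster for item in set(t))
    total = sum(occurrences.values())
    if total == 0:
        return 0.0

    # Frequent items get larger weights, so clusters whose transactions
    # share the same items score higher.
    weighted_sum = sum(count * count for count in occurrences.values())
    return weighted_sum / (total * n_transactions)


# Two identical transactions overlap completely.
print(weighted_coverage_density([{"a", "b"}, {"a", "b"}]))  # 1.0
# Partial overlap lowers the score.
print(weighted_coverage_density([{"a", "b"}, {"a", "c"}]))  # 0.75
```

Under these assumptions, the item weights shift automatically as transactions are added to or removed from a cluster, which is one way to read the abstract's remark that weights change dynamically according to item occurrences.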
Hua Yan, Keke Chen and Ling Liu, 'SCALE: a Scalable Framework for Efficiently Clustering Large Transactional Data,' in Journal of Data Mining and Knowledge Discovery (DMKD), 19(4), 2009.