%0 Journal Article
%J Data & Knowledge Engineering
%D 2013
%T Mining Effective Multi-Segment Sliding Window for Pathogen Incidence Rate Prediction
%A Lei Duan
%A Changjie Tang
%A Xiasong Li
%A Guozhu Dong
%A Xianming Wang
%A Jie Zuo
%A Min Jiang
%A Zhongqiao Li
%A Yongqing Zhang
%K Data Mining
%K Multi-segment sliding window
%K Pathogen incidence rate prediction
%K Time series modeling
%X Pathogen incidence rate prediction, which can be considered as time series modeling, is an important task for infectious disease incidence rate prediction and for public health. This paper investigates applying a genetic computation technique, namely GEP, to pathogen incidence rate prediction. To overcome the shortcomings of traditional sliding windows in GEP-based time series modeling, the paper introduces the problem of mining effective sliding windows, that is, discovering optimal sliding windows for building accurate prediction models. To utilize the periodic characteristics of pathogen incidence rates, a multi-segment sliding window consisting of several segments from different periodic intervals is proposed and used. Since the number of such candidate windows is still very large, a heuristic method is designed for enumerating the candidate effective multi-segment sliding windows. Moreover, methods to find the optimal sliding window and then produce a mathematical model based on that window are proposed. A performance study on real-world datasets shows that the techniques are effective and efficient for pathogen incidence rate prediction.
%B Data & Knowledge Engineering
%V 87
%P 425-444
%8 09/2013
%G eng
%R 10.1016/j.datak.2013.05.006
%0 Book Section
%D 2009
%T Maintenance of Frequent Patterns: A Survey
%A Jinyan Li
%A Limsoon Wong
%A Mengling Feng
%A Guozhu Dong
%X This chapter surveys the maintenance of frequent patterns in transaction datasets. It is written to be accessible to researchers familiar with the field of frequent pattern mining. The frequent pattern maintenance problem is summarized with a study on how the space of frequent patterns evolves in response to data updates. This chapter focuses on incremental and decremental maintenance. Four major types of maintenance algorithms are studied: Apriori-based, partition-based, prefix-tree-based, and concise-representation-based algorithms. The authors study the advantages and limitations of these algorithms from both the theoretical and experimental perspectives. Possible solutions to certain limitations are also proposed. In addition, some potential research opportunities and emerging trends in frequent pattern maintenance are also discussed.
%G eng
%0 Book Section
%D 2009
%T Mining Conditional Contrast Patterns
%A Guozhu Dong
%A Guimei Liu
%A Limsoon Wong
%A Jinyan Li
%X This chapter considers the problem of 'conditional contrast pattern mining.' It is related to contrast mining, where one considers the mining of patterns/models that contrast two or more datasets, classes, conditions, time periods, and so forth. Roughly speaking, conditional contrasts capture situations where a small change in patterns is associated with a big change in the matching data of the patterns. More precisely, a conditional contrast is a triple (B, F_{1}, F_{2}) of three patterns; B is the condition/context pattern of the conditional contrast, and F_{1} and F_{2} are the contrasting factors of the conditional contrast. Such a conditional contrast is of interest if the difference between F_{1} and F_{2} as itemsets is relatively small, and the difference between the corresponding matching dataset of B∪F_{1} and that of B∪F_{2} is relatively large. It offers insights on 'discriminating' patterns for a given condition B. Conditional contrast mining is related to frequent pattern mining and analysis in general, and to the mining and analysis of closed pattern and minimal generators in particular. It can also be viewed as a new direction for the analysis (and mining) of frequent patterns. After formalizing the concepts of conditional contrast, the chapter will provide some theoretical results on conditional contrast mining. These results (i) relate conditional contrasts with closed patterns and their minimal generators, (ii) provide a concise representation for conditional contrasts, and (iii) establish a so-called dominance-beam property. An efficient algorithm will be proposed based on these results, and experiment results will be reported. Related works will also be discussed.
%G eng
%0 Journal Article
%D 2009
%T Mining Disease State Converters for Medical Intervention of Diseases
%A Changjie Tang
%A Lei Duan
%A Guozhu Dong
%K Class membership conversion
%K Classification
%K Contrast mining
%K Disease state conversion
%K Drug design
%X In applications such as gene therapy and drug design, a key goal is to convert the disease state of diseased objects from an undesirable state into a desirable one. Such conversions may be achieved by changing the values of some attributes of the objects. For example, in gene therapy one may convert cancerous cells to normal ones by changing some genes' expression level from low to high or from high to low. In this paper, we define the disease state conversion problem as the discovery of disease state converters; a disease state converter is a small set of attribute value changes that may change an object's disease state from undesirable into desirable. We consider two variants of this problem: personalized disease state converter mining mines disease state converters for a given individual patient with a given disease, and universal disease state converter mining mines disease state converters for all samples with a given disease. We propose a DSCMiner algorithm to discover small and highly effective disease state converters. Since real-life medical experiments on living diseased instances are expensive and time consuming, we use classifiers trained from the datasets of given diseases to evaluate the quality of discovered converter sets. The effectiveness of a disease state converter is measured by the percentage of objects that are successfully converted from an undesirable state into a desirable state as deemed by state-of-the-art classifiers. We use experiments to evaluate our algorithm and to show its effectiveness. We also discuss possible research directions for extensions and improvements. We note that the disease state conversion problem also has applications in customer retention, criminal rehabilitation, and company turn-around, where the goal is to convert the class membership of objects that belong to an undesirable class.
%G eng
%0 Conference Paper
%B Mining Sequence Classifiers for Early Prediction
%D 2008
%T Mining Sequence Classifiers for Early Prediction
%A Guozhu Dong
%A Zhengzheng Xing
%A Philip Yu
%A Jian Pei
%G eng
%0 Journal Article
%D 2007
%T Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints
%A Guozhu Dong
%A Xiaonan Ji
%A James Bailey
%G eng
%0 Conference Paper
%D 2006
%T Masquerader Detection Using OCLEP: One-Class Classification Using Length Statistics of Emerging Patterns
%A Lijun Chen
%A Guozhu Dong
%I International Workshop on INformation Processing over Evolving Networks (WINPEN)
%G eng
%0 Conference Paper
%B Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns
%D 2006
%T Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns
%A Limsoon Wong
%A Jinyan Li
%A Guozhu Dong
%A H. Li
%A Jian Pei
%G eng
%0 Conference Paper
%B Multi-Dimensional Regression Analysis of Time-Series Data Streams
%D 2002
%T Multi-Dimensional Regression Analysis of Time-Series Data Streams
%A Jianyong Wang
%A Jiawei Han
%A Guozhu Dong
%A Benjamin Wah
%A Yixin Chen
%G eng
%0 Conference Paper
%B MultiDimensional Regression Analysis of Time-Series Data Streams.
%D 2002
%T MultiDimensional Regression Analysis of Time-Series Data Streams.
%A Jian Pei
%A Jianyong Wang
%A Benjamin Wah
%A Jiawei Han
%A Wei Zou
%A Guozhu Dong
%G eng
%0 Journal Article
%D 2001
%T Making Use of the Most Expressive Jumping Emerging Patterns for Classification
%A Kotagiri Ramamohanarao
%A Guozhu Dong
%A Jinyan Li
%G eng
%0 Journal Article
%D 2001
%T Mining Multi-Dimensional Constrained Gradients in Data Cubes
%A Joyce Lam
%A Ke Wang
%A Jian Pei
%A Guozhu Dong
%A Jiawei Han
%G eng
%0 Journal Article
%D 2000
%T Making Use of the Most Expressive Jumping Emerging Patterns for Classification
%A Guozhu Dong
%A Jinyan Li
%A Kotagiri Ramamohanarao
%G eng
%0 Journal Article
%D 1999
%T Maintaining Transitive Closure of Graphs in SQL
%A Leonid Libkin
%A Jianwen Su
%A Guozhu Dong
%A Limsoon Wong
%G eng
%0 Journal Article
%D 1997
%T Maintaining constrained transitive closure by conjunctive queries
%A Ramamohanarao Kotagiri
%A Guozhu Dong
%G eng
%0 Journal Article
%D 1993
%T On the monotonicity of (LDL) logic programs with sets
%A Guozhu Dong
%G eng