TY - JOUR
T1 - Mining Effective Multi-Segment Sliding Window for Pathogen Incidence Rate Prediction
JF - Data & Knowledge Engineering
Y1 - 2013
A1 - Lei Duan
A1 - Changjie Tang
A1 - Xiasong Li
A1 - Guozhu Dong
A1 - Xianming Wang
A1 - Jie Zuo
A1 - Min Jiang
A1 - Zhongqiao Li
A1 - Yongqing Zhang
KW - Data Mining
KW - Multi-segment sliding window
KW - Pathogen incidence rate prediction
KW - Time series modeling
AB - Pathogen incidence rate prediction, which can be considered as time series modeling, is an important task for infectious disease incidence rate prediction and for public health. This paper investigates applying a genetic computation technique, namely GEP, for pathogen incidence rate prediction. To overcome the shortcomings of traditional sliding windows in GEP-based time series modeling, the paper introduces the problem of mining effective sliding windows in order to discover optimal sliding windows for building accurate prediction models. To utilize the periodical characteristic of pathogen incidence rates, a multi-segment sliding window consisting of several segments from different periodical intervals is proposed and used. Since the number of such candidate windows is still very large, a heuristic method is designed for enumerating the candidate effective multi-segment sliding windows. Moreover, methods to find the optimal sliding window and then produce a mathematical model based on that window are proposed. A performance study on real-world datasets shows that the techniques are effective and efficient for pathogen incidence rate prediction.
VL - 87
ER -
TY - CHAP
T1 - Maintenance of Frequent Patterns: A Survey
Y1 - 2009
A1 - Jinyan Li
A1 - Limsoon Wong
A1 - Mengling Feng
A1 - Guozhu Dong
AB - This chapter surveys the maintenance of frequent patterns in transaction datasets. It is written to be accessible to researchers familiar with the field of frequent pattern mining. The frequent pattern maintenance problem is summarized with a study on how the space of frequent patterns evolves in response to data updates. This chapter focuses on incremental and decremental maintenance. Four major types of maintenance algorithms are studied: Apriori-based, partition-based, prefix-tree-based, and concise-representation-based algorithms. The authors study the advantages and limitations of these algorithms from both the theoretical and experimental perspectives. Possible solutions to certain limitations are also proposed. In addition, some potential research opportunities and emerging trends in frequent pattern maintenance are also discussed.
ER -
TY - CHAP
T1 - Mining Conditional Contrast Patterns
Y1 - 2009
A1 - Guozhu Dong
A1 - Guimei Liu
A1 - Limsoon Wong
A1 - Jinyan Li
AB - This chapter considers the problem of 'conditional contrast pattern mining.' It is related to contrast mining, where one considers the mining of patterns/models that contrast two or more datasets, classes, conditions, time periods, and so forth. Roughly speaking, conditional contrasts capture situations where a small change in patterns is associated with a big change in the matching data of the patterns. More precisely, a conditional contrast is a triple (B, F_{1}, F_{2}) of three patterns; B is the condition/context pattern of the conditional contrast, and F_{1} and F_{2} are the contrasting factors of the conditional contrast. Such a conditional contrast is of interest if the difference between F_{1} and F_{2} as itemsets is relatively small, and the difference between the corresponding matching dataset of B∪F_{1} and that of B∪F_{2} is relatively large. It offers insights on 'discriminating' patterns for a given condition B. Conditional contrast mining is related to frequent pattern mining and analysis in general, and to the mining and analysis of closed pattern and minimal generators in particular. It can also be viewed as a new direction for the analysis (and mining) of frequent patterns. After formalizing the concepts of conditional contrast, the chapter will provide some theoretical results on conditional contrast mining. These results (i) relate conditional contrasts with closed patterns and their minimal generators, (ii) provide a concise representation for conditional contrasts, and (iii) establish a so-called dominance-beam property. An efficient algorithm will be proposed based on these results, and experiment results will be reported. Related works will also be discussed.
ER -
TY - JOUR
T1 - Mining Disease State Converters for Medical Intervention of Diseases
Y1 - 2009
A1 - Changjie Tang
A1 - Lei Duan
A1 - Guozhu Dong
KW - Class membership conversion
KW - Classification
KW - Contrast mining
KW - Disease state conversion
KW - Drug design
AB - In applications such as gene therapy and drug design, a key goal is to convert the disease state of diseased objects from an undesirable state into a desirable one. Such conversions may be achieved by changing the values of some attributes of the objects. For example, in gene therapy one may convert cancerous cells to normal ones by changing some genes' expression level from low to high or from high to low. In this paper, we define the disease state conversion problem as the discovery of disease state converters; a disease state converter is a small set of attribute value changes that may change an object's disease state from undesirable into desirable. We consider two variants of this problem: personalized disease state converter mining mines disease state converters for a given individual patient with a given disease, and universal disease state converter mining mines disease state converters for all samples with a given disease. We propose a DSCMiner algorithm to discover small and highly effective disease state converters. Since real-life medical experiments on living diseased instances are expensive and time consuming, we use classifiers trained from the datasets of given diseases to evaluate the quality of discovered converter sets. The effectiveness of a disease state converter is measured by the percentage of objects that are successfully converted from an undesirable state into a desirable state as deemed by state-of-the-art classifiers. We use experiments to evaluate our algorithm and to show its effectiveness. We also discuss possible research directions for extensions and improvements. We note that the disease state conversion problem also has applications in customer retention, criminal rehabilitation, and company turn-around, where the goal is to convert class membership of objects whose class is an undesirable class.
ER -
TY - CONF
T1 - Mining Sequence Classifiers for Early Prediction
T2 - Mining Sequence Classifiers for Early Prediction
Y1 - 2008
A1 - Guozhu Dong
A1 - Zhengzheng Xing
A1 - Philip Yu
A1 - Jian Pei
JA - Mining Sequence Classifiers for Early Prediction
ER -
TY - JOUR
T1 - Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints
Y1 - 2007
A1 - Guozhu Dong
A1 - Xiaonan Ji
A1 - James Bailey
ER -
TY - CONF
T1 - Masquerader Detection Using OCLEP: One-Class Classification Using Length Statistics of Emerging Patterns
Y1 - 2006
A1 - Lijun Chen
A1 - Guozhu Dong
PB - International Workshop on INformation Processing over Evolving Networks (WINPEN)
ER -
TY - CONF
T1 - Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns
T2 - Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns
Y1 - 2006
A1 - Limsoon Wong
A1 - Jinyan Li
A1 - Guozhu Dong
A1 - H. Li
A1 - Jian Pei
JA - Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns
ER -
TY - CONF
T1 - Multi-Dimensional Regression Analysis of Time-Series Data Streams
T2 - Multi-Dimensional Regression Analysis of Time-Series Data Streams
Y1 - 2002
A1 - Jianyong Wang
A1 - Jiawei Han
A1 - Guozhu Dong
A1 - Benjamin Wah
A1 - Yixin Chen
JA - Multi-Dimensional Regression Analysis of Time-Series Data Streams
ER -
TY - CONF
T1 - MultiDimensional Regression Analysis of Time-Series Data Streams
T2 - MultiDimensional Regression Analysis of Time-Series Data Streams
Y1 - 2002
A1 - Jian Pei
A1 - Jianyong Wang
A1 - Benjamin Wah
A1 - Jiawei Han
A1 - Wei Zou
A1 - Guozhu Dong
JA - MultiDimensional Regression Analysis of Time-Series Data Streams
ER -
TY - JOUR
T1 - Making Use of the Most Expressive Jumping Emerging Patterns for Classification
Y1 - 2001
A1 - Kotagiri Ramamohanarao
A1 - Guozhu Dong
A1 - Jinyan Li
ER -
TY - JOUR
T1 - Mining Multi-Dimensional Constrained Gradients in Data Cubes
Y1 - 2001
A1 - Joyce Lam
A1 - Ke Wang
A1 - Jian Pei
A1 - Guozhu Dong
A1 - Jiawei Han
ER -
TY - JOUR
T1 - Making Use of the Most Expressive Jumping Emerging Patterns for Classification
Y1 - 2000
A1 - Guozhu Dong
A1 - Jinyan Li
A1 - Kotagiri Ramamohanarao
ER -
TY - JOUR
T1 - Maintaining Transitive Closure of Graphs in SQL
Y1 - 1999
A1 - Leonid Libkin
A1 - Jianwen Su
A1 - Guozhu Dong
A1 - Limsoon Wong
ER -
TY - JOUR
T1 - Maintaining constrained transitive closure by conjunctive queries
Y1 - 1997
A1 - Ramamohanarao Kotagiri
A1 - Guozhu Dong
ER -
TY - JOUR
T1 - On the monotonicity of (LDL) logic programs with sets
Y1 - 1993
A1 - Guozhu Dong
ER -
}