Automatic Domain Model Creation Using Pattern-Based Fact Extraction

Title: Automatic Domain Model Creation Using Pattern-Based Fact Extraction
Publication Type: Miscellaneous
Year of Publication: 2010
Authors: Christopher Thomas, Pankaj Mehra, Wenbo Wang, Amit Sheth, Gerhard Weikum
Keywords: Domain Model Creation, Information Extraction, Knowledge Extraction, Ontology Learning

This paper describes a minimally guided approach to automatic domain model creation. The first step is to carve an area of interest out of the Wikipedia hierarchy, starting from a simple query or other starting point. The second step is to connect the concepts in this domain hierarchy with named relationships. Linked Open Data sources, such as DBpedia, provide the initial facts. Based on these community-generated facts, we train a pattern-based fact-extraction algorithm to augment the domain hierarchy with previously unknown relationship occurrences. The algorithm learns pattern vectors that represent occurrences of relationships between concepts. The process can be fully automated, and the number of relationships that can be learned grows as the community adds more information. Unlike approaches that aim to find single, highly indicative patterns, we use the cumulative score of many pattern occurrences to increase extraction recall. The relationship identification process itself is based on positive-only classification of training facts.
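
To make the idea of cumulative pattern scoring concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the toy corpus, the pattern representation (the lowercased text between two concept mentions), the normalization, and the acceptance threshold are all assumptions chosen for illustration. Pattern vectors are built only from known positive facts, and a candidate concept pair is scored by summing the weights of every pattern occurrence found for it instead of relying on one highly indicative pattern.

```python
from collections import Counter, defaultdict

def extract_pattern(sentence, subj, obj):
    """Return the text between the two mentions as a pattern, or None.

    Hypothetical pattern representation: lowercased infix with <S>/<O> slots.
    """
    s = sentence.lower()
    i, j = s.find(subj.lower()), s.find(obj.lower())
    if i < 0 or j < 0 or i == j:
        return None
    if i < j:
        return "<S> " + s[i + len(subj):j].strip() + " <O>"
    return "<O> " + s[j + len(obj):i].strip() + " <S>"

def learn_pattern_vectors(facts, corpus):
    """Build one normalized pattern vector per relation from positive facts only."""
    counts = defaultdict(Counter)
    for subj, relation, obj in facts:
        for sentence in corpus:
            pattern = extract_pattern(sentence, subj, obj)
            if pattern is not None:
                counts[relation][pattern] += 1
    vectors = {}
    for relation, counter in counts.items():
        total = sum(counter.values())
        vectors[relation] = {p: c / total for p, c in counter.items()}
    return vectors

def score_pair(subj, obj, corpus, vectors):
    """Sum pattern weights over all occurrences of the pair (cumulative score)."""
    scores = Counter()
    for sentence in corpus:
        pattern = extract_pattern(sentence, subj, obj)
        if pattern is None:
            continue
        for relation, vector in vectors.items():
            scores[relation] += vector.get(pattern, 0.0)
    return dict(scores)

def classify(subj, obj, corpus, vectors, threshold=0.5):
    """Return the best-scoring relation, or None below the (assumed) threshold."""
    scores = score_pair(subj, obj, corpus, vectors)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Toy data standing in for community-generated facts and a text corpus.
FACTS = [
    ("Paris", "capitalOf", "France"),
    ("Berlin", "capitalOf", "Germany"),
    ("Seine", "flowsThrough", "Paris"),
]
CORPUS = [
    "Paris is the capital of France.",
    "Paris, the capital city of France, hosts many museums.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain.",
    "Madrid, the capital city of Spain, is in the centre of the country.",
    "The Seine flows through Paris.",
    "The Thames flows through London.",
]

if __name__ == "__main__":
    vectors = learn_pattern_vectors(FACTS, CORPUS)
    print(classify("Madrid", "Spain", CORPUS, vectors))
    print(classify("Thames", "London", CORPUS, vectors))
```

In this toy setup, a single occurrence of a weaker pattern does not reach the threshold on its own, while several occurrences together do, which is the recall benefit the abstract attributes to cumulative scoring. A realistic system would need entity resolution, generalized patterns, and a principled positive-only classifier in place of the fixed threshold.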