|Title||SPARQ2L: Towards Support For Subgraph Extraction Queries in RDF Databases|
|Publication Type||Conference Paper|
|Year of Publication||2007|
|Authors||Kemafor Anyanwu, Amit Sheth, Angela Maduko|
|Conference Name||16th International World Wide Web Conference (WWW 2007)|
|Conference Location||Banff, Alberta|
Many applications in analytical domains need to 'connect the dots', i.e., to query the structure of data. In bioinformatics, for example, it is typical to query interactions between proteins. The aim of such queries is to 'extract' relationships between entities, i.e., paths from a data graph. Often, such queries specify constraints that qualifying results must satisfy, e.g., paths involving a set of mandatory nodes. Unfortunately, most present-day Semantic Web query languages, including the current draft of the anticipated recommendation SPARQL, lack the ability to express queries about arbitrary path structures in data. In addition, many systems that support some limited form of path queries rely on main-memory graph algorithms, which limits their applicability to very large graphs. In this paper, we present an approach for supporting Path Extraction queries. Our proposal comprises (i) a query language, SPARQ2L, which extends SPARQL with path variables and path variable constraint expressions, and (ii) a novel query evaluation framework based on efficient algebraic techniques for solving path problems, which allows path queries to be evaluated efficiently on disk-resident RDF graphs. The effectiveness of our proposal is demonstrated by a performance evaluation on both real-world and synthetic datasets.
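To make the notion of a path extraction query with mandatory nodes concrete, the following is a minimal sketch in Python. It is not SPARQ2L syntax and not the paper's algebraic, disk-based evaluation method; it is a plain in-memory depth-first enumeration, which is exactly the kind of approach the abstract notes does not scale to very large graphs. The function name `simple_paths`, its parameters, the `interactsWith` predicate, and the protein identifiers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def simple_paths(triples, source, target, must_include=frozenset(), max_edges=6):
    """Enumerate simple (cycle-free) paths from source to target.

    triples      -- iterable of (subject, predicate, object) tuples
    must_include -- nodes that every returned path has to pass through
    max_edges    -- illustrative cap on path length to bound the search
    """
    # Build a subject -> [(predicate, object)] adjacency index.
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))

    results = []

    def extend(node, path, visited):
        if node == target:
            # Keep the path only if it covers all mandatory nodes.
            if must_include <= visited:
                results.append(list(path))
            return
        if len(path) >= max_edges:
            return
        for predicate, neighbor in adjacency[node]:
            if neighbor in visited:
                continue  # skip nodes already on the path (no cycles)
            path.append((node, predicate, neighbor))
            extend(neighbor, path, visited | {neighbor})
            path.pop()

    extend(source, [], {source})
    return results


# Hypothetical protein-interaction data, for illustration only.
triples = [
    ("P1", "interactsWith", "P2"),
    ("P2", "interactsWith", "P4"),
    ("P1", "interactsWith", "P3"),
    ("P3", "interactsWith", "P4"),
    ("P1", "interactsWith", "P4"),
]

all_paths = simple_paths(triples, "P1", "P4")
via_p3 = simple_paths(triples, "P1", "P4", must_include={"P3"})
```

Here `all_paths` holds every simple path between the two proteins, while `via_p3` keeps only those that pass through the mandatory node `P3`. The result of such a query is a set of paths (sequences of triples) rather than a table of variable bindings, which is the distinction the abstract draws from ordinary SPARQL pattern matching.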