Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery

Title: Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery
Publication Type: Thesis
Year of Publication: 2010
Authors: Satya S. Sahoo
Academic Department: Department of Computer Science and Engineering
Number of Pages: 130
Date Published: 07/2010
University: Wright State University
Thesis Type: PhD Dissertation
Keywords: Biomedical Informatics, Materialized Provenance View, Provenance context entity, Provenance Query Operators, Provenir ontology, RDF reification, semantic provenance, Semantic Web, SPARQL Query Optimization

Provenance metadata, describing the history or lineage of an entity, is essential for ensuring data quality, verifying the correctness of process execution, and computing trust values. Traditionally, provenance management has been addressed in the context of workflow or relational database systems. However, existing provenance systems are inadequate for the requirements of an emerging set of applications in the new eScience or Cyberinfrastructure paradigm and the Semantic Web. Provenance in these applications incorporates complex domain semantics on a large scale and serves a variety of uses, including accurate interpretation by software agents, trustworthy data integration, reproducibility, attribution for commercial or legal applications, and trust computation. In this dissertation, we introduce the notion of "semantic provenance" to address these requirements for eScience and Semantic Web applications. In addition, we describe a framework for managing semantic provenance that addresses three issues: (a) provenance representation, (b) query and analysis, and (c) scalable implementation. First, we introduce a foundational model of provenance called Provenir, which serves as an upper-level reference ontology to facilitate provenance interoperability. Second, we define a classification scheme for provenance queries based on query characteristics and use this scheme to define a set of specialized provenance query operators. Third, we describe the implementation of a highly scalable query engine that supports the provenance query operators and uses a new class of materialized views based on the Provenir ontology, called Materialized Provenance Views (MPV), for query optimization. We also define a novel provenance tracking approach called Provenance Context Entity (PaCE) for the Resource Description Framework (RDF) model used in Semantic Web applications. PaCE, defined in terms of the Provenir ontology, is a more effective and scalable approach to RDF provenance tracking than the currently used RDF reification vocabulary. Finally, we describe the application of the semantic provenance framework in biomedical and oceanography research projects.

Additional Resources

Satya S. Sahoo, Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery, Ph.D. Thesis, Wright State University, 2010.
