Spatio-Temporal-Thematic Query Processing
Project Description
Analytical applications increasingly exploit complex relationships among named entities as a powerful analytical tool. Such "connect-the-dots" applications are common in many domains, including national security, drug discovery, and medical informatics. Semantic Web technologies are well suited to this type of analysis. The analysis process often has to span multiple heterogeneous data sources, and ontologies and semantic metadata standards help facilitate the aggregation and integration of this content. In addition, standard models for metadata representation on the web, such as the Resource Description Framework (RDF), model relationships as first-class objects, which makes it natural to query and analyze entities based on their relationships. Researchers have consequently argued for graph-based querying of RDF, and fundamentally new analytical operators based on the graph structure of RDF have emerged (e.g., semantic associations and subgraph discovery). These operators allow querying for complex relationships among named entities, with an ontology providing the context or domain semantics. We use the term semantic analytics to refer to this process of searching and analyzing semantically meaningful connections among named entities. Semantic analytics has been used successfully in a variety of settings, for example identifying conflicts of interest, detecting patent infringement, and discovering metabolic pathways.
So far, semantic analytics tools have primarily focused on thematic relationships, but spatial and temporal relationships are often critical components in analytical domains. In fact, most entities and events can be described along three dimensions: thematic, spatial and temporal. Consider the following event: Fred Smith moved into the house at 244 Elm Street on November 16, 2007. The thematic dimension describes what is occurring (the person Fred Smith moved to a new residence). The spatial dimension describes where the event occurs (the new residence is located at 244 Elm Street). The temporal dimension describes when the event occurs (the moving event occurred on November 16, 2007). Unfortunately, integrated semantic analytics over all three dimensions is not currently possible because of the following gaps in the state of the art:
- Current GIS and spatial database technology does not support complex thematic analytics operations. Traditional data models used for GIS excel at modeling and analyzing spatial and temporal relationships among geospatial entities but tend to model the thematic aspects of a given domain as directly attached attributes of geospatial entities. Thematic entities and their relationships are not explicitly and independently represented, making analysis of these relationships difficult.
- Current semantic analytics technology does not support analysis of spatial and temporal relationships. Semantic analytics research has focused on thematic relationships between entities. Thematic relationships can be explicitly stated in RDF graphs, but many important spatial and temporal relationships (e.g., distance and elapsed time) are implicit and require additional computation. Semantic analytics tools depend on explicit relations and must be extended if they are to use implicit spatial and temporal relations.
We are researching a framework that can bridge these gaps. We propose a flexible approach for modeling spatial, temporal and thematic (STT) data using Semantic Web data models. In addition, we have developed and implemented two approaches for querying STT data in our model. The first is an SQL-based approach that uses user-defined functions for graph-pattern-based queries involving spatial and temporal components. The second defines a query language, SPARQL-ST, which extends SPARQL to support spatio-temporal-thematic queries. Both approaches have been prototyped by extending a commercial DBMS.
In addition, demand for systems that can efficiently manage large amounts of Semantic Web data has reached a critical point. This demand is driven largely by the existence of many large, real-world Semantic Web datasets. Publicly available examples include GovTrack (data about activities of the US Congress, 13 million triples), SwetoDBLP (bibliographic data focused on Computer Science publications, 11 million triples), DBpedia (multi-domain data derived from Wikipedia content, 218 million triples) and UniProt (data describing functional aspects of proteins, over 1 billion triples). The development of a scalable system for managing STT Semantic Web data is therefore a major component of our research.
Modeling Approach
We model spatio-temporal-thematic data as follows. We incorporate temporal information using Temporal RDF graphs. Temporal RDF extends the RDF statement from a triple to a quad, where the fourth element is the valid time of the RDF statement. Temporal RDF triples are encoded using standard RDF reification (see the figure below). Spatial features are complex and must be properly defined with an ontology. For this purpose, we use an ontology based on the Open Geospatial Consortium (OGC) Geography Markup Language (GML) specification (see the figure below).
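The reification encoding can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: it uses plain tuples instead of an RDF library, and while rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard RDF reification terms, the `ex:` namespace and the `validFrom`/`validTo` properties that carry the valid-time interval are hypothetical placeholders. The end date in the example is invented for illustration.

```python
from typing import NamedTuple

# Standard RDF namespace; provides the reification vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for example resources and valid-time properties.
EX = "http://example.org/"

Triple = tuple[str, str, str]


class TemporalQuad(NamedTuple):
    """An RDF triple plus its valid time, i.e. a temporal RDF quad."""
    subject: str
    predicate: str
    obj: str
    valid_from: str  # ISO 8601 date
    valid_to: str    # ISO 8601 date


def reify(quad: TemporalQuad, statement: str) -> list[Triple]:
    """Encode one temporal quad as ordinary RDF triples via reification."""
    return [
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", quad.subject),
        (statement, RDF + "predicate", quad.predicate),
        (statement, RDF + "object", quad.obj),
        # Valid time hangs off the statement node (hypothetical properties).
        (statement, EX + "validFrom", quad.valid_from),
        (statement, EX + "validTo", quad.valid_to),
    ]


# Example based on the Fred Smith event; the end date is invented.
quad = TemporalQuad(EX + "FredSmith", EX + "residesAt", EX + "244_Elm_Street",
                    "2007-11-16", "2008-12-31")
for triple in reify(quad, EX + "statement1"):
    print(triple)
```

The trade-off of this encoding is that every temporal statement expands into several ordinary triples, but the result remains standard RDF that existing stores and query engines can handle.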
SQL-based Querying Approach
We have developed a set of spatial and temporal query operators for searching and analyzing spatial and temporal relationships between named entities in temporal RDF graphs. These operators form an adequate functional set in that they (1) allow precise specification of a thematic portion of the RDF graph (a subgraph), (2) provide facilities to compute spatial and temporal properties of these subgraphs and (3) allow filtering and joins based on the computed spatial and temporal properties. The operators are implemented as SQL table functions. A table function produces a set of rows as output, which can then be queried; it is used in an SQL query in the same manner as a database table name. See the example below for illustration.
With this query, we use the spatial_eval operator to specify (1) a relationship between a soldier, a chemical agent and a battle location and (2) a relationship between members of an enemy organization and their known locations. We then limit the results based on the spatial proximity of the battles and enemy sightings. In addition, we provide a spatial_extent operator that retrieves the spatial geometry associated with the spatial entities composing a thematic relationship and optionally filters the results using a spatial predicate; for example, it can find all soldiers participating in military events that take place within an input bounding box. For temporal aspects, we provide an analogous temporal_extent operator that returns the temporal properties of a given relationship and allows optional filtering; for example, it can return all soldiers exhibiting a given symptom during a specific time period. We also provide a temporal_eval operator that can answer queries such as "find soldiers who exhibited symptoms after participating in a given military event."
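The semantics of two of these operations can be approximated outside the database. The Python sketch below is illustrative only and does not reproduce the SQL signatures of spatial_eval or temporal_eval: the hypothetical `within_distance` acts like a spatial join over rows that a thematic pattern match might return, and the hypothetical `starts_after` keeps rows whose valid time begins after a given event has ended. The row types, the great-circle (haversine) distance, the kilometre threshold, and all entity names, coordinates and dates are assumptions made for the example.

```python
import math
from datetime import date
from typing import Iterable, Iterator, NamedTuple


class Located(NamedTuple):
    """Hypothetical row: a thematic binding plus its feature's location."""
    entity: str
    lat: float  # degrees
    lon: float  # degrees


class Timed(NamedTuple):
    """Hypothetical row: a thematic binding plus its valid-time interval."""
    entity: str
    start: date
    end: date


def haversine_km(a: Located, b: Located) -> float:
    """Great-circle distance between two locations, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def within_distance(left: Iterable[Located], right: Iterable[Located],
                    max_km: float) -> Iterator[tuple[Located, Located]]:
    """Spatial join: yield every (left, right) pair at most max_km apart."""
    right_rows = list(right)  # materialize so it can be rescanned per left row
    for a in left:
        for b in right_rows:
            if haversine_km(a, b) <= max_km:
                yield a, b


def starts_after(rows: Iterable[Timed], event_end: date) -> Iterator[Timed]:
    """Temporal filter: keep rows whose interval begins after event_end."""
    return (r for r in rows if r.start > event_end)


# Hypothetical battle locations and enemy sightings.
battles = [Located("ex:Battle1", 39.76, -84.19),
           Located("ex:Battle2", 40.44, -79.99)]
sightings = [Located("ex:EnemyA", 39.78, -84.06)]
near = list(within_distance(battles, sightings, max_km=50.0))
# Only the (Battle1, EnemyA) pair lies within 50 km.

# Hypothetical symptom observations and the end date of a military event.
symptoms = [Timed("ex:SoldierX", date(2005, 7, 10), date(2005, 8, 1)),
            Timed("ex:SoldierY", date(2005, 6, 20), date(2005, 6, 25))]
later = [r.entity for r in starts_after(symptoms, event_end=date(2005, 7, 3))]
# later == ["ex:SoldierX"]
```

The nested loop is the simplest correct join; inside a DBMS, a spatial index would normally avoid comparing every pair of rows.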
SPARQL-ST
It is important that our STT querying approach fits with the Semantic Web community's existing querying framework. SPARQL is the current World Wide Web Consortium (W3C) Recommendation for querying RDF data. As part of this project, we have developed SPARQL-ST: an extension of SPARQL that allows querying spatiotemporal RDF graphs (i.e., temporal RDF graphs that contain spatial objects). Consider the SPARQL-ST query below.
SPARQL-ST introduces a spatial variable type (denoted with a % prefix) and a temporal variable type (denoted with a # prefix). Spatial variables represent complex spatial features rather than a single URI, and the concept of a mapping is extended so that spatial variables map to a set of triples that represent a spatial feature. The query above uses the spatial variable %g to represent the spatial extent of a congressional district. Temporal variables map to time intervals rather than URIs and can appear in the quad position of what we term a spatiotemporal triple pattern. The example query uses temporal variables to retrieve the valid time of each temporal RDF statement. SPARQL-ST also allows computation of derived time intervals; for example, the query above computes the intersection of four time intervals to derive the valid time of the entire triple pattern. Finally, SPARQL-ST introduces SPATIAL FILTER and TEMPORAL FILTER expressions to filter results using spatial and temporal conditions. The query above applies a filtering condition to the spatial extent of each congressional district.
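Deriving the valid time of a whole pattern amounts to intersecting intervals. The Python sketch below shows that computation on its own, assuming closed date intervals and representing an empty intersection as `None`; it is neither SPARQL-ST syntax nor the prototype's code, and the four intervals are hypothetical valid times for four matched statements.

```python
from datetime import date
from functools import reduce
from typing import Iterable, Optional

Interval = tuple[date, date]  # closed interval [start, end]


def intersect(a: Optional[Interval], b: Optional[Interval]) -> Optional[Interval]:
    """Intersection of two closed intervals, or None if they do not overlap."""
    if a is None or b is None:
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def intersect_all(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Valid time of a whole pattern: the intersection of its statements' valid times.

    Expects at least one interval (reduce raises TypeError on empty input).
    """
    return reduce(intersect, intervals)


# Hypothetical valid times of four matched statements.
valid_times = [
    (date(2001, 1, 3), date(2009, 1, 3)),
    (date(2003, 1, 3), date(2013, 1, 3)),
    (date(2002, 1, 1), date(2011, 12, 31)),
    (date(1995, 1, 1), date(2007, 12, 31)),
]
result = intersect_all(valid_times)
# result == (date(2003, 1, 3), date(2007, 12, 31))
```

The result runs from the latest start to the earliest end; once any pair fails to overlap, the result stays `None`, so the combined valid time of the pattern is empty.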
Publications
Conference and Workshop Papers:
- M. Perry, A. Sheth, F. Hakimpour, P. Jain. "Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS '07), Mexico City, MX, November 29 - 30, 2007 (PDF)
- M. Perry, F. Hakimpour, A. Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA, November 10 - 11, 2006 (PDF)
- F. Hakimpour, B. Aleman-Meza, M. Perry, A. Sheth. "Data Processing in Space, Time, and Semantics Dimensions", Terra Cognita 2006 - Directions to the Geospatial Semantic Web, in conjunction with the Fifth International Semantic Web Conference (ISWC '06), Athens, GA, November 6, 2006 (PDF)
Journal Articles:
- A. Sheth and M. Perry. "Traveling the Semantic Web through Space, Time and Theme", IEEE Internet Computing, Vol. 12, No. 2, February/March 2008 (PDF)
- I. B. Arpinar, A. Sheth, C. Ramakrishnan, L. Usery, M. Azami, and M. Kwan. "Geospatial Ontology Development and Semantic Analytics", Transactions in GIS, Blackwell Publishing, Vol. 10, No. 4, 2006 (PDF)
Book Chapters:
- F. Hakimpour, B. Aleman-Meza, M. Perry, A. Sheth. "Spatiotemporal-Thematic Data Processing in Semantic Web", The Geospatial Web, A. Scharl and K. Tochtermann (Eds.), Springer-Verlag, May 2007 (PDF)
- M. Perry, A. Sheth, I. B. Arpinar. "Geospatial and Temporal Semantic Analytics", To appear in Handbook of Research on Geoinformatics, Hassan A. Karimi (Ed.), Idea-Group Inc., 2009 (PDF)
Ph.D. Dissertations and Master's Theses:
- M. Perry. "A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data", Ph.D. Dissertation, Wright State University, June 10, 2008 (PDF)
Technical Reports:
- M. Perry, A. Sheth and P. Jain. "SPARQL-ST: Extending SPARQL to Support Spatiotemporal Queries", Kno.e.sis Center Technical Report KNOESIS-TR-2009-01, Nov 3, 2008 (PDF)
- M. Perry and A. Sheth. "A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data", Kno.e.sis Center Technical Report KNOESIS-TR-2008-01, May 13, 2008 (PDF)
Presentations
- Title: A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data
  Given at: Wright State University, Dayton, OH, July 10, 2008
  Download: PPT
- Title: Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data
  Given at: Second International Conference on Geospatial Semantics (GeoS '07), Mexico City, MX, November 30, 2007
  Download: PPT
- Title: Analyzing Theme, Space and Time: An Ontology-based Approach
  Given at: Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA, November 11, 2006
  Download: PPT
Data Sets
- Description: Small real-world spatiotemporal RDF data set describing social and terrorism-related events.
  Link: http://lsdis.cs.uga.edu/projects/semdis/spatiotemporal/
  Related Publication: F. Hakimpour, B. Aleman-Meza, M. Perry, A. Sheth. "Data Processing in Space, Time, and Semantics Dimensions", Terra Cognita 2006 - Directions to the Geospatial Semantic Web, in conjunction with the Fifth International Semantic Web Conference (ISWC '06), Athens, GA, November 6, 2006
- Description: Large synthetically generated RDF data set for a historical battlefield analysis scenario (7 million asserted triples) with links to accompanying spatial data.
  Link: http://knoesis.wright.edu/students/mperry/STData.html
  Related Publication: M. Perry, A. Sheth, F. Hakimpour, P. Jain. "Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS '07), Mexico City, MX, November 29 - 30, 2007
- Description: Large synthetically generated RDF data set for a historical battlefield analysis scenario (18 million asserted triples) with links to accompanying spatial data. Links are also available for spatial data that can be incorporated into the real-world GovTrack RDF dataset.
  Link: http://knoesis.wright.edu/students/mperry/dissertation/Test-Details.html
  Related Publication: M. Perry. "A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data", Ph.D. Dissertation, Wright State University, June 10, 2008
This research was initially funded in part by NSF Award #IIS-0714441 (01/01/2007-12/31/2009) [formerly IIS-0325464 (09/01/2004-12/31/2006)], titled "Collaborative Proposal: ITR-SemDIS: Discovering Complex Relationships in the Semantic Web." Additionally, this research is partially funded by NSF Award #IIS-0842129, titled "III-SGER: Spatio-Temporal-Thematic Queries of Semantic Web Data: a Study of Expressivity and Efficiency" (09/01/2008-08/31/2010).