Open Source


Kno.e.sis seeks to release the majority of the tools, data, and ontologies developed under federal funding under open source licenses for use by fellow researchers, and often for broader use, especially when the software is of sufficient quality for external use. We also endeavor to support evolving standards and specifications in a variety of related fields through the W3C and other channels. Kno.e.sis has a history of developing community resources such as Semantic Web datasets, open source tools, and public services, which it has hosted for significant periods after the end of the respective projects and/or made available through public or open source distribution channels. For more information, please email opensource at

Tools & Services * Ontologies & Data Sets * Standards

Tools & Services

Active Projects

  • Twitris: A Semantic Social Web platform that facilitates understanding of social perceptions of real-world events by analyzing user-generated data on social media. Twitris addresses challenges in large-scale processing of social data and analyzes data along multiple dimensions, including location, time, topic, user, network, sentiment, and emotion (latest version v4 available at
  • Crisis Computing API (NSF SOCS project): This API provides 'Classification as a Service' based on our research on seeker-supplier intent classification to assist coordination: donation-related messages, requests to help, offers to help, etc. (Also integrated with Ushahidi's CrisisNET project)
  • METEOR-S [PI: Prof. Amit Sheth]
  • MobiCloud: MobiCloud is a Domain Specific Language (DSL)-based, platform-agnostic application development paradigm for cloud-mobile hybrid applications. A cloud-mobile hybrid is an application that runs partly on the mobile device and partly in the cloud. MobiCloud makes it easy to develop these applications and deploy them to clouds and mobile devices.
  • Twarql: Twarql investigates the representation of tweets as RDF in order to enable flexibility in handling the information overload faced by those collectively analyzing social media for sensemaking. Twarql source can be accessed at and is available under the BSD license.
  • Cuebee: A flexible, extensible application for querying the Semantic Web. It provides a friendly interface to guide users through the process of formulating complex queries. Cuebee source can be accessed at and is available under the Creative Commons Attribution-No Derivative Works 3.0 Unported License.
  • Kino (also known as KinoE) is a Web document annotation and indexing system that helps scientists annotate and index Web documents. Kino uses a browser plugin to add annotations and an Apache Solr-based backend to index and store the Web pages. Kino source can be accessed at and is available under the Apache 2.0 license.
  • The Doozer model creation framework extracts entities and relationships from text with the goal of building comprehensive formal models of emerging or continuously changing domains. Upon completion, the code will be made available here. Example domain models created with the prototype can be found on the project page:
  • BLOOMS: An acronym for Bootstrapping-based Linked Open Data Ontology Matching System, BLOOMS is an ontology alignment system based on the idea of bootstrapping from information already present in the LOD cloud. It was developed particularly for Linked Open Data schema alignment. Further details are available at the BLOOMS wiki page.
  • Scooner: Scooner is a prototype search application that integrates the Web of pages with Linked Open Data. The following is a demo of Scooner; you can also take a look at the wiki page.
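The Twarql idea above — encoding tweets as RDF-style subject-predicate-object triples so they can be filtered along any dimension — can be sketched in plain Python. The triples, vocabulary terms, and helper functions below are illustrative stand-ins, not Twarql's actual schema or code:

```python
# Minimal sketch: lower tweets into RDF-style triples, then query by pattern.
# All vocabulary terms (sioc:, ex:) are hypothetical placeholders.

def triples_for_tweet(tweet_id, text, author, hashtags):
    """Represent one tweet as a list of (subject, predicate, object) triples."""
    s = f"ex:tweet/{tweet_id}"
    triples = [
        (s, "rdf:type", "sioc:Post"),
        (s, "sioc:content", text),
        (s, "sioc:has_creator", f"ex:user/{author}"),
    ]
    triples += [(s, "ex:hashtag", tag) for tag in hashtags]
    return triples

def match(graph, pattern):
    """Return triples matching a (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in graph
            if all(p is None or p == v for p, v in zip(pattern, t))]

graph = []
graph += triples_for_tweet(1, "Flooding near downtown", "alice", ["disaster"])
graph += triples_for_tweet(2, "Great concert tonight", "bob", ["music"])

# Once tweets are triples, any dimension becomes queryable: find the content
# of every post tagged 'disaster'.
subjects = {s for s, _, _ in match(graph, (None, "ex:hashtag", "disaster"))}
for s in subjects:
    for _, _, content in match(graph, (s, "sioc:content", None)):
        print(content)
```

In the real system the triples live in an RDF store and the pattern matching is done with SPARQL, but the flexibility argument is the same: new analysis dimensions require new query patterns, not new parsers.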

Past Projects

  • SA-REST Annotator: SA-REST is a specification for adding annotations to Web pages. The SA-REST annotator is a Firefox-based browser plugin that allows users to add SA-REST annotations and publish them. SA-REST Annotator source can be accessed at and is available under the Apache 2.0 license.
  • SAWSDL4J: A clean object model for handling SAWSDL. The source and the binaries for this project can be downloaded from . The software is available for use under the Apache 2.0 license.
  • Test Ontology Generation Tool (TOntoGen): TOntoGen generates large, high-quality data sets for testing Semantic Web applications. It is implemented as a Protégé plugin. TOntoGen can be downloaded from the LSDIS TOntoGen project page.
  • Radiant: Radiant is an Eclipse-based graphical UI for annotating existing WSDL documents into WSDL-S or SAWSDL via an OWL ontology. Radiant can be downloaded via the LSDIS Radiant project page.
  • BRAHMS: A fast main-memory RDF/S storage, capable of storing, accessing, and querying large ontologies.
  • Semantic Visualization Tools: Semantic Visualization [SemViz] was a subproject within the SemDis project and developed some of the earliest Semantic Web/RDF visualization tools: Semantic Analytics Visualization [SAV], a 3D visualization tool for semantic analytics; SET (Semantic EventTracker), a highly interactive visualization tool for tracking and associating activities (events); and Paged Graph Visualization [PGV].
  • SemDis API: A simple yet flexible set of interfaces intended to be a basis for implementations of RDF data access suitable to the types of algorithms being developed in the SemDis project.
  • Semantic Browser: A tool that demonstrates the concept of Relationship Web by creating a relationships-centric metaweb on documents. It allows users to traverse semantically connected documents through domain-specific relationships and uses research in entity and relationship extraction.

Ontologies & Data Sets

  • SoCS Ontology for Crisis Coordination (SOCC): We extend domain knowledge-driven models, the MOAC (Management Of A Crisis) ontology (Limbu 2012), and UNOCHA's HXL (Humanitarian eXchange Language) ontology (Keßler et al. 2013) with required but missing concepts for organizing data during crisis response coordination for seeker and supplier behavior, and with indicators of resource needs captured in a lexicon. For example, the 'shelter' class contains the words 'emergency center,' 'tent,' and 'shelter,' along with lexical alternatives. For the present demonstration, we focus on three resource categories: food, shelter, and medical needs. We thus endeavor to exploit a minimal but expandable subset that provides maximum coverage while controlling false alarms. To create lexicons of indicator words for concepts, we relied on various documents collected via interactions with domain experts (Flach et al. 2013), our Community Emergency Response Team (CERT) training, Rural Domestic Preparedness Consortium training, and publicly available references (Homeland Security 2010; FEMA 2012; OCHA, Verity 2011). Using a first aid handbook (Swienton and Subbarao 2012), we created an extensive 'medical' subset of emergency indicators, identifying words that pertained specifically to first aid or injuries and including those words along with variations in tense (e.g., breath, breathing, breathes) and common abbreviations (e.g., mouth to mouth, mouth 2 mouth, CPR). A local expert with FEMA experience augmented the model with additional indicators and provided anecdotal context. The current model with food, medical, and shelter resource indicators contains 43 concepts and 45 relationships. We created this domain model in the OWL language using the Protégé ontology editor (Protégé 2013). Each type of disaster is listed as an entity type, with indicators for that disaster listed as individuals under a corresponding indicator entity.
A relationship is then declared stating that a particular disaster concept, say Flood, relates by the property 'has_a_positive_indicator' to the 'Flood_i' indicator entity, which includes all words under that heading. Each disaster also has a declared negative relationship with a negative indicator list (e.g., 'erotic' under sexual-words indicators) under the entity named Negative_Indicator_i. Finally, resources are declared as individuals under the appropriate entity in the same way, but relationships with disasters are not explicitly stated, in order to provide flexibility. [Read more: Purohit et al., JCSCW 2014]
  • Singleton Property Datasets: The singleton property approach can be used to represent statements about statements in RDF without the use of reification. The main idea is to create a property instance that is used in exactly one triple, so that metadata about the statement can be attached to that property instance. This approach is compatible with RDF/RDFS and SPARQL.
  • Citypulse Dataset: This webpage offers a number of semantically annotated datasets collected from partners of the CityPulse EU FP7 project, along with relevant resources for smart city data (over 120GB of data in 6 large datasets as of November 2014).
  • Semantic Sensor Network (SSN) Ontology: The Semantic Sensor Network ontology, known as the SSN ontology, answers the need for a domain-independent and end-to-end model for sensing applications by merging sensor-focused (e.g., SensorML), observation-focused (e.g., Observations & Measurements), and system-focused views. It covers sensor-specific sub-domains, such as sensing principles and capabilities, and can be used to define how a sensor will perform in a particular context, to help characterize the quality of sensed data, or to better task sensors in unpredictable environments. Although the ontology leaves the observed domain unspecified, ontologies for domain semantics, units of measurement, time and time series, and location and mobility can easily be attached when instantiating the ontology for particular sensors in a domain. The alignment between the SSN ontology and the DOLCE Ultra Lite upper ontology has helped normalize the structure of the ontology to assist its use in conjunction with ontologies or linked data resources developed elsewhere. This ontology is publicly accessible via the W3C.
  • Provenir: A reference ontology for modeling domain-specific provenance. Additional information and a download under a Creative Commons license are available at:
  • Proteomics data and process provenance (ProPreO): ProPreO is a large glycoproteomics provenance ontology. The ProPreO schema includes 480 classes and attendant relations, whereas the populated ontology includes 3.1 million instances. More information is available at the Kno.e.sis ProPreO page. This ontology is publicly accessible via the NCBO BioPortal.
  • Parasite Experiment Ontology (PEO): The ontology comprehensively models the processes, instruments, parameters, and sample details used to annotate experimental results with provenance metadata (the derivation history of results). More details are available on the PEO wiki page, and the ontology is publicly accessible via the NCBO BioPortal.
  • Parasite Life Cycle Ontology (PLO): PLO models the life cycle stage details of T. cruzi and two related kinetoplastids, Trypanosoma brucei and Leishmania major. More information on PLO is available from the PLO wiki page, and the ontology is available through the NCBO BioPortal.
  • Linked Sensor Data and Linked Observation Data: Linked Sensor Data is an RDF dataset containing expressive descriptions of ~20,000 weather stations in the United States. Linked Observation Data is a 1.7-billion-triple RDF dataset containing expressive descriptions of hurricane and blizzard observations in the United States. Both datasets are also included in the Linked Open Data Cloud.
  • SWETO and SWETODBLP: The Semantic Web Technology Evaluation Ontology (SWETO) and its follow-on SWETODBLP were early populated ontologies created by extracting real-world data using tools built by Taalee/Voquette/Semagix (a company founded by Prof. Amit Sheth) that were made available at no cost for research use. The latest SWETODBLP data, under a Creative Commons license, is available at
  • City Event Extraction Dataset: Using citizen sensor observations in the form of microblogs to extract city events gives city authorities direct access to the pulse of the populace. This dataset contains textual data (tweets) collected from the San Francisco Bay Area over four months, along with ground truth data for traffic-related events collected from . This dataset can be used for evaluating city event extraction techniques/algorithms, as it contains both the textual events and the ground truth. The dataset is available under a Creative Commons license on the Open Science Framework.
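The singleton property approach listed above can be made concrete with a small sketch. Instead of reifying a statement, a unique property instance (e.g. ex:isMarriedTo#1) is minted, used in exactly one triple, and linked back to the generic property, so statement-level metadata can be attached to it. The code and all URIs below are illustrative, not the project's actual implementation:

```python
# Sketch of the singleton property technique: represent statements about
# statements without RDF reification. All URIs are hypothetical examples.
from itertools import count

_ids = count(1)  # counter used to mint unique property instances

def singleton_assert(graph, s, p, o, meta):
    """Assert (s, p, o) via a fresh singleton property carrying metadata."""
    sp = f"{p}#{next(_ids)}"                         # unique property instance
    graph.append((sp, "rdf:singletonPropertyOf", p)) # link back to generic property
    graph.append((s, sp, o))                         # the only triple using sp
    for mp, mo in meta.items():                      # metadata about the statement
        graph.append((sp, mp, mo))
    return sp

g = []
sp = singleton_assert(
    g, "ex:BobDylan", "ex:isMarriedTo", "ex:SaraLownds",
    {"ex:from": "1965", "ex:to": "1977"},
)
# g now contains, e.g.:
#   (ex:isMarriedTo#1, rdf:singletonPropertyOf, ex:isMarriedTo)
#   (ex:BobDylan, ex:isMarriedTo#1, ex:SaraLownds)
#   (ex:isMarriedTo#1, ex:from, "1965")
```

Because every singleton property appears as a predicate in exactly one triple, the temporal metadata unambiguously qualifies that single marriage statement, and the data remains plain triples queryable with ordinary SPARQL.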


Standards

Kno.e.sis and its researchers have had significant impact on standards and have shown strong leadership in standards activities. Wright State University is an official member of the World Wide Web Consortium (W3C), and Prof. Amit Sheth has served as a W3C Advisory Committee member since 2002.

Prof. Sheth and his team defined WSDL-S for annotating semantic web services (see also), which was submitted to the W3C in collaboration with IBM. SAWSDL, adopted as a W3C recommendation (standard) in 2007, was directly based on WSDL-S, and Kno.e.sis members were active in the W3C SAWSDL working group that defined it. Prof. Sheth also co-chaired the W3C Semantic Web Service Testbed Incubator Group (XG). [Kno.e.sis contributors: Karthik Gomadam, Meena Nagarajan, Ajith Ranabahu, Amit Sheth, Kunal Verma, with John Miller at UGA]

Prof. Sheth proposed GLYDE, which was subsequently developed by his team with Prof. William York of UGA's Complex Carbohydrate Research Center. GLYDE-II, an XML standard for data exchange, has been accepted as the standard protocol by the leading carbohydrate databases in the United States, Germany, and Japan. [Kno.e.sis/LSDIS contributors: Cory Henson, Satya Sahoo, Amit Sheth, Christopher Thomas]

Prof. Sheth was an active early member of the W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) and provided its earliest use case, based on the Active Semantic Electronic Medical Record, a semantic web application operationally deployed in a clinical setting since January 2006.

Dr. Satya Sahoo (advisor: Prof. Sheth), while at Kno.e.sis, was a key participant in the W3C Provenance XG. He defined semantic provenance and developed the Provenir ontology, which influenced the Provenance XG's related work, later adopted by the W3C Provenance Working Group. Satya (now at Case Western Reserve University) was a contributor to the W3C Provenance XG Final Report.

Dr. Matthew Perry (advisor: Prof. Sheth), who worked on SPARQL-ST and spatial, temporal, and thematic analytics over Semantic Web data at Kno.e.sis, continued his work on the spatial extension of SPARQL after joining Oracle. He was one of the two editors of the Open Geospatial Consortium's GeoSPARQL - A Geographic Query Language for RDF Data.

Prof. Sheth co-founded and co-chaired the W3C Semantic Sensor Network (SSN) XG, which developed the now widely used SSN ontology. Cory Henson is a co-editor of the SSN XG Final Report and the primary author of its semantic sensor data annotation aspects. [Participants: Cory Henson, Amit Sheth]

Prof. Sheth and his team developed SA-REST, which is also a W3C member submission. Many use cases and tools have been developed to support SA-REST-based semantic web service annotation, semantic search/discovery of Web APIs and RESTful services, etc. [Participants: Karthik Gomadam, Ajith Ranabahu, Amit Sheth]

Kno.e.sis researchers have participated in or are participating in a number of W3C working/interest/community groups, including the Linked Data Platform and HCLSIG.
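To illustrate the SAWSDL recommendation mentioned above: SAWSDL annotates WSDL and XML Schema elements with extension attributes, chiefly modelReference (pointing at an ontology concept) and schema mappings for lifting/lowering. The fragment below follows the W3C recommendation's attribute names and namespace; the example.org ontology and mapping URIs are illustrative placeholders.

```xml
<!-- WSDL schema fragment annotated per the W3C SAWSDL recommendation.
     The example.org URIs are illustrative, not a real ontology. -->
<xs:element name="OrderRequest"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
    sawsdl:modelReference="http://example.org/purchaseorder#OrderRequest"
    sawsdl:loweringSchemaMapping="http://example.org/mapping/RDFOnt2Request.xsl"/>
```

The modelReference ties the message element to a shared semantic model, enabling semantic discovery and mediation, while the lowering/lifting mappings describe how to translate between the XML instance data and the ontology's representation.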