Advanced Visual Data Analysis
Tensor Field Visualization
The analysis and visualization of tensor fields is an advancing area in scientific visualization. Topology-based methods that investigate the eigenvector fields of second-order tensor fields have gained increasing interest in recent years. To complete the topological analysis, we developed an algorithm for detecting closed hyper-streamlines, an important topological feature. Visit the project page to learn more.
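The eigenvector fields mentioned above are built from pointwise eigen-decompositions of the tensor. As a hedged illustration of that basic building block (restricted to the symmetric 2-D case; the function name and 2-D restriction are illustrative choices, not the project's actual code):

```python
import math

def eigen2x2(a, b, c):
    """Eigenvalues and major-eigenvector angle of the symmetric tensor [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    half_diff = math.hypot((a - c) / 2.0, b)   # sqrt(((a-c)/2)^2 + b^2)
    l1, l2 = mean + half_diff, mean - half_diff  # l1 >= l2
    theta = 0.5 * math.atan2(2.0 * b, a - c)     # orientation of the major eigenvector
    return l1, l2, theta
```

Evaluating this at every point of a sampled tensor field yields the major/minor eigenvector fields whose degenerate points and separatrices the topological analysis then extracts.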
Large-scale Visualization of Arterial Trees
Current CT scanners allow the retrieval of vessels only down to a certain size due to their limited resolution. Recent techniques developed by Benjamin Kaimovitz et al. allow the extension of such scans down to the vessels at the capillary level, resulting in a model of the entire arterial vasculature. Such a model is, of course, enormous in size, which makes visualization challenging. We implemented visualization software that is capable of handling models several gigabytes in size, exceeding the main memory of desktop computers. The software is highly optimized for tree-shaped geometrical objects to achieve the best rendering performance possible. Visit the project page to learn more.
FAnToM: Vector Field Visualization
FAnToM (Field Analysis using Topological Methods) is a software system that allows a user to explore vector fields by applying different analysis and visualization algorithms. Among other capabilities, it can analyze the topology of a 2-D or 3-D vector field, including complex structures such as closed streamlines. This greatly helps users comprehend the structure of complex vector fields in a way that traditional visualization methods cannot. Visit the project page to learn more.
Early Lung Disease Detection Alliance
The Cleveland Clinic Foundation and its partners, Riverain Medical, Wright State University and University Hospitals Health System, have joined together to form the Early Lung Disease Detection Alliance (ELDDA), a multidisciplinary research and commercialization program that will develop, test (through clinical trials), and bring to market new image-analysis systems that permit the early detection of lung cancer and other lung diseases. This computer-aided detection (CAD) system will be applied to the most widely available and used imaging exam, the chest x-ray. The fight against lung cancer is waged on three major fronts: prevention, detection and treatment. The goal of this collaboration is to detect disease at an early stage (i.e. stage I for lung cancer), a necessary step to improve the treatment and survival of lung cancer patients and those at risk for lung cancer throughout Ohio. Visit the project page to learn more.
Diffuse Coronary Artery Disease Detection
The general objective of this project is to develop a novel rationale for the diagnosis of diffuse coronary artery disease (DCAD) using clinical non-invasive imaging of the coronary arteries. The indices of diagnosis will be validated in studies of an atherosclerotic porcine model with DCAD. Our unique algorithms for accurately extracting morphometric data from computed tomography angiography (CTA) images of normal and diseased patients, along with our quantitative approach, uniquely position us to undertake this research. Visit the project page to learn more.
3D Computer Games
Computer games are, in a sense, an example of virtual environments. To facilitate a fully immersive experience, we developed computer games that support quad-buffered stereo. Combined with, for example, 3D-capable displays and active shutter glasses, these games provide a truly 3D experience. Existing games and game engines, such as Cube 2, can also be ported to support such 3D capabilities. Since Cube 2 is open source, we adapted its game engine to support stereo rendering. The adapted version can be downloaded and includes Windows and Linux binaries as well as the source code. Visit the project page to learn more.
Visualization of Vascular Structures
Cardiovascular diseases, such as atherosclerosis and coronary artery disease, are major risk factors for cardiac pain and death. We implemented visualization software that enables interactive 3-D visualization of the cardiac vasculature retrieved using CT scanning technology, as well as an interactive flight through the vessels. Bifurcation angles and vessel radii can be measured while exploring the tree, and high-risk areas that could cause potential problems can be identified by this method. The project is conducted in collaboration with Dr. Ghassan Kassab's lab at the Department of Biomedical Engineering at Indiana University Purdue University, which provided the data set. Visit the project page to learn more.
Bioinformatics, Healthcare & Life Sciences
SemPhyl: Using Semantic Technology in Phylogeny and Phyloinformatics
The specific objectives of this research are to develop and deploy novel ontology-driven semantic problem solving in phylogenetic analysis, to annotate context in phylogenies, and to lay a foundation for the dynamic integration of local and public data to answer phylogenetic questions at multiple levels of granularity.
Semantics and Services-enabled Problem Solving Environment for Trypanosoma Cruzi
The study of complex biological systems increasingly depends on vast amounts of dynamic information from diverse sources. The scientific analysis of the parasite Trypanosoma cruzi (T. cruzi), the principal causative agent of human Chagas disease, is the driving biological application of this proposal. Approximately 18 million people, predominantly in Latin America, are infected with the T. cruzi parasite. As many as 40 percent of these are predicted eventually to suffer from Chagas disease, which is the leading cause of heart disease and sudden death in middle-aged adults in the region. Research on T. cruzi is therefore an important human disease-related effort. It has reached a critical juncture with the quantities of experimental data being generated by labs around the world, due in large part to the publication of the T. cruzi genome in 2005. Although this research has the potential to improve human health significantly, the data being generated exist in independent heterogeneous databases with poor integration and accessibility. The scientific objectives of this research proposal are to develop and deploy a novel ontology-driven semantic problem-solving environment (PSE) for T. cruzi. This work is conducted in collaboration with the National Center for Biomedical Ontologies (NCBO) and will leverage its resources to achieve the objectives of this proposal, as well as to effectively disseminate results to the broader life science community, including researchers in human pathogens. The PSE allows the dynamic integration of local and public data to answer biological questions at multiple levels of granularity. The PSE will utilize state-of-the-art semantic technologies for effective querying of multiple databases and, just as important, feature an intuitive and comprehensive set of interfaces for usability and easy adoption by biologists.
Included in the multimodal datasets will be the genomic data and the associated bioinformatics predictions, functional information from metabolic pathways, experimental data from mass spectrometry and microarray experiments, and textual information from PubMed. Researchers will be able to use and contribute to a rigorously curated T. cruzi knowledge base, making it reusable and extensible. The resources developed as part of this proposal will also be useful to researchers working on kinetoplastids related to T. cruzi, such as Trypanosoma brucei and Leishmania major (among other pathogenic organisms), which use similar research protocols and face similar informatics challenges. Visit the project page to learn more.
Understanding protein structure and the forces that drive protein folding is one of the most fundamental and challenging problems in biochemistry. We are pursuing a number of projects that explore the determinants of protein structure and improve computational structure prediction methods. Our current areas of investigation include: development of a novel technique for the identification of remote homologs; characterization of secondary structure variability for protein sequences; and hybrid experimental/computational methods for high-confidence prediction of protein tertiary and quaternary structure. The latter project involves improving the reliability of protein structure prediction algorithms by including experimental information in the model selection process. In collaboration with Dr. Jerry Alter's lab (Department of Biochemistry and Molecular Biology, Wright State University), we have developed the computational support for MRAN - Modification Reactivity Analysis (see figure above). Based upon the reaction rate of proteolysis or residue modification reactions, solvent accessibility and other physicochemical properties of specific residues can be estimated. This information can then be used to drive the process of selecting and refining conformational models for further exploration. Visit the project page to learn more.
In revealing historical relationships among genes and species, phylogenies provide a unifying context across the life sciences for investigating diversification of biological form and function. The utility of phylogenies for addressing a wide variety of biological questions is evident in the rapidly increasing number of published gene and species trees. Further, this trend is certain to pick up pace with the explosion of data being generated with next-generation sequencing technologies. The impact that this deluge of species and gene tree estimates will have on our understanding of the forces that shape biodiversity will be limited by the accessibility of these trees, and of the underlying data and methods of analysis. The true structure of species trees and gene trees is rarely known. Rather, estimates are obtained through the application of increasingly sophisticated phylogenetic inference methods to increasingly large and complicated datasets. The need for a Minimum Information about Phylogenetic Analyses (MIAPA) reporting standard is clear, but specification of the standard has been hampered by the absence of controlled vocabularies to describe phylogenetic methodologies and workflows.
Identification of Biomarkers of Toxicity and Downstream Outcomes
Metabolomics is the exhaustive characterization of metabolite concentrations in biofluids and tissues. The use of NMR and chromatography-linked mass spectrometry to assay metabolic profiles of tissue homogenates and biofluids has been increasingly recognized as a powerful tool for biological discovery. In recent years metabolomics techniques have been applied to a wide variety of diagnostic, preclinical, systems biology, and ecological studies. Working with Dr. Nick Reo's NMR spectroscopy lab at Wright State University, we are developing standards-based tools and web services for the pre-processing, normalization/standardization, exploratory and comparative analysis, and visualization of NMR spectra from biofluids. Visit the project page to learn more.
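One of the simplest pre-processing steps mentioned above is normalization across samples. As a hedged sketch (the function name and toy spectra are illustrative, not the project's tools), total-area normalization removes overall-concentration differences so that spectra with the same metabolic profile become directly comparable:

```python
def normalize_total_area(spectrum):
    """Scale intensities so they sum to 1, removing overall-concentration effects."""
    total = sum(spectrum)
    return [v / total for v in spectrum]

full = normalize_total_area([2.0, 4.0, 4.0])
dilute = normalize_total_area([1.0, 2.0, 2.0])   # same profile at half concentration
```

After normalization, `full` and `dilute` are identical, which is exactly the property comparative analysis of biofluid spectra relies on.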
Forensic DNA Research
PCR-based amplification of STR loci has become the method of choice for the purpose of human identification in forensic investigations. With these loci, length polymorphisms associated with differences in the number of tandem repeats of four-nucleotide (tetranucleotide) core sequences are detected after polymerase chain reaction (PCR) amplification. A set of thirteen STR loci are typically genotyped with commercially available kits and length polymorphisms are identified with machines such as the Applied Biosystems 310 or 3100 capillary electrophoresis systems. In the analysis and interpretation of DNA evidence using STRs, a surprising number of technical, statistical, and computational issues emerge. Together with Forensic Bioinformatics Services, Inc., we investigate algorithmic, empirical, and statistical approaches to address many of these problems. The end goal of our research is to ensure that DNA evidence is treated with due scientific objectivity in the courtroom. Visit the project page to learn more.
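One of the statistical issues alluded to above is computing how rare a genotyped STR profile is. A hedged sketch of the standard "product rule" under Hardy-Weinberg equilibrium follows; the locus names are real CODIS loci, but the allele frequencies are made up for illustration and the function names are our own:

```python
def locus_match_probability(freqs, allele_a, allele_b):
    """Chance a random individual shares this genotype at one locus."""
    p, q = freqs[allele_a], freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q   # p^2 or 2pq

def profile_match_probability(profile, freq_tables):
    """Multiply per-locus probabilities across independent loci."""
    rmp = 1.0
    for locus, (a, b) in profile.items():
        rmp *= locus_match_probability(freq_tables[locus], a, b)
    return rmp

freq_tables = {
    "D3S1358": {14: 0.10, 15: 0.25},   # hypothetical allele frequencies
    "vWA":     {17: 0.28, 18: 0.20},
}
profile = {"D3S1358": (14, 15), "vWA": (17, 17)}
rmp = profile_match_probability(profile, freq_tables)
```

With all thirteen loci the product becomes vanishingly small, which is why the statistical assumptions behind it (locus independence, population substructure) receive such scrutiny in the courtroom.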
Characterization and Analysis of Codon Usage Bias
Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons is preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. We have developed novel techniques for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape. Visit the project page to learn more.
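Codon usage bias of the kind described above is commonly quantified with relative synonymous codon usage (RSCU): each codon's count divided by the mean count among its synonyms, so that 1.0 means no bias. A minimal sketch (the synonym table is truncated to two amino acids for brevity, and this does not include our GC-content correction):

```python
from collections import Counter

SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(coding_sequence):
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence) - 2, 3)]
    counts = Counter(codons)
    scores = {}
    for synonyms in SYNONYMS.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue                      # amino acid absent from this gene
        mean = total / len(synonyms)
        for c in synonyms:
            scores[c] = counts[c] / mean  # 1.0 = unbiased usage
    return scores

scores = rscu("TTTTTTTTCGGTGGT")   # codons: TTT TTT TTC GGT GGT
```

Here GGT scores 4.0 (maximally preferred among four glycine codons), the kind of signal that, once corrected for confounders such as GC-content, serves as a predictor of gene expressivity.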
CloudVista: Interactive Visual Analysis of Large Data in the Cloud
The problem of efficient and high-quality clustering of extreme-scale datasets with complex clustering structures continues to be one of the most challenging data analysis problems. An innovative use of the data cloud provides a unique opportunity to address this challenge. In this project, we propose the CloudVista framework to address (1) the problems caused by using sampling/summarization in existing approaches and (2) the latency problems caused by cloud-side processing. The CloudVista framework aims to explore entire large datasets stored in the cloud with the help of the visual frame data structure and the previously developed VISTA visualization model. The latency of processing large data is addressed by the RandGen algorithm, which generates a series of related visual frames in the cloud without user intervention, and by a hierarchical exploration model supported by cloud-side subset processing. Experimental study shows this framework is effective and efficient for visually exploring clustering structures of extreme-scale datasets stored in the cloud. Visit the project page to learn more.
Cresp: Towards Optimal Cloud Resource Provisioning for Large Scale Data Intensive Parallel Processing Programs
Hadoop/MapReduce has been a top choice for big data analysis in the cloud. While the elasticity and economics of cloud computing are attractive, there is no effective tool for scientists to deploy MapReduce programs with their time and budget requirements satisfied, or with energy consumption minimized. We propose an analysis framework that aims to efficiently learn the closed-form cost model for any specific MapReduce program. This framework includes a robust regression method that learns closed-form cost models from small-scale settings, component-wise cost-variance analysis and reduction, and a fast approximate model learning method based on the model library. Visit the project page to learn more.
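The core idea of learning a closed-form cost model from small-scale runs can be sketched as ordinary least squares on a derived feature. This is a hedged toy version, not Cresp's actual model: the feature (per-worker load), the synthetic run data, and the function names are all illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# (input size in GB, workers, measured runtime in seconds) from cheap small runs
runs = [(10, 2, 260), (20, 4, 265), (40, 4, 515), (80, 8, 520)]
xs = [gb / workers for gb, workers, _ in runs]   # per-worker load
ys = [t for _, _, t in runs]
intercept, slope = fit_line(xs, ys)

def predict_runtime(gb, workers):
    """Extrapolate the fitted model to a larger deployment."""
    return intercept + slope * (gb / workers)
```

Once such a model is in hand, finding a deployment that meets a deadline or budget reduces to evaluating a closed-form expression rather than running trial jobs.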
Geometric Data perturbation for Privacy-preserving Data classification
This project investigates a data-perturbation approach based on random geometric transformations for privacy-preserving data classification. The goal of this perturbation approach is twofold: preserving the utility of data in terms of classification modeling, and preserving the privacy of data. To achieve the first goal, we observe that many classification models utilize the geometric properties of datasets, which can be preserved by geometric transformation. We prove that three types of well-known classifiers will deliver the same (or very similar) performance over the geometrically perturbed dataset as over the original dataset. As a result, this perturbation approach guarantees almost no loss of accuracy for three popular classification methods. To reach the second goal, we propose a multi-column privacy model to address the problems of evaluating privacy quality for multidimensional perturbation, and develop an attack-resilient perturbation optimization method. We analyze three types of inference attacks with the proposed privacy metric: naive estimation, ICA-based reconstruction, and distribution-based attacks. Based on the attack analysis, a randomized optimization method is developed to optimize the perturbation. Our initial experiments show that this approach can provide a high privacy guarantee while preserving accuracy for the discussed classifiers. Visit the project page to learn more.
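The utility-preservation claim rests on a simple geometric fact: a rotation leaves all pairwise Euclidean distances unchanged, so distance-based classifiers such as kNN produce identical neighborhoods on the perturbed data. A minimal 2-D sketch (the full approach uses higher-dimensional transformations plus translation and noise; the names and data here are illustrative):

```python
import math
import random

def rotate(point, theta):
    """Rotate a 2-D point by angle theta."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

random.seed(0)
data = [(random.random(), random.random()) for _ in range(5)]
theta = 1.234                                    # secret perturbation parameter
perturbed = [rotate(p, theta) for p in data]

# pairwise distances (and hence kNN neighborhoods) are unchanged
max_error = max(abs(dist(p, q) - dist(rp, rq))
                for p, rp in zip(data, perturbed)
                for q, rq in zip(data, perturbed))
```

The privacy side of the project then asks the converse question: how hard is it for an attacker to recover `theta` (or the original points) from the perturbed data alone.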
Mining Privacy Settings to Find Optimal Privacy-Utility Tradeoffs for Social Network Services
Privacy has been a big concern for users of social network services (SNS). In response to recent criticism about privacy protection, most SNS now provide fine-grained privacy controls, allowing users to set visibility levels for almost every profile item. However, this also creates a number of difficulties for users. First, SNS providers often set most items by default to the highest visibility to improve the utility of the social network, which may conflict with users' intentions, and it is often formidable for a user to fine-tune tens of privacy settings. Second, tuning privacy settings involves an intricate tradeoff between privacy and utility: when you turn off the visibility of one item to protect your privacy, the social utility of that item is turned off as well. It is challenging for users to make this tradeoff for each setting. We propose a framework for users to conveniently tune their privacy settings toward a desired privacy level and social utility. It mines the privacy settings of a large number of users in an SNS, e.g., Facebook, to generate latent trait models for the level of privacy concern and the level of utility preference. A tradeoff algorithm is developed to help users find the optimal privacy settings for a specified level of privacy concern and a personalized utility preference. We crawled a large number of Facebook accounts and derived their privacy settings with a novel method; these data are used to validate and showcase the proposed approach. Visit the project page to learn more.
RASP: Random Space Encryption for Efficient Multidimensional Range Query on Encrypted Databases
With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution because of its advantages in scalability and cost savings. However, some data might be so sensitive that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We summarize these key features for hosting a query service in the cloud as the CPEL criteria: data Confidentiality, query Privacy, Efficient query processing, and Low in-house processing cost. Bearing the CPEL criteria in mind, we propose the RASP data perturbation method to provide secured range query and kNN query services for data in the cloud. The RASP data perturbation method combines order-preserving encryption, dimensionality expansion, random noise injection, and random projection, which provides strong resilience to attacks on the perturbed data. The RASP perturbation preserves multidimensional ranges for queries, which allows existing indexing techniques such as the R-tree to be applied in query processing. Range query processing is conducted in two stages: querying the bounding box of the transformed range, then filtering out irrelevant results with secured conditions. Both stages can be done in the cloud with exact results returned to the client, which guarantees the E and L criteria of CPEL. The kNN-R algorithm is designed to work with the RASP range query algorithm to efficiently process kNN queries. We also carefully analyzed attacks on data and queries under a precisely defined threat model. Extensive experiments show the advantages of this approach on the CPEL criteria. Visit the project page to learn more.
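The two-stage range-query pattern described above can be sketched in plain terms: a coarse bounding-box pass produces candidates (efficient, indexable), and an exact predicate filters out false positives. This is a hedged illustration of the pattern only; the toy records and functions stand in for the actual RASP-transformed data and secured conditions.

```python
def stage1_bounding_box(records, box):
    """Coarse pass: everything inside the bounding box (index-friendly)."""
    (lo1, hi1), (lo2, hi2) = box
    return [r for r in records if lo1 <= r[0] <= hi1 and lo2 <= r[1] <= hi2]

def stage2_filter(candidates, predicate):
    """Exact pass: drop the false positives the box let through."""
    return [r for r in candidates if predicate(r)]

records = [(1, 9), (3, 4), (5, 5), (8, 2)]
# query: points with x + y <= 8, known to lie inside box x in [0,6], y in [0,6]
candidates = stage1_bounding_box(records, ((0, 6), (0, 6)))
result = stage2_filter(candidates, lambda r: r[0] + r[1] <= 8)
```

In RASP both stages run in the cloud over perturbed data, so the client receives exact results without the server learning the original query range.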
EAGER: Knowledge Transfer Oriented Data Mining with Focus on the Decision Trees Knowledge Type
This project studies knowledge transfer oriented data mining (KTDM). Given two data sets, the idea of KTDM is to discover models that are common to both data sets, as well as models that are unique to one. These common and unique models provide a tool to leverage the already-understood properties of one data set for the purpose of understanding the other, probably less understood, data set. This EAGER project concentrates on models in the form of a diversified set of classification trees. The KTDM approach is useful for real-world applications in part because of its ability to let users narrow down to particular models, guided by known knowledge from another data set. It will help realize transfer of knowledge and learning in various domains. The project will support a graduate student and will seek collaboration with experts in the medical domain, increasing the impact of the project. This supplementary paper contains additional information about shared decision trees mined from various pairs of datasets, including 3 microarray gene expression datasets for cancer and 3 microarray gene expression datasets for cancer treatment outcome. Visit the project page to learn more.
Knowledge Extraction & Exploration
HPCO: Human Performance and Cognition Ontology
The project involves extending our work in focused knowledge (entity-relationship) extraction from scientific literature, automatic taxonomy extraction from selected community-authored content (e.g., Wikipedia), and semi-automatic ontology development with limited expert guidance. These are combined to create a framework that will allow domain experts and computer scientists to semi-automatically create knowledge bases through an iterative process. The final goal is to provide superior (in both quality and speed) search and retrieval over scientific literature for life scientists, enabling them to elicit valuable information in the area of human performance and cognition.
Knowledge Representation & Reasoning
TROn: Tractable Reasoning with Ontologies
The Semantic Web is based on describing the meaning - or semantics - of data on the Web by means of metadata - data describing other data - in the form of ontologies. The World Wide Web Consortium (W3C) has recommended several standards for ontology languages, which differ in expressivity and ease of use. Central to these languages is that they come with a formal semantics, expressed in model-theoretic terms, which enables access to implicit knowledge by automated reasoning. Progress is currently being made in the practical adoption of reasoning for ontology languages, but several obstacles remain to be overcome for wide adoption on the Web. Two of the central technical issues are the scalability of reasoning algorithms and dealing with inconsistency in ontological knowledge bases. Both issues are addressed in this project. The scalability issue has its origin in the fact that the expression of complex knowledge requires sophisticated ontology languages, like the Web Ontology Language OWL, which are inherently difficult to reason with - as witnessed by high computational complexities, usually ExpTime or beyond. This project builds on recent developments in polynomial-time languages around OWL to remedy this. In particular, efficient algorithms and tools are developed for the largest currently known polynomial-time ontology language, called SROELVn. Reasoning with knowledge bases whose expressivity goes beyond SROELVn is enabled by approximating these knowledge bases within SROELVn. The inconsistency issue has its origin in the fact that large knowledge bases, in particular on the web, are usually not centrally engineered, but arise out of the merging of different knowledge bases with different underlying perspectives and rationales. In this project, tools are developed for efficient, i.e., polynomial-time, reasoning with inconsistent ontologies.
The concrete outcome of the project is an open source reasoning system which is able to reason efficiently with (possibly) inconsistent knowledge bases around OWL, in at least an approximate manner. Visit the project page to learn more.
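The flavor of polynomial-time reasoning targeted here can be illustrated with the simplest possible case: classifying an ontology of atomic subclass axioms by saturating the axiom set to a fixpoint. Real SROEL-family reasoners handle far richer axioms (existential restrictions, role chains), but the saturation style is similar; the function and class names below are illustrative.

```python
def classify(axioms):
    """axioms: iterable of (sub, sup) pairs; returns all entailed subsumptions."""
    subsumes = set(axioms)
    classes = {c for axiom in axioms for c in axiom}
    subsumes |= {(c, c) for c in classes}           # reflexivity
    changed = True
    while changed:                                   # polynomial fixpoint iteration
        changed = False
        for (a, b) in list(subsumes):
            for (c, d) in list(subsumes):
                if b == c and (a, d) not in subsumes:
                    subsumes.add((a, d))             # transitivity rule
                    changed = True
    return subsumes

onto = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Animal", "LivingThing")}
inferred = classify(onto)
```

Because each rule application only adds pairs over a fixed set of class names, the loop terminates after polynomially many additions, which is the essential reason EL-style languages stay tractable where full OWL does not.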
A Decision Support Reasoner
DiCoy: Distributed Computing for the Web Ontology Language
ERRO: Efficient Reasoning with Rules and Ontologies
Kno.e.CoM: Knowledge-enabled Content Management
SEM: Semantics-enabled Editorial Management
Linked Open Data
ESQUILO - Expressive Scalable Querying over Integrated Linked Open Data
ESQUILO develops exploratory techniques to richly interlink components of LOD and then addresses the challenge of querying the LOD cloud, i.e., of obtaining answers to questions which require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include: semantic enrichment through Wikipedia bootstrapping; semantic integration through abstraction by means of upper-level ontologies; and, massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader, and more relevant relationships between LOD datasets at instance and schema level (these relationships will promote better knowledge discovery, querying, and mapping of ontologies); (2) realize LOD query federation through an upper level ontology; and, (3) enable access to implicit knowledge through ontology reasoning. The project involves significant risk as it treads new paths in a new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the significant syntactic and semantic heterogeneity among data originating from independent data sources, and the significantly larger scale, as well as unforeseeable obstacles associated with a rapidly changing and expanding environment. Visit the project page to learn more.
Machine Learning & Natural Language Processing
Large-scale distributed syntactic, semantic and lexical language models
We aim to build large-scale distributed syntactic, semantic, and lexical language models that are trained on corpora of up to web-scale data on a supercomputer, to substantially improve the performance of machine translation and speech recognition systems. The work is conducted under the directed Markov random field paradigm, integrating both topics and syntax to form complex distributions for natural language, and uses hierarchical Pitman-Yor processes to model the long-tail properties of natural language. By exploiting the particular structure, the seemingly complex statistical estimation and inference algorithms are decomposed and performed in a distributed environment. Moreover, a long-standing open problem, smoothing fractional counts due to latent variables in the Kneser-Ney sense in a principled manner, might be solved. We demonstrate how to put these complex language models into one-pass decoders of machine translation systems and a lattice-rescoring decoder in a speech recognition system. Visit the project page to learn more.
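For readers unfamiliar with smoothed n-gram models, the simplest instance of the family discussed above is an interpolated bigram model; Kneser-Ney and hierarchical Pitman-Yor smoothing refine how the same counts are combined. A hedged sketch (the interpolation weight and toy corpus are arbitrary choices):

```python
from collections import Counter

def train(tokens, lam=0.7):
    """Interpolated bigram model: mix bigram and unigram estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(prev, word):
        p_uni = unigrams[word] / total
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni   # backs off to unigram for unseen bigrams

    return prob

prob = train("the cat sat on the mat".split())
```

The research above replaces the fixed weight `lam` with principled discounting and adds syntactic and topical context, but the basic shape - a conditional word distribution assembled from distributed counts - is the same.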
Direct loss minimization for classification and ranking problems
Visit the project page to learn more.
Semi-supervised structured prediction
Visit the project page to learn more.
Federated Semantic Services Platform for Open Materials Science and Engineering
Materials data and information are essential for the design of any tangible product. There are a large number of materials handbooks and databases supporting various activities involved in materials development. Millions of publications contain information utilized by scientists, engineers, designers, and other consumers of materials data and information. This “Big Data” has created both challenges and opportunities. The White House’s Materials Genome Initiative (MGI) seeks to substantially improve the process of new material discovery and development, and shorten the time to deployment. Two of the core components of this initiative - new and sophisticated computer modeling technologies and next-generation experimental tools - received initial federal research support through 2012. The third major component is that of developing solutions for broader access to scientific data about materials to aid in achieving the goal of faster development of new materials at lower costs. Our approach recognizes the need for providing easy access to large amounts of highly distributed and heterogeneous data – including unstructured (scientific literature or publications), semi-structured, and structured data. We recognize the need to support a variety of data as well as resources that provide data using APIs and Web services. We recognize the need for tools to be able to easily exchange data. We also recognize the need for integrated provenance (i.e., data lineage) to support data quality and relevance, and access control for organizations to share information when desired and yet keep valuable intellectual property confidential.
Material Database Knowledge Discovery and Data Mining (KDDM)
The Air Force Research Laboratory's Materials and Manufacturing Directorate (AFRL/RX) develops materials, processes, and manufacturing and sustainment technologies across the spectrum of aircraft, spacecraft, and missile applications. However, there have been few attempts to understand the full ramifications of using informatics in a more concerted manner for data management in the field of materials science. The Kno.e.sis Center, in collaboration with AFRL/RX, is applying knowledge and technology from informatics to the materials domain, thereby introducing the materials and process community to better data management practices. A data exchange system that allows researchers to index, search, and compare data will enable a shortened transition cycle in materials science. Visit the project page to learn more.
Semantic Sensor Web
CityPulse provides innovative smart city applications by adopting an integrated approach to the Internet of Things and the Internet of People. The project will facilitate the creation and provision of reliable real-time smart city applications by bringing together the two disciplines of knowledge-based computing and reliability testing. Visit the project page to learn more.
Over 300 million people are affected by asthma worldwide, with 250,000 annual deaths attributed to the disease. In collaboration with an asthma pediatrician, Kno.e.sis researchers are developing a kHealth kit involving mobile computing and a multitude of sensors with knowledge-empowered probabilistic reasoning algorithms for asthma risk assessment and prediction. Multimodal health signals spanning personal, population, and public health signals are analyzed to understand asthma exacerbations, leading to actionable information for asthma management. Evaluation with pediatric asthma patients at Dayton Children’s Hospital is underway. kHealth technology is also being evaluated with clinical partners on reducing rehospitalization of chronic heart patients and GI surgery patients, and on behavioral event prediction in people with dementia.
Semantic Sensor Web
Millions of sensors around the globe currently collect avalanches of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with such diverse capabilities as range, modality, and maneuverability. It is possible today to utilize networks with multiple sensors to detect and identify objects of interest up close or from a great distance. The lack of integration and communication between these networks, however, often leaves this avalanche of data stovepiped and intensifies the existing problem of too much data and not enough knowledge. With a view to alleviating this glut, we propose that sensor data be annotated with semantic metadata to provide contextual information essential for situational awareness. This research was supported by The Dayton Area Graduate Studies Institute (DAGSI), AFRL/DAGSI Research Topic SN08-8: Architectures for Secure Semantic Sensor Networks for Multi-Layered Sensing. Visit the project page to learn more.
Semantic Services Research
The objective of Cirrocumulus is to develop a methodology for cloud application development and management at an abstract level by incorporating semantic enrichments at each phase of the application lifecycle. This is intended to be achieved by using domain-specific languages (DSLs) for developing and configuring applications and by introducing a middleware layer as a facade for core cloud services. Visit the project page to learn more.
The objective of the MobiCloud project is to provide a unified approach to address the challenges posed by the heterogeneity of the multitude of existing clouds as well as the multitude of mobile applications. The MobiCloud project is based on a Domain Specific Language (DSL) based, platform-agnostic application development paradigm for cloud-mobile hybrid applications. Visit the project page to learn more.
SA-REST is a format for adding metadata to (but not limited to) REST API descriptions in HTML or XHTML. Metadata from various models, such as an ontology, a taxonomy, or a tag cloud, can be embedded into the documents. This embedded metadata permits various enhancements, such as improved search, data mediation, and easier integration of services. Visit the project page to learn more.