Project Descriptions

Take a look at exciting multidisciplinary projects and our project funding overview. To read more about individual projects, scroll past the image below.

To see a more detailed funding overview of the Kno.e.sis' ongoing and completed projects, please click the image below.

Active Projects

Assessing the Reliability of Medical Information on Online Social Media

[More]

Although the factual reliability of information about medical problems, symptoms, and treatments shared online is often questionable, more and more people are turning to social media as a source of support and advice for managing their illnesses. To address this increasingly pressing problem, this research will explore methods to assess the reliability of medical information shared on social media. It will innovatively focus on the structure of a user's relationships with other social media participants to predict the quality of the information shared. This collaborative effort between new faculty members in the School of Business and the School of Engineering and Computer Science brings together experts in Big Data analytics to carry out the work, establishing a cross-disciplinary research program.

Choose Ohio First: Growing the STEMM Pipeline in the Dayton Region FY2016/FY2017

[More]

Choose Ohio First funds higher education and business collaborations that will have the most impact on Ohio’s position in world markets such as aerospace, medicine, computer technology and alternative energy. These collaborations will ultimately produce substantive improvements to the pipeline of STEM graduates and STEM educators in Ohio. Choose Ohio First is a part of a strategic effort to bolster Ohio’s economic strength by ensuring a ready workforce for STEM-related industries.

CONTEXT-AWARE HARASSMENT DETECTION ON SOCIAL MEDIA

[Overview, Details]

The aim of this project is to develop comprehensive and reliable context-aware techniques (using machine learning, text mining, natural language processing, and social network analysis) to glean information about the people involved and their interconnected network of relationships, and to determine and evaluate potential harassment and harassers. An interdisciplinary team of computer scientists, social scientists, urban and public affairs professionals, and educators, together with the participation of college and high school students in the research, will ensure that this scientific research has wide impact in supporting safe social interactions.

CUTE: Instructional Laboratories for Cloud Computing Education

[More]

The CUTE labs are designed to use publicly available free cloud resources and open source software, with no special requirements on computing infrastructure, so that they can be easily adopted and adapted at low cost. Four types of laboratories will be developed: platform exploration labs, data-intensive scalable computing labs, cloud economics labs, and security and privacy labs. These labs cover the major principles of cloud computing and provide opportunities for students to develop essential skills for cloud computing practice. The labs and the CUTE environment will be tested and evaluated in different educational settings. The PIs have extensive educational and research experience in cloud computing, distributed computing, data management, mobile computing, operating systems, and security and privacy.

Development of NMR-based metabolomics in toxicity and disease

[More]


Discovering the effects of Web robot traffic on Web servers and clusters

[More]


eDrugTrends

[More]

The ultimate goal of this proposal is to decrease the burden of psychoactive substance use in the United States. Building on a longstanding multidisciplinary collaboration between researchers at the Center for Interventions, Treatment, and Addictions Research (CITAR) and the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University, we propose to develop and deploy an innovative software platform, eDrugTrends, capable of semi-automated processing of social media data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S.

Employee and Job Search Semantic Engine: Phase I

[More]

This project will conduct research and initial prototyping of an Employee and Job Search Semantic Engine, subject to the available resources.

The Fels Longitudinal Study and Related Projects

[More]

The Fels Longitudinal Study is the world's largest and longest running study of human development, growth, body composition, and aging. The Lifespan Health Research Center (LHRC) draws on the strength of the Fels Longitudinal Study and other population-based studies, past and present. The Fels Study was originally designed to study child growth and development; physical growth, maturation, and the psychological development of children were early key research areas of interest. Today, the Fels Longitudinal Study focuses on physical growth, skeletal maturation, body composition, risk factors for cardiovascular disease and obesity, skeletal and dental biology, longitudinal biostatistical analyses, and aging.

GENDER-BASED VIOLENCE

[More]

Social media provides a faster, cheaper, and face-valid means to engage the public, providing unprecedented large-scale access to public views and behavior. It offers the ability to monitor attitudes in near real time to support timely mitigation efforts. While the use of social media brings advantages in speed (velocity) and, in some cases, participation, broad sourcing, and lower cost, studies cannot be tightly controlled with specific statistical sampling, the availability of demographic data may be limited, and language use can skew coverage. Ultimately, all three resources (formal reports, surveys, and social media) require integration in order to assist policy design, prioritize attention for interventions, and design region-specific programs to curb GBV. A logical first step is to understand what social media offer for GBV monitoring and the design of mitigation and policy. In this project we assess the role of social media (data from Twitter) in identifying public views related to GBV, and tweeting practices by geography, time, gender, and events, to inform concerned parties and assist GBV policy design.

Hazard SEES

[Overview, Details]

In this project the team will design novel, multi-dimensional cross-modal aggregation and inference methods to compensate for the uneven coverage of sensing modalities across an affected region. By assimilating data from social and physical sensors and integrating their modeling and analysis, the team will design methodology to predict and help prioritize the temporally and conceptually extended consequences of damage to people, civil infrastructure (transportation, power, waterways), and their components (e.g., bridges, traffic signals). The team will also develop innovative technology to support the identification of new background knowledge and structured data to improve object extraction, location identification, and the correlation and integration of relevant data across multiple sources and modalities (social, physical, and Web). Novel coupling of socio-linguistic and network analysis will be used to identify important persons and objects, statistical and factual knowledge about traffic and transportation networks, and their impact on hazard models (e.g., storm surge) and flood mapping. Domain-grounded mechanisms will be developed to address pervasive trustworthiness and reliability concerns. Exemplar outcomes are expected to include specific tools for first responders and recovery teams to aid in the prioritization of relief and repair efforts, leveraging improved flood response, urban mapping, and dynamic storm surge models. The project also provides interdisciplinary training of students leveraging research in pedagogy, in conjunction with Ohio State University's new undergraduate major in data analytics and Wright State University's Big and Smart Data graduate certificate program.

III: Travel Fellowships for Students from U.S. Universities to Attend ISWC 2016

[More]

This National Science Foundation award funds Student Travel Fellowships for US students attending the 15th International Semantic Web Conference (ISWC 2016). The conference, which will be held in Kobe, Japan from October 17 to 21, is the premier international forum for state-of-the-art research on all aspects of the Semantic Web and data on the Web, the next generation of the World Wide Web. Attending allows students to meet key members of the Semantic Web research community, gives them the opportunity to disseminate their work, and provides a venue for them to interact with future national and international scientific collaborators.

kHealth

[Overview]

Over 300 million people are affected by asthma worldwide, with 250,000 annual deaths attributed to asthma. In collaboration with an asthma pediatrician, Kno.e.sis researchers are developing a kHealth kit that combines mobile computing and a multitude of sensors with knowledge-empowered probabilistic reasoning algorithms for asthma risk assessment and prediction. Multimodal health signals spanning personal, population, and public health signals are analyzed to understand asthma exacerbations, leading to actionable information for asthma management. Evaluation with pediatric asthma patients at Dayton Children's Hospital is underway. kHealth technology is also being evaluated with clinical partners on reducing rehospitalization of chronic heart patients and GI surgery patients, and on behavioral event prediction in people with dementia. kHealth research encompasses multiple funded projects, the most recent being "SCH: kHealth: Semantic Multisensory Mobile Approach to Personalized Asthma Care", funded by the National Institutes of Health in 2016.

kHealth: Semantic Multisensory Mobile Approach to Personalized Asthma Care

[Overview]

This interdisciplinary project of computer scientists and a practicing pediatric asthma clinician addresses the central problem of deriving actionable information (smart data) from data. It uses ambient and wearable sensors, mobile computing, and semantic computing technologies. It builds specifically on the concept of personalized digital medicine, supporting an individualized evidence-based approach combined with knowledge-based reasoning to help doctors determine more precisely the cause, severity, and control of asthma, and to alert patients and caregivers to seek timely clinical assistance to better manage asthma and improve their quality of life. The project will involve real-world evaluations with over 200 children with asthma to measure the effectiveness of the approach.

Market-driven Innovations and Scaling Up of Twitris

[Overview]

Twitris is a comprehensive analytical tool that provides professional users with actionable information for making better decisions from social media data. The proposed effort, carried out with potential customers, seeks to develop and incorporate new innovations that take Twitris on a path toward commercialization. Specific innovations planned include functional enhancements such as a broad range of location-specific processing, intuitive user-guided and background-knowledge-supported analysis and visualization, and cloud computing based scaling to meet the real-time processing needs of large-scale, real-world events.

Maximizing the Collective Intelligence of a Network Using Novel Measures of Socio-Cognitive Diversity

[More]

We will explore the degree to which it is possible to augment "wisdom of crowd" effects by developing novel, theory-based measures of socio-cognitive diversity and using them to select smaller, smarter sub-crowds. Socio-cognitive diversity refers to differences in individuals' prior beliefs and information sources, including information acquired through social interaction. We will consider the case of networked crowds in particular, that is, groups of communicating individuals who share information in the process of arriving at a judgment (for example, a group of military intelligence analysts who work together to predict the location of a high-value target). We focus on networks because (a) real-world analysis and decision-making typically involve some degree of collaboration, and (b) communications among members (i.e., who said what to whom) constitute a rich data source from which measures of diversity can potentially be extracted using automated methods.
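
The toy simulation below illustrates the underlying intuition with entirely hypothetical numbers: members who rely on the same information source share a bias, so a sub-crowd selected for diverse estimates can cancel those biases better than the full crowd. The greedy diversity heuristic here is a simple stand-in, not the project's actual socio-cognitive diversity measure.

```python
import numpy as np

rng = np.random.default_rng(7)
truth = 100.0

# Hypothetical crowd: each estimate = truth + shared-source bias + private noise.
n_sources, n_members = 5, 50
source_bias = rng.normal(0, 15, n_sources)         # bias induced by each source
source_of = rng.integers(0, n_sources, n_members)  # each member's main source
estimates = truth + source_bias[source_of] + rng.normal(0, 5, n_members)

# Greedy sub-crowd selection: repeatedly add the member whose estimate is
# farthest, on average, from the estimates already selected -- a crude
# stand-in for a diversity measure over prior beliefs and sources.
selected = [0]
while len(selected) < 10:
    remaining = [i for i in range(n_members) if i not in selected]
    nxt = max(remaining,
              key=lambda i: np.mean(np.abs(estimates[i] - estimates[selected])))
    selected.append(nxt)

print("full-crowd error:      ", abs(estimates.mean() - truth))
print("diverse sub-crowd error:", abs(estimates[selected].mean() - truth))
```

Whether the sub-crowd wins on any single draw depends on the bias structure; the point is the mechanism of trading crowd size for diversity.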

Medical Information Decision Assistance and Support (MIDAS)

[More]

In this SBIR project, Milcord and Kno.e.sis propose to research, design, and develop a Medical Information Decision Assistance and Support knowledge base, with a mobile application front end that lets medical practitioners both communicate treatment plans to their patients and receive status updates from them. The goal is to seed the knowledge base with medical and patient care concepts using existing ontologies, and to populate instances of treatment plans, disease symptoms, and other information required for assisting practitioners with understanding the efficacy of treatment plans.

Modeling Social Behavior for Healthcare Utilization in Depression

[More]

Depression is one of the most common mental disorders in the U.S. and the leading cause of disability, affecting millions of Americans every year. Successful early identification and treatment of depression can lead to many other positive health and behavioral outcomes across the lifespan. This proposal will apply "big data" techniques and methods for identifying combinations of online socio-behavioral factors and neighborhood environmental conditions that can enable the detection of depressive behavior in communities and the study of access to and utilization of healthcare services.

NIDA NATIONAL EARLY WARNING SYSTEM NETWORK (IN3)

[More]

To accelerate the response to emerging drug abuse trends, this recently awarded, NIH-funded study (9/15/14 – 9/14/15) is designed to establish iN3, an innovative NIDA National Early Warning System Network that will rapidly identify, evaluate, and disseminate information on emerging drug use patterns. Two synergistic data streams will be used to identify emerging patterns of drug use. The first data stream will be derived from the Toxicology Investigators Consortium ('ToxIC'), a network of medical toxicologists who specialize in recognizing and confirming sentinel events involving psychoactive substances. ToxIC investigators are located at 42 sites across the U.S., and of these, we have selected 11 to serve as sentinel surveillance sites. The research team will analyze reports from ToxIC investigators' assessments of patients with acute, subacute, and chronic effects of emerging drug use. The second data stream involves measures of drug use derived from social media (Twitter feeds and web forums).

NMR-based metabolomics analyses of biofluids (urine, plasma, sera) in relation to organ toxicity

[More]


PROJECT SAFE NEIGHBORHOOD (PSN)

[Overview, Details]

Project Safe Neighborhood is an interdisciplinary project in which the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University works with community partners, including the City of Dayton (Dayton Police Department), Montgomery County Juvenile Justice, and the University of Dayton, to prevent juvenile repeat offenders from committing crimes in the Westwood neighborhood of Dayton, Ohio.

Scale-Up in the Computer Science Core

[More]


Sirius Games: Heuristica

[More]


Studies of brain phospholipid metabolism and effects of oxidative stress

[More]


Studies of estrogenic endocrine disruptors (EEDs) using gene arrays, NMR metabolite analyses, and clinical chemistry

[More]


TOWARDS UNDERSTANDING AND MITIGATING THE IMPACT OF WEB ROBOT TRAFFIC ON WEB SYSTEMS

[More]

Wide-area motion imagery (WAMI) sensors are systems able to capture high-resolution visualizations of large areas over time. Discovery across the extraordinarily large areas WAMI captures at moderate rates is limited by a lack of tools to automatically infer, detect, and learn about patterns of life in the monitored area. This project will address this concern by building software tools and analysis methods based on the theory of temporal networks to discover the positional, temporal, and dynamic characteristics of a broad geographic region captured by WAMI to improve situational awareness. Whereas the current art focuses on inferring data about single targets of interest, this work will build models capturing the broad dynamics of an entire region.

Twitris

[Overview, Details]

Twitris 2.0 is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. Twitris 2.0 addresses challenges in the large-scale processing of social data while preserving spatio-temporal-thematic properties. It also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system's integration and analysis abilities.

User Studies on Trustworthy Collaborative Systems

[More]

The project addresses the perception of trust by users, the appropriateness of a trust-based security approach, and the role of trust metrics in the management of distributed work. The main challenge of this project is how to measure trust based on user behavior, and how to verify, through experimental studies with users, that the trust-based mechanism is acceptable to users. We plan to apply this trust-based mechanism to two types of applications. The first is collaborative editing, where user trust will be computed based on the quality of user contributions to a document or project. The second is the management of work over a large group of people in order to conduct efficient, high-yield, high-density real-time crowdsourcing activities. The partners of the USCOAST2 project have complementary expertise. COAST provides expertise in collaborative methods, systems, and related technologies, and will propose algorithms that track and manipulate trust metrics. Kno.e.sis provides expertise in the analysis of human work-related behavior, including methods of data collection and data analysis, as well as a theoretical foundation for the evaluation of human performance, and will analyze trust as a psychological phenomenon.
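
As a minimal sketch of the kind of metric involved (not the project's actual algorithm): a per-user trust score updated from the quality of each scored contribution, here with a simple exponentially weighted average. The quality scale and the update rule are illustrative assumptions.

```python
def update_trust(trust: float, quality: float, alpha: float = 0.2) -> float:
    """Exponentially weighted trust update from one scored contribution.

    `quality` is an assumed quality score in [0, 1] for the latest
    contribution (e.g., how much of an edit survives later revisions);
    `alpha` controls how quickly trust reacts to new evidence.
    """
    return (1 - alpha) * trust + alpha * quality

# A user whose edits are mostly kept slowly earns trust:
trust = 0.5  # neutral prior
for q in [0.9, 0.8, 1.0, 0.2, 0.9]:
    trust = update_trust(trust, q)
    print(round(trust, 3))
```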


Completed Projects

3D Computer Games

[More]

Computer games are, in a sense, an example of virtual environments. To facilitate a fully immersive experience, we developed computer games that support quad-buffered stereo. Combined with, for example, 3D-capable displays and active shutter glasses, these games provide a truly 3D experience. Similarly, existing games and game engines, such as Cube 2, can be ported to support such 3D capabilities. Since Cube 2 is open source, we adapted its game engine to support 3D stereo. The adapted version, which includes Windows and Linux binaries as well as the source code, can be downloaded.

A DECISION SUPPORT REASONER

[More]


CHARACTERIZATION AND ANALYSIS OF CODON USAGE BIAS

[More]

Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons is preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. We have developed novel techniques for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape.
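
For illustration, the sketch below computes Relative Synonymous Codon Usage (RSCU), a standard codon-bias measure, over a coding sequence; the project's GC(AT)-content correction and solution-landscape visualization are beyond this sketch, and the codon table is truncated for brevity.

```python
from collections import Counter

# Truncated synonymous-codon table (standard genetic code); a real analysis
# would include all amino acids.
SYNONYMS = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Lys": ["AAA", "AAG"],
}

def rscu(cds: str) -> dict:
    """Relative Synonymous Codon Usage: the observed count of a codon divided
    by the count expected if all synonyms for its amino acid were used equally.
    RSCU > 1 means the codon is preferred; RSCU < 1 means it is avoided."""
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    out = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)
        for c in codons:
            out[c] = counts[c] / expected
    return out

print(rscu("CTGCTGCTGTTAAAAAAGCCGCCG"))  # CTG strongly preferred over TTA here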

CIRROCUMULUS

[Overview, Details]

The objective of Cirrocumulus is to develop a methodology for cloud application development and management at an abstract level by incorporating semantic enrichment at each phase of the application lifecycle. This is to be achieved by using domain-specific languages (DSLs) for developing and configuring applications and by introducing a middleware layer as a facade for core cloud services.

City Pulse

[More]

CityPulse provides innovative smart city applications by adopting an integrated approach to the Internet of Things and the Internet of People. The project will facilitate the creation and provision of reliable real-time smart city applications by bringing together the two disciplines of knowledge-based computing and reliability testing. Visit the project page to learn more.

CLOUDVISTA: INTERACTIVE VISUAL ANALYSIS OF LARGE DATA IN THE CLOUD

[More]

The problem of efficient, high-quality clustering of extreme-scale datasets with complex clustering structures continues to be one of the most challenging data analysis problems. An innovative use of the data cloud provides a unique opportunity to address this challenge. In this project, we propose the CloudVista framework to address (1) the problems caused by using sampling/summarization in existing approaches and (2) the latency problems caused by cloud-side processing. The CloudVista framework aims to explore entire large datasets stored in the cloud with the help of the visual frame data structure and the previously developed VISTA visualization model. The latency of processing large data is addressed by the RandGen algorithm, which generates a series of related visual frames in the cloud without the user's intervention, and by a hierarchical exploration model supported by cloud-side subset processing. Experimental study shows this framework is effective and efficient for visually exploring clustering structures of extreme-scale datasets stored in the cloud.

CRESP: TOWARDS OPTIMAL CLOUD RESOURCE PROVISIONING FOR LARGE SCALE DATA INTENSIVE PARALLEL PROCESSING PROGRAMS

[More]

Hadoop/MapReduce has been a top choice for big data analysis in the cloud. While the elasticity and economics of cloud computing are attractive, there is no effective tool for scientists to deploy MapReduce programs with their requirements on time and budget satisfied, or with energy consumption minimized. We propose an analysis framework that aims to efficiently learn the closed-form cost model for any specific MapReduce program. This framework includes a robust regression method learning closed-form cost models from small-scale settings, the component-wise cost-variance analysis and reduction, and a fast approximate model learning method based on the model library.
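
As a hedged illustration of learning a closed-form cost model from small-scale runs, the sketch below fits an assumed model form (not CRESP's actual model) by ordinary least squares over hypothetical measurements, then extrapolates to a larger configuration.

```python
import numpy as np

# Hypothetical training runs of one MapReduce program at small scale.
# Assumed model form (an illustrative choice, not the CRESP model):
#   time ~ b0 + b1 * data_per_mapper + b2 / reducers
runs = np.array([
    # data_per_mapper_GB, reducers, measured_time_s
    [0.5,  4, 210.0],
    [1.0,  4, 350.0],
    [2.0,  4, 640.0],
    [1.0,  8, 305.0],
    [2.0, 16, 540.0],
    [0.5, 16, 170.0],
])
X = np.column_stack([np.ones(len(runs)), runs[:, 0], 1.0 / runs[:, 1]])
y = runs[:, 2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # learn the cost coefficients

# Predict the runtime of a larger configuration before paying for it.
predict = lambda d, r: beta @ [1.0, d, 1.0 / r]
print("coefficients:", beta.round(1))
print("predicted time, 4 GB/mapper with 32 reducers:",
      round(predict(4.0, 32), 1), "s")
```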

DEVELOPMENT OF AN UNDERGRADUATE DATA MINING COURSE

[More]


DICOY: DISTRIBUTED COMPUTING FOR THE WEB ONTOLOGY LANGUAGE

[More]


DIFFUSE CORONARY ARTERY DISEASE DETECTION

[More]

The general objective of this project is to develop a novel rationale for diagnosis of diffuse coronary artery disease (DCAD) using clinical non-invasive imaging of the coronary arteries. The indices of diagnosis will be validated in studies of an atherosclerotic porcine model with DCAD. Our unique algorithms for accurately extracting morphometric data from computerized tomography angiography (CTA) images of normal and diseased patients, along with our quantitative approach, uniquely position us to undertake this research.

DIRECT LOSS MINIMIZATION FOR CLASSIFICATION AND RANKING PROBLEMS

[More]


EAGER: KNOWLEDGE TRANSFER ORIENTED DATA MINING WITH FOCUS ON THE DECISION TREES KNOWLEDGE TYPE

[More]

This project studies knowledge transfer oriented data mining (KTDM). Given two data sets, the idea of KTDM is to discover models that are common to both data sets, as well as models that are unique to one data set. These common and unique models provide a tool to leverage the already-understood properties of one data set for the purpose of understanding the other, probably less understood, data set. This EAGER project concentrates on models in the form of a diversified set of classification trees. The KTDM approach is useful for real-world applications in part due to its ability to let users narrow down to particular models, guided by known knowledge from another data set, and it will help towards realizing transfer of knowledge and learning in various domains. The project will support a graduate student and will seek collaboration with experts in the medical domain to increase its impact. Supplementary materials contain information about shared decision trees mined from various pairs of datasets, including three microarray gene expression datasets for cancer and three for cancer treatment outcome.

EARLY LUNG DISEASE DETECTION ALLIANCE

[More]

The Cleveland Clinic Foundation and its partners, Riverain Medical, Wright State University and University Hospitals Health System, have joined together to form the Early Lung Disease Detection Alliance (ELDDA), a multidisciplinary research and commercialization program that will develop, test (through clinical trials), and bring to market new image-analysis systems that permit the early detection of lung cancer and other lung diseases. This computer-aided detection (CAD) system will be applied to the most widely available and used imaging exam, the chest x-ray. The fight against lung cancer is waged on three major fronts: prevention, detection and treatment. The goal of this collaboration is to detect disease at an early stage (i.e. stage I for lung cancer), a necessary step to improve the treatment and survival of lung cancer patients and those at risk for lung cancer throughout Ohio.

ENHANCEMENT AND COMMERCIALIZATION OF COMPARATIVE WEB SEARCH TECHNOLOGIES

[More]


ERRO: EFFICIENT REASONING WITH RULES AND ONTOLOGIES

ERRO is an FCT-funded project aimed at addressing the problem of effectively and efficiently reasoning with knowledge available on the Semantic Web, integrating ontological knowledge and deductive rules.

ESQUILO - EXPRESSIVE SCALABLE QUERYING OVER INTEGRATED LINKED OPEN DATA

[More]

ESQUILO develops exploratory techniques to richly interlink components of LOD and then addresses the challenge of querying the LOD cloud, i.e., of obtaining answers to questions which require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include: semantic enrichment through Wikipedia bootstrapping; semantic integration through abstraction by means of upper-level ontologies; and, massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader, and more relevant relationships between LOD datasets at instance and schema level (these relationships will promote better knowledge discovery, querying, and mapping of ontologies); (2) realize LOD query federation through an upper level ontology; and, (3) enable access to implicit knowledge through ontology reasoning. The project involves significant risk as it treads new paths in a new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the significant syntactic and semantic heterogeneity among data originating from independent data sources, and the significantly larger scale, as well as unforeseeable obstacles associated with a rapidly changing and expanding environment.

FANTOM: VECTOR FIELD VISUALIZATION

[More]

FAnToM (Field Analysis using Topological Methods) is a software system that allows a user to explore vector fields by applying different analysis and visualization algorithms. Among other algorithms, it is capable of analyzing the topology of a 2-D or 3-D vector field, including complex structures such as closed streamlines. This greatly helps users comprehend the structure of complex vector fields, something traditional visualization methods cannot achieve.

FEDERATED SEMANTIC SERVICES PLATFORM FOR OPEN MATERIALS SCIENCE AND ENGINEERING

[More]

Materials data and information are essential for the design of any tangible product. There are a large number of materials handbooks and databases supporting various activities involved in materials development, and millions of publications contain information utilized by scientists, engineers, designers, and other consumers of materials data and information. This "Big Data" has created both challenges and opportunities. The White House's Materials Genome Initiative (MGI) seeks to substantially improve the process of new material discovery and development and shorten the time to deployment. Two of the core components of this initiative - new and sophisticated computer modeling technologies and next-generation experimental tools - received initial federal research support through 2012. The third major component is developing solutions for broader access to scientific data about materials, to aid in achieving the goal of faster development of new materials at lower costs. Our approach recognizes the need to provide easy access to large amounts of highly distributed and heterogeneous data, including unstructured (scientific literature or publications), semi-structured, and structured data. We recognize the need to support a variety of data as well as resources that provide data using APIs and Web services, the need for tools that can easily exchange data, and the need for integrated provenance (i.e., data lineage) to support data quality and relevance, along with access control so that organizations can share information when desired while keeping valuable intellectual property confidential.

FORENSIC DNA RESEARCH

[More]

PCR-based amplification of STR loci has become the method of choice for the purpose of human identification in forensic investigations. With these loci, length polymorphisms associated with differences in the number of tandem repeats of four-nucleotide (tetranucleotide) core sequences are detected after polymerase chain reaction (PCR) amplification. A set of thirteen STR loci are typically genotyped with commercially available kits and length polymorphisms are identified with machines such as the Applied Biosystems 310 or 3100 capillary electrophoresis systems. In the analysis and interpretation of DNA evidence using STRs, a surprising number of technical, statistical, and computational issues emerge. Together with Forensic Bioinformatics Services, Inc., we investigate algorithmic, empirical, and statistical approaches to address many of these problems. The end goal of our research is to ensure that DNA evidence is treated with due scientific objectivity in the courtroom.

GENOME RESEARCH INFRASTRUCTURE PARTNERSHIP, Biotechnology Research and Technology Transfer (BRTT)

[More]


GEOMETRIC DATA PERTURBATION FOR PRIVACY-PRESERVING DATA CLASSIFICATION

[More]

This project investigates a random-geometric-transformation based data-perturbation approach for privacy-preserving data classification. The goal of this perturbation approach is two-fold: preserving the utility of data in terms of classification modeling, and preserving the privacy of the data. To achieve the first goal, we observe that many classification models utilize the geometric properties of datasets, which can be preserved by geometric transformation. We prove that three well-known types of classifiers will deliver the same (or very similar) performance over the geometrically perturbed dataset as over the original dataset, so this perturbation approach guarantees almost no loss of accuracy for three popular classification methods. To reach the second goal, we propose a multi-column privacy model to address the problems of evaluating privacy quality for multidimensional perturbation, and develop an attack-resilient perturbation optimization method. We analyze three types of inference attacks - naive estimation, ICA-based reconstruction, and distribution-based attacks - with the proposed privacy metric. Based on the attack analysis, a randomized optimization method is developed to optimize the perturbation. Our initial experiments show that this approach can provide a high privacy guarantee while preserving accuracy for the discussed classifiers.
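
A minimal sketch of the core geometric idea, assuming a pure orthogonal (rotation/reflection) perturbation: because such a transform preserves Euclidean distances exactly, a distance-based classifier like kNN scores identically on the original and perturbed data. The full approach adds translation, noise, and attack-resilient optimization, which this sketch omits; the dataset here is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
X_pert = X @ Q  # geometric perturbation: pairwise distances are preserved

for name, data in [("original", X), ("perturbed", X_pert)]:
    acc = KNeighborsClassifier().fit(data[:200], y[:200]).score(data[200:], y[200:])
    print(name, "kNN accuracy:", round(acc, 3))  # the two accuracies match
```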

HPCO: HUMAN PERFORMANCE AND COGNITION ONTOLOGY

[More]

The project involves extending our work in focused knowledge (entity-relationship) extraction from scientific literature, automatic taxonomy extraction from selected community authored content (e.g. Wikipedia), and semi-automatic ontology development with limited expert guidance. These are combined to create a framework that will allow domain experts and computer scientists to semi-automatically create knowledge bases through an iterative process. The final goal is to provide superior (both in quality and speed) search and retrieval over scientific literature for life scientists that will enable them to elicit valuable information in the area of human performance and cognition.

IDENTIFICATION OF BIOMARKERS OF TOXICITY AND DOWNSTREAM OUTCOMES

[More]

Metabolomics is the exhaustive characterization of metabolite concentrations in biofluids and tissues. The use of NMR and chromatography-linked mass spectrometry to assay metabolic profiles of tissue homogenates and biofluids has been increasingly recognized as a powerful tool for biological discovery. In recent years metabolomics techniques have been applied to a wide variety of diagnostic, preclinical, systems biology, and ecological studies. Working with Dr. Nick Reo's NMR spectroscopy lab at Wright State University, we are developing standards-based tools and web services for the pre-processing, normalization/standardization, exploratory and comparative analysis, and visualization of NMR spectra from biofluids.

INSTRUMENTATION OF A HIERARCHICAL WIRELESS SENSOR NETWORK TEST-BED FOR RESEARCH AND EDUCATION

[More]


KNO.E.COM: KNOWLEDGE-ENABLED CONTENT MANAGEMENT

[More]


LARGE-SCALE DISTRIBUTED SYNTACTIC, SEMANTIC AND LEXICAL LANGUAGE MODELS

[More]

We aim to build large-scale distributed syntactic, semantic, and lexical language models, trained on corpora of up to web-scale data on a supercomputer, to substantially improve the performance of machine translation and speech recognition systems. The work is conducted under the directed Markov random field paradigm to integrate both topics and syntax to form complex distributions for natural language, and uses hierarchical Pitman-Yor processes to model the long-tail properties of natural language. By exploiting the particular structure, the seemingly complex statistical estimation and inference algorithms are decomposed and performed in a distributed environment. Moreover, a long-standing open problem, smoothing fractional counts due to latent variables in Kneser-Ney's sense in a principled manner, might be solved. We demonstrate how to put the complex language models into one-pass decoders of machine translation systems and into the lattice-rescoring decoder of a speech recognition system.
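
To make the Kneser-Ney reference concrete: for ordinary integer counts, the standard interpolated bigram estimate is

```latex
P_{\mathrm{KN}}(w_i \mid w_{i-1})
  = \frac{\max\bigl(c(w_{i-1} w_i) - d,\; 0\bigr)}{c(w_{i-1})}
  + \frac{d \, N_{1+}(w_{i-1}\,\bullet)}{c(w_{i-1})}
    \cdot \frac{N_{1+}(\bullet\, w_i)}{N_{1+}(\bullet\,\bullet)}
```

where c(·) are n-gram counts, d is the absolute discount, and N_{1+} counts distinct continuation types. The open problem mentioned above concerns extending these count-based quantities in a principled way when latent variables make the counts fractional (expected) rather than integer.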

LARGE-SCALE VISUALIZATION OF ARTERIAL TREES

[More]

Current CT scanners allow the retrieval of vessels only down to a certain size due to their limited resolution. Recent techniques developed by Benjamin Kaimovitz et al. allow the extension of such scans down to the vessels at the capillary level, resulting in a model of the entire arterial vasculature. Of course, such a model is enormous in size, which challenges visualization. We implemented visualization software capable of handling a model several gigabytes in size, exceeding the main memory of desktop computers. The software is highly optimized for tree-shaped geometrical objects to achieve the best rendering performance possible.

MATERIAL DATABASE KNOWLEDGE DISCOVERY AND DATA MINING (KDDM)

[More]

The Air Force Research Laboratory's Materials and Manufacturing Directorate (AFRL/RX) develops materials, processes, and manufacturing and sustainment technologies across the spectrum of aircraft, spacecraft, and missile applications. However, there have been few attempts to understand the full ramifications of using informatics in a more concerted manner for data management in the field of materials science. The Kno.e.sis Center, in collaboration with AFRL/RX, is applying knowledge and technology from informatics to the materials domain, thus introducing the materials and process community to better data management practices. A data exchange system that allows researchers to index, search, and compare data will enable a shortened transition cycle in materials science.

Metadata for Timeline Events

[More]


MINING PRIVACY SETTINGS TO FIND OPTIMAL PRIVACY-UTILITY TRADEOFFS FOR SOCIAL NETWORK SERVICES

[More]

Privacy has been a big concern for users of social network services (SNS). In response to recent criticism about privacy protection, most SNS now provide fine-grained privacy controls, allowing users to set visibility levels for almost every profile item. However, this also creates a number of difficulties for users. First, SNS providers often set most items by default to the highest visibility to improve the utility of the social network, which may conflict with users' intentions, and it is often formidable for a user to fine-tune tens of privacy settings toward the desired configuration. Second, tuning privacy settings involves an intricate tradeoff between privacy and utility: when you turn off the visibility of one item to protect your privacy, the social utility of that item is turned off as well. It is challenging for users to make this tradeoff for each setting. We propose a framework that lets users conveniently tune their privacy settings toward their desired privacy level and social utilities. It mines the privacy settings of a large number of users in an SNS, e.g., Facebook, to generate latent trait models for the level of privacy concern and the level of utility preference. A tradeoff algorithm is developed to help users find the optimal privacy settings for a specified level of privacy concern and a personalized utility preference. We crawl a large number of Facebook accounts and derive the privacy settings with a novel method; these privacy setting data are used to validate and showcase the proposed approach.
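
A minimal sketch of the kind of tradeoff computation involved, with hypothetical utility and risk scores (in the project these would come from the mined latent trait models). The greedy rule here is an illustration, not the project's tradeoff algorithm.

```python
# Hypothetical profile items: a utility payoff for making each visible and a
# privacy risk for exposing it.
items = {
    "hometown":     {"utility": 3.0, "risk": 1.0},
    "birthday":     {"utility": 4.0, "risk": 5.0},
    "employer":     {"utility": 5.0, "risk": 2.0},
    "phone number": {"utility": 2.0, "risk": 8.0},
    "photo albums": {"utility": 6.0, "risk": 4.0},
}

def recommend_settings(items: dict, risk_budget: float) -> list:
    """Greedy privacy-utility tradeoff: expose items in order of utility per
    unit of risk until the user's privacy-risk budget is exhausted."""
    visible, spent = [], 0.0
    ranked = sorted(items, key=lambda k: items[k]["utility"] / items[k]["risk"],
                    reverse=True)
    for k in ranked:
        if spent + items[k]["risk"] <= risk_budget:
            visible.append(k)
            spent += items[k]["risk"]
    return visible

# A privacy-conscious user (small budget) exposes only low-risk, high-value items.
print(recommend_settings(items, risk_budget=7.0))
```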

MOBICLOUD

[More]

The objective of the MobiCloud project is to provide a singular approach to addressing the challenges of the heterogeneity of the multitude of existing clouds as well as the multitude of mobile applications. The MobiCloud project is based on a Domain Specific Language (DSL) based, platform-agnostic application development paradigm for cloud-mobile hybrid applications.

OBSERVATION OF EMERGENCY AIR MEDICAL TRANSPORT TRAINING

[More]


PHYLONT

[More]

In revealing historical relationships among genes and species, phylogenies provide a unifying context across the life sciences for investigating the diversification of biological form and function. The utility of phylogenies for addressing a wide variety of biological questions is evident in the rapidly increasing number of published gene and species trees, and this trend is certain to pick up pace with the explosion of data being generated with next generation sequencing technologies. The impact that this deluge of species and gene tree estimates will have on our understanding of the forces that shape biodiversity will be limited by the accessibility of these trees and the underlying data and methods of analysis. The true structure of species trees and gene trees is rarely known; rather, estimates are obtained through the application of increasingly sophisticated phylogenetic inference methods to increasingly large and complicated datasets. The need for a Minimum Information About Phylogenetic Analyses (MIAPA) reporting standard is clear, but specification of the standard has been hampered by the absence of controlled vocabularies to describe phylogenetic methodologies and workflows.

PREDOSE: PRESCRIPTION DRUG ABUSE ONLINE-SURVEILLANCE AND EPIDEMIOLOGY

[More]

The goal of PREDOSE is to develop automated data collection and analysis tools to process social media (tweets, web-forums) to understand the knowledge, attitudes, and behaviors of prescription-drug abusers, who misuse buprenorphine, OxyContin and other pharmaceutical opioids. Instead of relying on traditional epidemiological surveillance methods such as population surveys, or face-to-face interviews with drug-involved individuals, PREDOSE focuses on the web, which provides venues for individuals to freely share their experiences, post questions, and offer comments about different drugs. Such User Generated Content (UGC) can be used as a very rich source of unsolicited, unfiltered and anonymous self-disclosures of drug use behaviors. The automatic extraction of such data enables qualitative researchers to overcome scalability limitations imposed by existing methods of qualitative studies.

PROTEIN STRUCTURE

[More]

Understanding protein structure and the forces that drive protein folding is one of the most fundamental and challenging problems in biochemistry. We are pursuing a number of projects that explore the determinants of protein structure and improve computational structure prediction methods. Our current areas of investigation include: development of a novel technique for the identification of remote homologs; characterization of secondary structure variability for protein sequences; and hybrid experimental/computational methods for high-confidence prediction of protein tertiary and quaternary structure. The latter project involves improving the reliability of protein structure prediction algorithms by including experimental information in the model selection process. In collaboration with Dr. Jerry Alter's lab (Department of Biochemistry and Molecular Biology, Wright State University), we have developed the computational support for MRAN (Modification Reactivity Analysis). Based upon the reaction rate of proteolysis or residue modification reactions, solvent accessibility and other physicochemical properties of specific residues can be estimated. This information can then be used to drive the process of selecting and refining conformational models for further exploration.

QUERY OPTIMIZATION IN RECONFIGURABLE COMPUTING SYSTEMS

[More]


RASP: RANDOM SPACE ENCRYPTION FOR EFFICIENT MULTIDIMENSIONAL RANGE QUERY ON ENCRYPTED DATABASES

[More]

With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution for its advantages in scalability and cost savings. However, some data might be so sensitive that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We summarize these key features for hosting a query service in the cloud as the CPEL criteria: data Confidentiality, query Privacy, Efficient query processing, and Low in-house processing cost. Bearing the CPEL criteria in mind, we propose the RASP data perturbation method to provide secured range query and kNN query services for data in the cloud. The RASP data perturbation method combines order-preserving encryption, dimensionality expansion, random noise injection, and random projection, which provides strong resilience to attacks on the perturbed data. The RASP perturbation preserves multidimensional ranges for querying, which allows existing indexing techniques such as the R-Tree to be applied in query processing. Range query processing is conducted in two stages: querying the bounding box of the transformed range, then filtering out irrelevant results with secured conditions. Both stages can be done in the cloud with exact results returned to the client, which satisfies the E and L criteria of CPEL. The kNN-R algorithm is designed to work with the RASP range query algorithm to efficiently process kNN queries. We also carefully analyzed attacks on data and queries under a precisely defined threat model, and extensive experiments show the advantages of this approach on the CPEL criteria.
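
The toy sketch below illustrates only the general two-stage idea (a coarse bounding-box filter in a transformed space, then exact filtering after inverting the transform); it is emphatically not the RASP construction, which additionally uses order-preserving encryption, noise injection, and random projection for security, and pushes both stages to the cloud. All data and the secret matrix here are hypothetical.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

d = 2
data = rng.uniform(0, 100, size=(1000, d))
A = rng.normal(size=(d + 1, d + 1))                # secret invertible matrix
to_cloud = np.c_[data, np.ones(len(data))] @ A.T   # transformed records

def range_query(lo, hi):
    # Stage 1 (coarse): the image of an axis-aligned box under an affine map
    # is the convex hull of the transformed box corners, so a bounding box
    # over those corners yields a superset of the true matches.
    corners = np.array([list(c) + [1.0] for c in product(*zip(lo, hi))]) @ A.T
    cmin, cmax = corners.min(axis=0), corners.max(axis=0)
    coarse = to_cloud[np.all((to_cloud >= cmin) & (to_cloud <= cmax), axis=1)]
    # Stage 2 (exact): invert the transform and filter precisely.
    back = (coarse @ np.linalg.inv(A).T)[:, :d]
    return back[np.all((back >= lo) & (back <= hi), axis=1)]

print(len(range_query(np.array([10., 10.]), np.array([30., 40.]))))
```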

RECONSTRUCTION OF DRAGONFLY TAKE-OFF

[More]


SA-REST

[More]

SA-REST is a format for adding metadata to (but not limited to) REST API descriptions in HTML or XHTML. Metadata from various models, such as an ontology, taxonomy, or tag cloud, can be embedded into the documents. This embedded metadata permits various enhancements, such as improved search, data mediation, and easier integration of services.

SECURE KNOWLEDGE MANAGEMENT

[More]


SEM: SEMANTICS-ENABLED EDITORIAL MANAGEMENT

[More]


SEMANTIC SENSOR WEB

[More]

Millions of sensors around the globe currently collect avalanches of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with such diverse capabilities as range, modality, and maneuverability. It is possible today to utilize networks with multiple sensors to detect and identify objects of interest up close or from a great distance. The lack of integration and communication between these networks, however, often leaves this avalanche of data stovepiped and intensifies the existing problem of too much data and not enough knowledge. With a view to alleviating this glut, we propose that sensor data be annotated with semantic metadata to provide contextual information essential for situational awareness. This research was supported by The Dayton Area Graduate Studies Institute (DAGSI), AFRL/DAGSI Research Topic SN08-8: Architectures for Secure Semantic Sensor Networks for Multi-Layered Sensing.
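
A hedged sketch of what such semantic annotation can look like, using the rdflib library and the W3C SOSA vocabulary (which was standardized after this project; treat the vocabulary choice and all example IRIs as placeholders rather than the project's actual schema):

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")      # W3C sensor vocabulary
EX = Namespace("http://example.org/sensors/")       # hypothetical identifiers

g = Graph()
g.bind("sosa", SOSA)

# Annotate one raw reading with machine-interpretable context: which sensor
# produced it, what property it measures, its value, and when it was taken.
obs = EX["obs42"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["thermometer7"]))
g.add((obs, SOSA.observedProperty, EX["airTemperature"]))
g.add((obs, SOSA.hasSimpleResult, Literal(26.4, datatype=XSD.float)))
g.add((obs, SOSA.resultTime,
       Literal("2016-07-01T14:30:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```

Once observations carry such metadata, data from independently deployed sensor networks can be merged and queried uniformly instead of remaining stovepiped.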

SEMANTICS AND SERVICES-ENABLED PROBLEM SOLVING ENVIRONMENT FOR TRYPANOSOMA CRUZI

[More]

The study of complex biological systems increasingly depends on vast amounts of dynamic information from diverse sources. The scientific analysis of the parasite Trypanosoma cruzi (T.cruzi), the principal causative agent of human Chagas disease, is the driving biological application of this proposal. Approximately 18 million people, predominantly in Latin America, are infected with the T.cruzi parasite, and as many as 40 percent of these are predicted eventually to suffer from Chagas disease, the leading cause of heart disease and sudden death in middle-aged adults in the region. Research on T.cruzi is therefore an important human-disease-related effort. It has reached a critical juncture with the quantities of experimental data being generated by labs around the world, due in large part to the publication of the T.cruzi genome in 2005. Although this research has the potential to improve human health significantly, the data being generated exist in independent heterogeneous databases with poor integration and accessibility. The scientific objective of this research proposal is to develop and deploy a novel ontology-driven semantic problem-solving environment (PSE) for T.cruzi. This work is in collaboration with the National Center for Biomedical Ontology (NCBO) and will leverage its resources to achieve the objectives of this proposal, as well as to effectively disseminate results to the broader life science community, including researchers in human pathogens. The PSE allows the dynamic integration of local and public data to answer biological questions at multiple levels of granularity. It will utilize state-of-the-art semantic technologies for effective querying of multiple databases and, just as important, feature an intuitive and comprehensive set of interfaces for usability and easy adoption by biologists. The multimodal datasets will include genomic data and associated bioinformatics predictions, functional information from metabolic pathways, experimental data from mass spectrometry and microarray experiments, and textual information from PubMed. Researchers will be able to use and contribute to a rigorously curated T.cruzi knowledge base, making it reusable and extensible. The resources developed as part of this proposal will also be useful to researchers in T.cruzi-related kinetoplastids, Trypanosoma brucei and Leishmania major (among other pathogenic organisms), which use similar research protocols and face similar informatics challenges.

SEMANTICS-DRIVEN ANALYSIS OF SOCIAL MEDIA

[More]

Over the last few years, there has been a growing public fascination with 'social media' and its role in modern society. At the heart of this fascination is the ability for users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites, and social networking sites. Our research primarily focuses on the analysis of various aspects of User-Generated Content (UGC) that are central to understanding interpersonal communication on social media. More recently, our interdisciplinary collaboration has been studying People-Content-Network analysis. The objective of our work on semantic content analysis is to bring structure and organization to unstructured chatter on social media: what, why, and how users write content; the dynamics of the evolution of interactions among users; how those interactions are affected by sentiments and opinions; and how such dynamics change in real time. We address these facets in multiple sub-projects under this one umbrella.

SEMI-SUPERVISED STRUCTURED PREDICTION

[More]


SEMPHYL: USING SEMANTIC TECHNOLOGY IN PHYLOGENY AND PHYLOINFORMATICS

[More]

The specific objectives of this research are to develop and deploy novel ontology-driven semantic problem solving in phylogenetic analysis, to annotate context in phylogenies, and to lay a foundation for the dynamic integration of local and public data to answer phylogenetic questions at multiple levels of granularity.

SOCS: ORGANIZATIONAL SENSEMAKING DURING EMERGENCY RESPONSE

[More]

Online social networks and always-connected mobile devices have created an immense opportunity that empowers citizens and organizations to communicate and coordinate effectively in the wake of critical events. Specifically, there have been many isolated examples of using Twitter to provide timely and situational information about emergencies to relief organizations, and to conduct ad-hoc coordination. However, there are few attempts that try to understand the full ramifications of using social networks in a more concerted manner for effective organizational sensemaking in such contexts. This multidisciplinary project, spanning computational and social sciences, seeks to fill this gap.

TENSOR FIELD VISUALIZATION

[More]

The analysis and visualization of tensor fields is an advancing area in scientific visualization. Topology based methods that investigate the eigenvector fields of second order tensor fields have gained increasing interest in recent years. To complete the topological analysis, we developed an algorithm for detecting closed hyper-streamlines as an important topological feature.

TOXICITY BIOMARKER DISCOVERY FOR MICROARRAY GENE EXPRESSION TIME SERIES DATA

[More]


TRON: TRACTABLE REASONING WITH ONTOLOGIES

[More]

The Semantic Web is based on describing the meaning - or semantics - of data on the Web by means of metadata - data describing other data - in the form of ontologies. The World Wide Web Consortium (W3C) has published several recommended standards for ontology languages, which differ in expressivity and ease of use. Central to these languages is that they come with a formal semantics, expressed in model-theoretic terms, which enables access to implicit knowledge by automated reasoning. Progress is being made in the practical adoption of reasoning over ontology languages, but several obstacles remain to be overcome for wide adoption on the Web. Two of the central technical issues, both addressed in this project, are the scalability of reasoning algorithms and dealing with inconsistency in ontological knowledge bases. The scalability issue has its origin in the fact that the expression of complex knowledge requires sophisticated ontology languages, like the Web Ontology Language OWL, which are inherently difficult to reason with, as witnessed by high computational complexities, usually ExpTime or beyond. This project builds on recent developments in polynomial-time languages around OWL to remedy this. In particular, efficient algorithms and tools are developed for the largest currently known polynomial-time ontology language, called SROELVn; reasoning with knowledge bases whose expressivity goes beyond SROELVn is enabled by approximating those knowledge bases within SROELVn. The inconsistency issue has its origin in the fact that large knowledge bases, in particular on the Web, are usually not centrally engineered but arise out of the merging of different knowledge bases with different underlying perspectives and rationales. The project develops tools for efficient, i.e., polynomial-time, reasoning with inconsistent ontologies. The concrete outcome of the project is an open source reasoning system that can reason efficiently with (possibly) inconsistent knowledge bases around OWL, in at least an approximate manner.
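
A toy illustration of the saturation (forward-chaining) style in which polynomial-time ontology reasoners typically work, using just the subclass-transitivity rule; real OWL EL-style systems apply a much larger rule set over richer axioms, but each rule is applied in this same materialize-until-fixpoint fashion. The class names are hypothetical.

```python
# Rule applied to saturation:
#   SubClassOf(A, B), SubClassOf(B, C)  =>  SubClassOf(A, C)
axioms = {
    ("GradStudent", "Student"),
    ("Student", "Person"),
    ("Person", "Agent"),
}

def saturate(subclass_axioms: set) -> set:
    """Forward-chain the transitivity rule until no new facts appear.
    The closure has at most |classes|^2 pairs, so this terminates in
    polynomial time."""
    inferred = set(subclass_axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(inferred):
            for (b2, c) in list(inferred):
                if b == b2 and (a, c) not in inferred:
                    inferred.add((a, c))
                    changed = True
    return inferred

for a, c in sorted(saturate(axioms) - axioms):
    print(f"inferred: SubClassOf({a}, {c})")
```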

Understanding User Interactions in Large-Scale Online Emotional Support Systems

[More]

Internet and online-based social systems are rising as the dominant mode of communication in society. However, the public or semi-private environments in which most online communications operate do not make them suitable channels for speaking with others about personal or emotional problems. This project partners with 7 Cups of Tea Inc., the company that runs the world's leading online platform offering safe, anonymous, live one-on-one support, to understand how and why people needing emotional help harness a crowd of active listeners to find support.

Virtual Environments

[More]

Virtual environments can be valuable tools for presenting a specific model, such as an architectural design, or for repetitive testing in which subjects need to be exposed to a specific scenario. In the latter case, it is of utmost importance that the scenario be exactly identical for every subject. The different displays in the Appenzeller Visualization Laboratory, combined with the available software, provide the perfect basis for these environments.

VISUALIZATION OF VASCULAR STRUCTURES

[More]

Cardiovascular diseases, such as atherosclerosis and coronary artery disease, are high risk factors for cardiac pain and death. We implemented visualization software that enables interactive 3-D visualization of the cardiac vasculature retrieved using CT scanning technology, including an interactive flight through the vessels. Bifurcation angles and radii of the vessels can be measured while exploring the tree, and areas of high risk that could cause potential problems can be identified. The project is conducted in collaboration with Dr. Ghassan Kassab's lab at the Department of Biomedical Engineering at Indiana University-Purdue University Indianapolis, which provided the data set.