|Title||Knowledge Discovery in Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm|
|Publication Type||Journal Article|
|Year of Publication||2003|
|Authors||Michael Raymer, Leslie Kuhn, William Punch|
|Keywords||Bayes classifier, curse of dimensionality, feature extraction, feature selection, genetic algorithms, pattern classification, protein solvation|
A key element of many bioinformatics research problems is the extraction of meaningful information from large experimental data sets. Various approaches, including statistical and graph-theoretical methods, data mining, and computational pattern recognition, have been applied to this task with varying degrees of success. We have previously shown that a genetic algorithm coupled with a k-nearest-neighbors (k-NN) classifier performs well in extracting information about protein-water binding from X-ray crystallographic protein structure data. Using a novel classifier based on the Bayes discriminant function, we present a hybrid algorithm that employs feature selection and extraction to isolate the salient features from large biological data sets. The effectiveness of this algorithm is demonstrated on several data sets, including one drawn from an important problem in proteomics and protein folding: the prediction of water-binding sites near a protein surface.
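The abstract describes the hybrid only at a high level. As a rough illustration of the general idea, and not the authors' classifier or genetic algorithm, the sketch below evolves binary feature masks with a genetic algorithm and scores each mask by the validation accuracy of a Bayes-discriminant classifier trained on the kept features. Several assumptions are made here. A Gaussian naive Bayes classifier stands in for the paper's Bayes-discriminant-based classifier. Masks perform feature selection only, with no feature extraction or weighting. The per-feature penalty, the population settings, the helper names (`fit_gaussian_nb`, `predict`, `fitness`, `evolve`, `make_data`), and the synthetic data are all hypothetical.

```python
import math
import random


def fit_gaussian_nb(X, y, mask):
    """Estimate class priors and per-feature Gaussian parameters on the masked features."""
    feats = [j for j, keep in enumerate(mask) if keep]
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(r[j] for r in rows) / len(rows) for j in feats]
        # a small constant keeps every variance strictly positive
        variances = [sum((r[j] - mu) ** 2 for r in rows) / len(rows) + 1e-6
                     for j, mu in zip(feats, means)]
        model[c] = (math.log(len(rows) / len(y)), means, variances)
    return feats, model


def predict(feats, model, x):
    """Return the class maximizing the log discriminant g_c(x) = log P(c) + log p(x | c)."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, means, variances) in model.items():
        score = log_prior
        for j, mu, var in zip(feats, means, variances):
            score -= 0.5 * (math.log(2 * math.pi * var) + (x[j] - mu) ** 2 / var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy of the masked classifier, minus a hypothetical cost per kept feature."""
    X_tr, y_tr = train
    X_val, y_val = valid
    feats, model = fit_gaussian_nb(X_tr, y_tr, mask)
    correct = sum(predict(feats, model, x) == label for x, label in zip(X_val, y_val))
    return correct / len(y_val) - penalty * sum(mask)


def evolve(train, valid, n_features, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve feature masks with tournament selection, uniform crossover and bit-flip mutation."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct mask is evaluated only once
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    def tournament():
        return max(rng.sample(pop, 3), key=score)

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [list(m) for m in ranked[:2]]  # elitism: carry the two best masks over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


def make_data(n, rng):
    """Synthetic two-class data: features 0-1 depend on the label, features 2-5 are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 0.5) for _ in range(2)] +
                 [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    train, valid = make_data(200, rng), make_data(200, rng)
    best = evolve(train, valid, n_features=6)
    print("selected features:", [j for j, keep in enumerate(best) if keep])
```

The small per-feature penalty is one simple way to push the search toward compact feature subsets. That mirrors the abstract's goal of isolating the salient features rather than keeping every measurement. On the synthetic data, the masks that score well are expected to keep the label-dependent features and drop most of the noise.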