Affiliations: Department of Mathematics and Statistics, University
of Helsinki, Helsinki, FI-00014, Finland | Institute of Biotechnology, University of Helsinki,
Helsinki, FI-00014, Finland
Abstract: A Naive Bayes classifier tool is presented for annotating proteins
on the basis of amino acid motifs, cellular localization and protein-protein
interactions. Annotations take the form of posterior probabilities within the
Molecular Function hierarchy of the Gene Ontology (GO). Experiments with the
data available for the yeast Saccharomyces cerevisiae show that our
prediction method can yield a relatively high level of accuracy. Several
apparent challenges and possibilities for future developments are also
discussed. A common approach to functional characterization is to use sequence
similarities at varying levels, by utilizing several existing databases and
local alignment/identification algorithms. Such an approach is typically quite
labor-intensive when performed manually by an expert. Integrating several
sources of information is in this context generally considered the only way
to obtain valuable predictions with practical implications.
However, the prediction accuracy for molecular functions can be improved, and
computational effort saved, by restricting attention to those data sources
that offer a higher degree of specificity. Here we employ a Naive Bayes model
to provide probabilistic predictions and to enable a computationally efficient
approach to data integration.
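
To illustrate how a Naive Bayes model can turn several binary data sources
into posterior probabilities over function classes, a minimal sketch follows.
It is not the tool described here: the Bernoulli feature model, the Laplace
smoothing parameter `alpha`, the feature names, the class labels and the toy
training set are all assumptions made for illustration, and the GO hierarchy
is ignored (classes are treated as a flat set).

```python
import math
from collections import defaultdict


def train(examples, alpha=1.0):
    """Count classes and per-class feature occurrences.

    examples: list of (set_of_feature_names, class_label) pairs.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    vocabulary = set()
    for features, label in examples:
        class_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocabulary.add(name)
    return class_counts, feature_counts, vocabulary, alpha


def posterior(model, features):
    """Return P(class | features) under the naive independence assumption."""
    class_counts, feature_counts, vocabulary, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for name in vocabulary:
            # Smoothed estimate of P(feature present | class).
            p = (feature_counts[label][name] + alpha) / (n + 2 * alpha)
            score += math.log(p if name in features else 1.0 - p)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical toy data: motif, localization and interaction features.
examples = [
    ({"motif:kinase_domain", "loc:cytoplasm"}, "kinase activity"),
    ({"motif:kinase_domain", "loc:nucleus"}, "kinase activity"),
    ({"motif:zinc_finger", "loc:nucleus"}, "DNA binding"),
    ({"motif:zinc_finger", "loc:nucleus", "ppi:partner_x"}, "DNA binding"),
]
model = train(examples)
post = posterior(model, {"motif:zinc_finger", "loc:nucleus"})
for label, prob in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {prob:.3f}")
```

Because each data source contributes an independent factor to the class
score, adding or dropping a source only changes which features appear in the
vocabulary, which is what makes this kind of integration computationally
cheap.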
Keywords: Protein function prediction, Naive Bayes, data integration, Gene Ontology