An Overview on Predicting the Subcellular Location of a Protein

Feng, Zhi-Peng


Issue title: Special Issue: The German Conference on Bioinformatics 2001, 7-10 October, Braunschweig, Germany

Article type: Research Article

Authors: Feng, Zhi-Peng

Affiliations: Department of Physics, Institute of Science, Tianjin University, Tianjin 300072, China, Fax: +86 22 8789 0061 | LiuHui Center for Applied Mathematics, Nankai University and Tianjin University, Tianjin 300072, China, Email: [email protected]

Abstract: This paper reviews the problem of predicting the subcellular location of a protein. Five measures for extracting information from the global sequence, based on the Bayes discriminant algorithm, are reviewed: 1) the auto-correlation functions of amino acid indices along the sequence; 2) the quasi-sequence-order approach; 3) the pseudo-amino acid composition; 4) the unified attribute vector in Hilbert space; and 5) Zp parameters extracted from the Zp curve. The predictive accuracy actually achieved is closely related to the degree of similarity between the training and testing sets or, in a cross-validated study, to the average degree of pairwise similarity within the dataset. Many researchers hold that the high predictive accuracies currently reported cannot guarantee that the available algorithms are effective in practical prediction, because the datasets have high pairwise sequence identity. Others argue that a dataset used for developing software should reflect the natural reality that some subcellular locations contain only a small number of proteins, some of which even share a high percentage of sequence similarity. Owing to the complexity of the problem, sophisticated and specialized programs are needed both for constructing datasets and for improving prediction. In any case, finding the targeting information in the mature protein sequence and properly combining it with sorting signals in prediction may further improve the overall predictive accuracy and make the prediction useful in practice.
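As an illustration of the first measure listed in the abstract, the following is a minimal sketch of how an amino acid composition vector can be extended with auto-correlation terms of an amino acid index along the sequence. It is not taken from the paper: the normalization r_n = (1/(N-n)) * sum of h_i * h_(i+n), the choice of the Kyte-Doolittle hydropathy scale as the index, and the function names `amino_acid_composition`, `autocorrelation`, and `feature_vector` are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as an example index
# (assumption: the paper may use a different amino acid index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 occurrence frequencies of the standard amino acids."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag, index=KYTE_DOOLITTLE):
    """Return [r_1, ..., r_max_lag], where r_n is the mean of h_i * h_(i+n).

    Raises KeyError for residues missing from the index (e.g. 'X').
    """
    h = [index[a] for a in seq]
    n = len(h)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(h[i] * h[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def feature_vector(seq, max_lag=3):
    """Concatenate composition (20 values) and max_lag auto-correlation terms."""
    return amino_acid_composition(seq) + autocorrelation(seq, max_lag)
```

A vector built this way has 20 + `max_lag` components and could be passed to any classifier; the composition part ignores residue order, while the auto-correlation terms retain some sequence-order information.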

Keywords: subcellular location, N-terminal targeting sequences, sorting signals, targeting information, amino acid composition, quasi-sequence-order-effect, pseudo-amino acid composition, auto-correlation functions, unified attribute vector, Zp curve, Zp parameters, Bayes discriminant algorithm, component-coupled algorithm, k-nearest neighbor method, hidden Markov model, neural networks, Support Vector Machine (SVM), jackknife test, hydrophobicity, pairwise sequence similarity

Journal: In Silico Biology, vol. 2, no. 3, pp. 291-303, 2002

Received 22 November 2001

Accepted 18 December 2001

Published: 2002
