SubCellProt: Predicting Protein Subcellular Localization Using Machine Learning Approaches

Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

doi:10.3233/ISB-2009-0384

Article type: Research Article

Authors: Garg, Prabha | Sharma, Virag | Chaudhari, Pradeep | Roy, Nilanjan*

Affiliations: Center for Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Sector 67, S.A.S. Nagar, Punjab 160 062, India | Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, India

Note: * Corresponding author. E-mail: [email protected]

Abstract: High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. One of the various approaches to deciphering the function of a protein is to determine its localization. Experimental approaches to proteome annotation, including determination of a protein's subcellular localization, are very costly and labor intensive. In silico methods offer an alternative to the available experimental methods. Here, we present two machine learning approaches for predicting the subcellular localization of a protein from primary sequence information. Two machine learning algorithms, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to classify an unknown protein into one of 11 subcellular localizations. The final prediction is based on a consensus of the predictions made by the two algorithms, and a probability is assigned to it. The results indicate that features derived from the primary sequence, such as amino acid composition, sequence order and physicochemical properties, can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for high-throughput proteome annotation. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
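The consensus idea described in the abstract can be illustrated with a minimal sketch. This is not the SubCellProt implementation: the abstract does not specify the feature encoding, the distance metric, the kernel width, or how the consensus probability is computed. The sketch assumes plain amino acid composition as the only feature, Euclidean distance for k-NN, a Gaussian-kernel PNN, and a simple agreement flag as the "consensus". All function names, parameter values, sequences and labels below are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20-vector)."""
    seq = seq.upper()
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    labels, votes = np.unique(nearest, return_counts=True)
    return labels[np.argmax(votes)]


def pnn_predict(X_train, y_train, x, sigma=0.1):
    """Gaussian-kernel PNN: mean kernel activation per class, normalised."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    labels = np.unique(y_train)
    scores = np.array([kernel[y_train == c].mean() for c in labels])
    probs = scores / scores.sum()
    best = np.argmax(probs)
    return labels[best], probs[best]


def consensus_predict(X_train, y_train, x, k=3, sigma=0.1):
    """Assumed consensus rule: report the PNN label and probability,
    plus a flag saying whether k-NN agrees with it."""
    knn_label = knn_predict(X_train, y_train, x, k)
    pnn_label, pnn_prob = pnn_predict(X_train, y_train, x, sigma)
    return pnn_label, pnn_prob, knn_label == pnn_label


# Toy data for illustration only; not real proteins or annotations.
train_seqs = ["KKRKRKKRRKAK", "RRKKRKPKKKRK", "LLVVIILLAVLI", "IVLLAILVVLLI"]
y_train = np.array(["nucleus", "nucleus", "membrane", "membrane"])
X_train = np.array([aa_composition(s) for s in train_seqs])

query = aa_composition("KRKKRRKKAKRK")
label, prob, agree = consensus_predict(X_train, y_train, query)
print(label, round(float(prob), 3), bool(agree))
```

A real predictor would add the sequence-order and physicochemical features the abstract mentions, cover all 11 localization classes, and tune k and sigma on held-out data.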

Keywords: Protein function, subcellular localization, machine learning, PNN, k-NN

Journal: In Silico Biology, vol. 9, no. 1-2, pp. 35-44, 2009

Received 23 August 2008

Accepted 3 December 2008

Published: 2009
