
Characterization and prediction of mRNA alternative polyadenylation sites in rice genes

Abstract

Polyadenylation [poly(A)] of mRNA is a critical step in gene expression and plays an important role in the termination of transcription. Predicting poly(A) sites can help identify the 3' ends of genes and improve genome annotation. Because knowledge of poly(A) signals in plants is limited, predictive modeling of poly(A) sites in agricultural crops remains challenging. Recent studies have uncovered widespread occurrences of alternative poly(A) (APA) sites in introns and coding sequences (CDS), yet few studies have addressed the prediction of these APA sites. In this study, four feature representation methods, namely a position weight matrix, the k-gram frequency, core hexamers, and a transition matrix, were adopted to characterize the poly(A) signals surrounding APA sites. A classification model was then built to predict each group of APA sites. Experimental results showed that this model was effective in identifying APA sites located in different genomic regions, with a compromise between sensitivity and specificity higher than 87%. Compared with the previous model PASS rice, accuracies for the prediction of APA sites in the 3'-UTR, introns, and CDS were improved by 5%, 7%, and 27%, respectively. This model will contribute to genetic engineering by enabling researchers to control poly(A) site selection.
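As a rough illustration of three of the feature representations named above (k-gram frequency, transition matrix, and position weight matrix), the following minimal Python sketch computes each one from short DNA strings. It is not the authors' implementation: the function names, the first-order transition model, the uniform background, the pseudocount, and the toy sequences are all assumptions made for illustration, and input is assumed to contain only A, C, G, and T.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def kgram_frequencies(seq, k):
    """Relative frequency of every possible k-gram, using overlapping windows."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    grams = ["".join(g) for g in product(ALPHABET, repeat=k)]
    if total == 0:  # sequence shorter than k
        return {g: 0.0 for g in grams}
    return {g: counts[g] / total for g in grams}


def transition_matrix(seq):
    """First-order transition probabilities P(next base | current base)."""
    counts = {a: {b: 0 for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    matrix = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        # Rows for bases that never occur as a "current" base stay all zero.
        matrix[a] = {
            b: (counts[a][b] / row_total if row_total else 0.0)
            for b in ALPHABET
        }
    return matrix


def position_weight_matrix(aligned, pseudocount=1.0):
    """Log-odds PWM against a uniform background (assumed, 0.25 per base)."""
    n = len(aligned)
    length = len(aligned[0])
    pwm = []
    for pos in range(length):
        column = Counter(s[pos] for s in aligned)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / (n + 4 * pseudocount)) / 0.25)
            for b in ALPHABET
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds scores for a sequence of PWM length."""
    return sum(col[base] for col, base in zip(pwm, seq))


# Toy usage with hypothetical aligned signal windows.
signals = ["AATAAA", "AATAAA", "ATTAAA"]
pwm = position_weight_matrix(signals)
features = kgram_frequencies("AATAAA", 2)
transitions = transition_matrix("AATAAA")
```

In a setup like the one the abstract describes, vectors of this kind would be computed over windows around each candidate site and passed to a classifier; the window sizes and the classifier itself are not specified here and are left out of the sketch.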