LyM: A Tool to Reach the Best Factor in Gene Expression Comparison
Article type: Research Article
Authors: de Souza Peres, Tarcísio | Costa, Fernando Ferreira | Alberto, Fernando Lopes
Affiliations: Hemocenter of the State University of Campinas. Cidade Universitária "Zeferino Vaz", distrito de Barão Geraldo, PO Box 6111, ZIP code 13083-970 Campinas, SP, Brazil | Department of Internal Medicine, School of Medicine of the State University of Campinas – UNICAMP. PO Box 6111, ZIP code 13083-970 Campinas, SP, Brazil | Fleury Institute, Molecular Biology. Av Gen Valdomiro de Lima, 508, ZIP code 04344-070 São Paulo, SP, Brazil
Note: Corresponding author. Tel.: +5511 92224593; Fax: +5519 32891089; E-mail: [email protected]
Abstract: We developed a Perl-based tool called LyM to determine the best factor for changes in the expression level of each transcript across two sets of expression libraries. LyM includes a Bayesian framework that analyzes the prior and posterior probability density functions for each transcript, taking the size of the libraries into account. To find the best factor of change for each distribution, LyM uses a binary search. In this work we aimed to validate the performance of the LyM tool using SAGE libraries from different human tissues. To assess accuracy, the results were compared with those generated on the same data set by DGED (Digital Gene Expression Displayer), which served as the gold standard. SAGE libraries were selected from CGAP for the following tissues (normal versus tumor): breast, colon, lung and stomach, for a total of eight SAGE libraries and 381,569 tags. DGED analyses were performed with five arbitrary factors for gene expression in two expression libraries: 2, 4, 8, 16 and 32. The two sets of results were compared using the ratio between the LyM and DGED factors and agreed well quantitatively. LyM was able to retrieve the best value of F, the factor that represents the fold difference in the expression of a specific gene, represented by its SAGE tags, between two expression libraries. In DGED, by contrast, the optimal value of F appears in the output only after multiple manual interactions. As a result, the LyM binary search algorithm saved considerable time. In a few cases we observed differential expression levels above 100-fold at a fixed value of P = 0.05, information that initially remained hidden in DGED. Finally, LyM proved to be relatively fast and portable to the standard workstations found in molecular biology laboratories, supporting accurate and convenient gene searches in expression experiments with minimal user interaction.
Keywords: Gene expression, differential expression factor, Bayesian framework, binary search
Journal: In Silico Biology, vol. 7, no. 1, pp. 101-104, 2007
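
The binary search for F described in the abstract can be illustrated with a minimal Python sketch. LyM itself is written in Perl, and its exact Bayesian model is not given in this record, so everything below is an assumption made for illustration: the posterior (a flat prior, with the share of a tag's counts coming from library A distributed as Beta(x_a + 1, x_b + 1)), the function names, the default search bounds, and the tag counts used in the example. The sketch only shows how a monotone posterior probability lets a binary search find the largest F that is still supported at a fixed P, instead of testing a handful of preset factors by hand.

```python
from math import comb


def prob_ratio_at_least(f, x_a, x_b, n_a, n_b):
    """Posterior probability that expression in A is at least f-fold that in B.

    Assumed model (not taken from LyM): flat prior, so the share of the tag's
    counts that come from library A follows Beta(x_a + 1, x_b + 1).
    x_a, x_b are tag counts; n_a, n_b are library sizes (total tags).
    """
    # Ratio >= f is equivalent to share >= p0 once library sizes are factored in.
    p0 = f * n_a / (f * n_a + n_b)
    n = x_a + x_b + 1
    # For integer parameters the Beta upper tail equals a binomial lower tail.
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(x_a + 1))


def best_factor(x_a, x_b, n_a, n_b, alpha=0.05, f_max=1024.0, tol=1e-3):
    """Largest F in [1, f_max] with P(ratio >= F) >= 1 - alpha, or None.

    Relies on the probability decreasing monotonically as F grows.
    """
    target = 1.0 - alpha
    lo, hi = 1.0, f_max
    if prob_ratio_at_least(lo, x_a, x_b, n_a, n_b) < target:
        return None  # not even a 1-fold increase is supported
    if prob_ratio_at_least(hi, x_a, x_b, n_a, n_b) >= target:
        return hi  # supported up to the search ceiling
    # Invariant: lo satisfies the target, hi does not.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_ratio_at_least(mid, x_a, x_b, n_a, n_b) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical counts: 100 tags in library A, 5 in B, equal library sizes.
    print(best_factor(100, 5, 50_000, 50_000))
```

With the default bounds and tolerance the loop halves the interval roughly twenty times per tag, whereas probing fixed factors such as 2, 4, 8, 16 and 32 can only bracket the answer and cannot report values beyond the largest factor tried.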