Affiliations: Department of Computer Science, Georgia State University, Atlanta, GA, USA, E-mails: {serghei, acaciula, glebova, alexz}@cs.gsu.edu | Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA, E-mail: [email protected]
Note: Corresponding author: Serghei Mangul, Department of Computer Science, Georgia State University, Atlanta, GA, USA. Tel.: +1 404 645 0479; Email: [email protected].
Abstract: This paper addresses the use of RNA-Seq data for transcriptome reconstruction and quantification, as well as for novel transcript discovery, in partially annotated genomes. We present a novel annotation-guided general framework for these tasks and compare it with existing annotation-guided and genome-guided transcriptome assembly methods. Our method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers, such as Cufflinks [3], as well as to accurately estimate transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT achieves superior transcript reconstruction and frequency estimation.
Keywords: Transcriptome reconstruction and quantification, RNA-Seq, next-generation sequencing, expectation maximization