Composition-Sensitive Analysis of the Human Genome for Regulatory Signals
Issue title: 3rd International Conference on Bioinformatics of Genome Regulation and Structure (BGRS 2002), June 2002, Novosibirsk, Russia
Article type: Research Article
Authors: Kel-Margoulis, Olga V. | Tchekmenev, Dmitri | Kel, Alexander E. | Goessling, Ellen | Hornischer, Klaus | Lewicki-Potapov, Birgit | Wingender, Edgar
Affiliations: BIOBASE GmbH, Halchtersche Str. 33, D-38304 Wolfenbüttel, Germany | Department of Bioinformatics, UKG, Georg-August-University Göttingen, Goldschmidtstr. 1, D-37077 Göttingen, Germany
Note: Corresponding author. E-mail: [email protected]
Abstract: Known transcription regulatory signals, which generally act as transcription factor (TF) binding sites, differ significantly in their base composition. Therefore, their occurrence in a genome largely depends on the local base composition. In an attempt to initiate a whole-genome analysis of the human genome for the occurrence of potential TF binding sites, we systematically analyzed the GC content of distinct functional regions (e.g., upstream and downstream gene regions, exons, long and short introns, repetitive elements) and correlated it with the frequencies of potential binding sites for a representative set of TFs in these regions. For these analyses, we used the pattern collection of the TRANSFAC® database on transcriptional regulation, information on functionally relevant combinations of these patterns from the TRANSCompel® database, and our new resource, TRANSGenome™, which provides an overall annotation of the human genome with emphasis on its regulatory characteristics. We show that the occurrence of sequence patterns with regulatory potential may be supported by, but cannot be fully explained by, the GC content of a whole chromosome, the GC content of its putative promoter regions, or the information content of the patterns. Several patterns (HNF-3, NFAT, and the GC box) show a clear overrepresentation in all promoter groups as well as in all chromosomes. Other patterns, like E2F and CRE-BP1, are underrepresented in all promoter groups as well as in all chromosomes in comparison with random sequences. At the same time, both of these patterns are overrepresented in promoters in comparison with repetitive elements. We define several structural characteristics of the proximal promoters that differentiate them from other functional genomic regions. Two well-known promoter elements, the GC box and the TATA box, are statistically enriched in promoters in comparison with random sequences, repetitive elements, and exons.
Altogether, our findings provide insights into the macroheterogeneity among the individual chromosomes and into the microheterogeneity among different functional regions of individual chromosomes, contribute to a further understanding of the structural organization of gene regulatory regions, and give first hints about the development of regulatory features during evolution.
Keywords: human genome, transcription factor binding sites, computational analysis, gene regulation, promoters, repetitive elements
Journal: In Silico Biology, vol. 3, no. 1-2, pp. 145-171, 2003
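
Illustrative note: the abstract's central point, that the frequency of a regulatory pattern depends on local base composition, can be made concrete with a small sketch. The code below is not the authors' method and does not use TRANSFAC matrices. It assumes exact-string motifs, a single strand, and an independent-base background model whose base frequencies are taken from the sequence itself. All function names are hypothetical, and the consensus strings in the usage note are textbook simplifications.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def observed_count(motif: str, seq: str) -> int:
    """Number of exact, possibly overlapping, occurrences on the given strand."""
    motif, seq = motif.upper(), seq.upper()
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def expected_count(motif: str, seq: str) -> float:
    """Expected occurrences if bases were drawn independently with the
    same base frequencies as seq (an assumed, simplified background)."""
    seq = seq.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Probability of the motif at one position under the background model.
    p = 1.0
    for base in motif.upper():
        p *= counts[base] / total
    positions = max(len(seq) - len(motif) + 1, 0)
    return positions * p


def representation_ratio(motif: str, seq: str) -> float:
    """Observed / expected; > 1 suggests overrepresentation, < 1 the opposite."""
    expected = expected_count(motif, seq)
    if expected == 0:
        return float("nan")
    return observed_count(motif, seq) / expected
```

For example, `representation_ratio("GGGCGG", promoter_seq)` and `representation_ratio("TATAAA", promoter_seq)` could be compared across regions with different values of `gc_content`, where `promoter_seq` is any sequence supplied by the reader. Because the expected count is already adjusted for composition, a ratio that still differs between regions reflects something the GC content alone does not explain, which is the kind of effect the abstract reports.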