Abstract: A central concern in understanding the functional code of eukaryotic
genes is to elucidate the control structures that regulate gene activation and
suppression. One objective in understanding the mechanisms of gene regulation
is to elucidate the structure of the regulatory network. A preliminary step in
a detailed network analysis is to identify the transcription factor binding
sites of a regulatory network. A cis-regulatory module (CRM) is understood as
a part of the genome that comprises a set of such short binding sites. Gene
regulatory systems are
known to be quite stable during evolution, compared with the relatively
frequent replication processes of genes and mutations of coding sequences.
This conservation property of the regulatory code can be exploited to
identify cis-regulatory modules of potentially co-regulated genes. As
the degree of similarity is expected to depend on the phylogenetic distance
between homologs or orthologs, we favor an approach based on a comparison
paradigm. The paper introduces a novel concept for measuring the similarity of
cis-regulatory modules, which can then be used in an algorithm for
comparing regulatory regions. The proposed algorithm searches for pairs of
similar modules, and a prototype implementation is applied to human and mouse
liver sequences. The results are compared to those obtained for random
sequences, and it is shown that a clear decision about co-regulation is
possible at this level.
Keywords: Regulatory networks, cis-regulatory modules, phylogenetic distance, similarity of sequences