ISSN 1386-6338 (P)
ISSN 1434-3207 (E)
In Silico Biology is a scientific research journal for the advancement of computational models and simulations applied to complex biological phenomena. We publish peer-reviewed leading-edge biological, biomedical and biotechnological research in which computer-based (i.e., in silico) modeling and analysis tools are developed and utilized to predict and elucidate dynamics of biological systems, their design and control, and their evolution. Experimental evidence may also be provided to support the computational analyses.
In Silico Biology aims to advance the knowledge of the principles of organization of living systems. We strive to provide computational frameworks for understanding how observable biological properties arise from complex systems. In particular, we seek integrative formalisms to decipher the cross-talk underlying systems-level properties, which is the ultimate aim of multi-scale models.
Studies published in In Silico Biology generally use theoretical models and computational analysis to gain quantitative insights into regulatory processes and networks, cell physiology and morphology, tissue dynamics and organ systems. Special areas of interest include signal transduction and information processing, gene expression and gene regulatory networks, metabolism, proliferation, differentiation and morphogenesis, among others, and the use of multi-scale modeling to connect molecular and cellular systems to the level of organisms and populations.
In Silico Biology also publishes foundational research in which novel algorithms are developed to facilitate modeling and simulations. Such research must demonstrate application to a concrete biological problem.
In Silico Biology frequently publishes special issues on seminal topics and trends. Special issues are handled by Special Issue Editors appointed by the Editor-in-Chief. Proposals for special issues should be sent to the Editor-in-Chief.
About In Silico Biology
In silico is a counterpart to in vivo (in the living system) and in vitro (in the test tube) biological experiments, and implies the gain of insights by computer-based simulations and model analyses.
In Silico Biology (ISB) was founded in 1998 as a purely online journal. IOS Press became the publisher of the printed journal shortly after. Today, ISB is dedicated exclusively to biological systems modeling and multi-scale simulations and is published solely by IOS Press. The previous online publisher, Bioinformation Systems, maintains a website containing studies published between 1998 and 2010 for archival purposes.
We strongly support open communication and encourage researchers to share results and preliminary data with the community. Therefore, results and preliminary data made public through conference presentations, conference proceedings or the posting of unrefereed manuscripts on preprint servers will not prohibit publication in ISB. However, upon publication, authors are required to modify the preprint to include the journal reference (including DOI) and a link to the published article on the ISB website.
Abstract: PhoH protein is a putative ATPase belonging to the phosphate regulon in Escherichia coli. EC-PhoH homologs are present in different organisms, but it is not clear whether they are functionally related, and nothing is known about their regulation. To distinguish true functional orthologs of EC-PhoH in different classes of bacteria and to identify their functional role in the bacterial metabolic network, we performed a phylogenetic analysis of these proteins and a comparative study of the position and regulation of the related genes. Three groups of proteins were identified. Proteins of the first group (BS-PhoH orthologs) are present in most bacteria and are proposed to be functionally linked to phospholipid metabolism and RNA modification. Proteins of the second group (BS-YlaK orthologs) are present in most aerobes, and Actinobacterial YlaK orthologs are shown to be members of fatty acid beta-oxidation regulons. EC-PhoH orthologs are classified in a third group, specific for Enterobacteria. The functional role of PhoH homologs in lipid and RNA metabolism and the proposed interrelation of PhoH paralogs in one organism are discussed.
Abstract: A new approach for comparative analysis of multiple trees reconstructed for representative protein families is proposed. This approach is based on the hypothesis of gene duplication, gene loss and horizontal gene transfer and makes use of stochastic methods and optimization. We present a species tree of 40 prokaryotic organisms obtained by our algorithm on the basis of 132 clusters of orthologous groups of proteins (COGs) from the GenBank of the National Center for Biotechnology Information (USA). We also present a computer technology intended to determine horizontally transferred genes. Some application results of the technology, based on comparative analysis of protein and species trees, are given.
Abstract: We describe an algorithm (IRSA) for identification of common regulatory signals in samples of unaligned DNA sequences. The algorithm was tested on randomly generated sequences of fixed length with an implanted signal of length 15 with 4 mutations, and on natural upstream regions of bacterial genes regulated by PurR, ArgR and CRP. Then it was applied to upstream regions of orthologous genes from Escherichia coli and related genomes. Some new palindromic binding and direct repeat signals were identified. Finally, we present a parallel version suitable for computers supporting the MPI protocol. This implementation is not strictly bounded by the number of available processors. The computation speed depends linearly on the number of processors.
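The synthetic benchmark mentioned in this abstract (random fixed-length sequences carrying an implanted signal of length 15 with 4 mutations) can be illustrated with a minimal Python sketch. This is not the IRSA code. The function names, the uniform A/C/G/T background and the reading that every implanted copy carries exactly 4 substitutions are assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"

def mutate(signal, n_mutations, rng):
    """Return a copy of `signal` with exactly `n_mutations` positions substituted."""
    positions = rng.sample(range(len(signal)), n_mutations)
    chars = list(signal)
    for p in positions:
        # Pick a base that differs from the original, so the change is a real mutation.
        chars[p] = rng.choice([b for b in ALPHABET if b != chars[p]])
    return "".join(chars)

def make_sample(n_seqs, seq_len, signal, n_mutations, seed=0):
    """Build (sequence, start) pairs: random background plus one mutated signal copy.

    Assumption: the background is uniform over A/C/G/T and the implant
    position is uniform over all positions where the signal fits.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - len(signal) + 1)
        variant = mutate(signal, n_mutations, rng)
        background[start:start + len(signal)] = variant
        sample.append(("".join(background), start))
    return sample

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))
```

A signal-finding method can then be scored by how often it recovers the recorded `start` positions, for example on `make_sample(20, 200, "ACGTACGTACGTACG", 4)`.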
Abstract: There exist numerous algorithms for identification of regulatory signals in unaligned DNA fragments. Here we present two genetic algorithms for signal identification and describe their implementation and testing on simulated and real data. The first algorithm selects the start position of the signal in a given fragment. The second one builds a "universal" word that is recognized by the transcription factor. We compare these approaches and study the behavior of the genetic algorithm.
Abstract: The identification of regulatory elements in silico is an important method for inferring function from sequence data, but it is uncertain which methods are best. We used a novel combination of expression data from a TCF1 knockout mouse (TCF1 codes for the transcription factor HNF1α), and human and mouse genome sequences, to search 2 kb upstream of 28 genes downregulated in TCF1 null mice compared to wild type mice. We wrote software (http://www.BindGene.org) to search for and assign p-values to potential binding sites. This identified 8 genes as candidates for being directly regulated by HNF1α: LIPC, CRP, F13B, PRODH2, HSD17B2, SLC7A9, SLC16A7, PAH. There was evidence for conservation between human and mouse for all the regions identified as containing putative binding sites. For three of the genes identified there was experimental evidence for an HNF1α binding site. For comparison we also examined 25 genes up-regulated in TCF1 null mice; only one gene was selected and there was little evidence for conservation of this putative binding site between human and mouse. This result was consistent with HNF1α being a gene transcription activator. Another 6 up-regulated genes had unexpectedly high p-values, suggesting that possibly HNF1α sites have been suppressed from these genes. In conclusion, gene expression data from transgenic animals lacking a transcription factor can be used to identify DNA binding sites for that factor.
Abstract: Bacterial infections trigger a wide range of host cell responses. For the interaction of Pseudomonas aeruginosa and epithelial cells it is known that transcription factor NF-κB plays a central role, but its effects have to be specified by cooperation with additional factors. NF-κB containing composite elements, e.g. with C/EBP, may be appropriate indicators for new antibacterial response genes. We refined matrix-based search methods for C/EBP, which was necessary because of the weak consensus sequences of the previously existing C/EBP matrices, established a model for the C/EBP / NF-κB composite element, used it for scanning all known human 5'-flanking sequences and identified 135 new candidate genes. The newly constructed C/EBP binding patterns will be available with one of the next releases of the TRANSFAC database (http://www.gene-regulation.de).
Abstract: A new approach to recognizing promoter regions of eukaryotic genes is proposed and illustrated by an example of Drosophila melanogaster. Its novelty lies in using a genetic algorithm to search for an optimal partition of a promoter region into local nonoverlapping fragments and to select the most significant dinucleotide frequencies for the fragments obtained. The method developed was applied to recognizing TATA-containing (TATA+) and DPE-containing (DPE+) promoters of Drosophila melanogaster genes. The program for promoter recognition is included in the GeneExpress system, section RegScan (http://wwwmgs.bionet.nsc.ru/mgs/programs/proga/).
Abstract: We quantify fluctuations in protein expression for three of the segmentation genes in the fruit fly, Drosophila melanogaster. These proteins are representative members of the first three levels of a signalling hierarchy which determines the segmented body plan: maternal (Bicoid protein); gap (Hunchback protein); and pair-rule (Even-skipped protein). We quantify both inter-embryo and inter-nucleus (within a single embryo) variability in expression, especially with respect to positional specification by concentration gradient reading. Errors are quantified both early and late in cleavage cycle 14, during which the protein patterns develop, to study the dynamics of error transmission. We find that Bicoid displays very large positional errors, while expression of the downstream genes, Hunchback and Even-skipped, displays far more precise positioning. This is evidence that the pattern formation of the downstream proteins is at least partially independent of maternal signal, i.e. evidence against simple concentration gradient reading. We also find that fractional errors in concentration increase during cleavage cycle 14.
Abstract: Using the method of generalized threshold models, we formulate and solve the problem of evaluating the parametric stability of a model of the gene subnetwork controlling the early ontogenesis of the fruit fly Drosophila melanogaster. Computer experiments have been performed to test the parametric stability of the model. Quantitative evaluations have been obtained for the parametric stability of the Drosophila gene subnetwork in nuclei along the embryo's anterior-posterior axis. The results of the computer experiments have been compared with previous research data on the "sensitivity" of functioning regimes to random changes of the parameters in models of prokaryotic and eukaryotic systems, namely the system controlling λ-phage development and the subsystem controlling the flower morphogenesis of Arabidopsis thaliana. The obtained results confirm the high parametric stability of gene networks that control the development of organisms.
Abstract: Light regulates almost all physiological and biochemical processes in plants. Plants react to quantitative and qualitative light characteristics owing to the system of photoreceptors and a branched network for light signal transduction. Comprehensible visual representations of gene networks, produced using filter technology in the GeneNet system, help to understand the types of relationships between the components inside the networks and to define their hierarchical structure.
Keywords: gene networks, signal transduction, photomorphogenesis, photoreception
Abstract: Recognition sites for type II restriction and modification enzymes in genomes of several bacteria are recognized as semi-palindromic motifs and are avoided to a significant degree. The key idea of contrast word analysis with respect to RMS recognition sites is that under-represented words are likely to be selected against. Starting from over- or underrepresented words corresponding to RMS recognition sites in specific clades, the specificity of unknown R-M systems can be highlighted. Among the known restriction enzymes described in the REBASE database of restriction and modification systems, many have recognition sites that are still uncharacterized. This motivates studies aimed at assessing horizontal transfer events of RMS in micro-organisms through the analysis of word usage biases in well-determined genomic regions. A probabilistic model is built on a first-order Markov chain. Statistics on the k-neighborhood of a word are computed to assess the biological significance of a genomic motif. Efficient word counting procedures have been implemented and statistics are used for the assessment of the significance of individual words in large sequences. On the basis of the set of most avoided words, and in accordance with the IUPAC coding standards, suggestions are made regarding potential recognition sequences. In certain cases, a comparison of avoided palindromic words in taxonomically related bacteria shows a pattern of relatedness of their R-M systems. To strengthen this analysis, the primary protein structures of all type II R-M systems known in REBASE have been searched with BLAST against the nr-GENBANK database. The combination of these analyses has revealed some interesting examples of possible horizontal transfer events of R-M systems.
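To make the notion of an under-represented word concrete, the following Python sketch compares the observed count of a word with its expected count under a first-order Markov model fitted to the same sequence. It uses the textbook plug-in estimator (product of dinucleotide counts divided by the product of inner mononucleotide counts), which is an assumption here: the abstract does not give the authors' exact statistic, and the function names are hypothetical.

```python
from collections import Counter

def count_words(seq, k):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count_markov1(seq, word):
    """Expected count of `word` under a first-order Markov model fitted to seq.

    Plug-in estimator (an assumption, not necessarily the authors' statistic):
        E[N(w)] = prod_i N(w_i w_{i+1}) / prod_{inner i} N(w_i)
    """
    if len(word) < 2:
        raise ValueError("word must have length >= 2")
    di = count_words(seq, 2)
    mono = count_words(seq, 1)
    expected = 1.0
    for i in range(len(word) - 1):
        expected *= di[word[i:i + 2]]
    for base in word[1:-1]:
        if mono[base] == 0:
            # An inner base that never occurs means the word cannot occur either.
            return 0.0
        expected /= mono[base]
    return expected

def contrast(seq, word):
    """Return (observed, expected) counts; observed << expected suggests avoidance."""
    observed = count_words(seq, len(word))[word]
    return observed, expected_count_markov1(seq, word)
```

A word whose observed count is far below the expected value would be a candidate avoided word. A real analysis would also need a significance measure and handling of both DNA strands, which this sketch leaves out.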
Abstract: Known transcription regulatory signals, which generally act as transcription factor binding sites (TFs), differ significantly in their base composition. Therefore, their occurrence in a genome largely depends on the local base composition. In an attempt to initiate a whole human genome analysis of the occurrence of potential TFs, we systematically analyzed the GC-content of distinct functional regions (e.g., upstream and downstream gene regions, exons, long and short introns, repetitive elements) and correlated the frequencies of potential binding sites of a representative set of TFs in these regions. For these analyses, we used the pattern collection of the TRANSFAC® database on transcriptional regulation, the information about functionally relevant combinations of them from the database TRANSCompel®, and our new resource, TRANSGenome™, which provides an overall annotation of the human genome with emphasis on its regulatory characteristics. We show that the occurrence of sequence patterns with regulatory potential may be supported by, but cannot be fully explained by, the GC content of a whole chromosome or of its putative promoter regions, or by the information content of the patterns. Several patterns, HNF-3, NFAT, and GC box, show a clear overrepresentation in all promoter groups as well as in all chromosomes. Other patterns, like E2F and CRE-BP1, are underrepresented in all promoter groups as well as in all chromosomes in comparison with random sequences. Simultaneously, both patterns are over-represented in promoters in comparison with repetitive elements. We define several structural characteristics of the proximal promoters that differentiate them from other functional genomic regions. Two well-known promoter elements, GC- and TATA-boxes, are statistically enriched in promoters in comparison with random sequences, repetitive elements and exons.
Altogether, our findings provide insights into the macroheterogeneity amongst the individual chromosomes, into the microheterogeneity among different functional regions of individual chromosomes, contribute to further understanding of structural organization of gene regulatory regions, and give first hints on the development of regulatory features during evolution.
Keywords: human genome, transcription factor binding sites, computational analysis, gene regulation, promoters, repetitive elements
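The per-region GC-content measurement that this abstract starts from can be sketched in a few lines of Python. The function names and the choice to ignore ambiguous bases such as N are assumptions for illustration. The sketch does not use TRANSFAC, TRANSCompel or TRANSGenome data.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # undefined for empty or fully ambiguous input
    return (seq.count("G") + seq.count("C")) / acgt

def gc_by_region(regions):
    """Map each region name to its GC content.

    `regions` is a dict {region_name: sequence}; the names are up to the caller.
    """
    return {name: gc_content(s) for name, s in regions.items()}
```

With such values per functional region, binding-site frequencies can be compared against what the local base composition alone would predict.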
Abstract: Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed at developing a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of SC and the conformation of SC proteins responsible for these. Proteins involved in the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
Abstract: The rate constant of an enzyme-catalysed reaction is one of the major target properties to understand protein function. Atomic-detail computer simulations can in principle be used to estimate rate constants from the energy profile along the reaction coordinate. For such simulations, molecular mechanics is combined with a quantum description of the reaction process. In molecular mechanics calculations, the electrostatic field is represented by the Coulomb potential of partial atomic charges which have been parametrised for small building blocks in vacuum and transferred to the macromolecule. In aqueous solution, however, the electrostatic interactions are affected by the solvent polarization. While this can be described by numerically solving the Poisson–Boltzmann equation, it is computationally expensive. A simple approximation to this is to optimally reproduce the electrostatic potential in solution by reparametrising the partial atomic charges in such a way that a simple Coulomb potential can still be used. Such a procedure would allow fast calculations of reaction processes in proteins while accounting for the solvent screening effect. Here, this method is tested on myosin, a motor protein that is both an enzyme and exists in very different conformations.
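As a hedged illustration of the approximation described in this abstract, the Coulomb potential of partial charges q_i at positions r_i, and one possible way to choose reparametrised charges, can be written as below. The least-squares objective, the sampling points r_k and the symbols φ_PB (the Poisson–Boltzmann potential) and q̃_i are assumptions introduced here. The abstract states only that the potential in solution is reproduced "optimally".

```latex
\phi_{\mathrm{Coul}}(\mathbf{r})
  = \frac{1}{4\pi\varepsilon_0} \sum_i \frac{q_i}{\lvert \mathbf{r} - \mathbf{r}_i \rvert}
\qquad
\{\tilde{q}_i\}
  = \arg\min_{\{q'_i\}} \sum_k
    \left[ \phi_{\mathrm{PB}}(\mathbf{r}_k)
      - \frac{1}{4\pi\varepsilon_0} \sum_i \frac{q'_i}{\lvert \mathbf{r}_k - \mathbf{r}_i \rvert}
    \right]^2
```

Under this reading, the expensive Poisson–Boltzmann solution is needed only once to fit the charges, after which the plain Coulomb sum with the charges q̃_i is used during the simulation.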
Abstract: Transmembrane transport is an essential component of cell life. Many genes encoding known or putative transport proteins are found in bacterial genomes. In most cases their substrate specificity is not experimentally determined and only approximately predicted by comparative genomic analysis. Even less is known about the 3D structure of transporters. Nevertheless, the published experimental data demonstrate that channel-forming residues determine the substrate specificity of secondary transporters, and analysis of these residues would provide better understanding of the transport mechanism. We developed a simple computational method for identification of channel-forming residues in transporter sequences. It is based on the analysis of amino acid frequencies in bacterial secondary transporters. We applied this method to a variety of transmembrane proteins with resolved 3D structure. The predictions are in sufficiently good agreement with the real protein structure.
Abstract: We have developed PROF_PAT, a database of patterns, constructed for groups of related proteins and designed to maximize representation of amino acid sequences from the SWISS-PROT database. The purpose of the current study was to demonstrate that PROF_PAT is not only as good as known analogs but surpasses them in some features. 10938 new amino acid sequences from the SWISS-PROT bank were compared with patterns constructed for protein families in the PROF_PAT 1.10 bank. The aim of the comparisons was to estimate some threshold values of the "Score" parameter to distinguish random similarities from significant ones. From the 10938 new sequences, 638 did not reveal any similarities with PROF_PAT patterns. Cases of found similarities were divided into three sets: 'positive', 'putative' (or 'unknown'), and 'false positive', containing 7719, 2297 and 284 sequences respectively. Using 20 amino acid sequences from the TrEMBL bank that have no descriptions, PROF_PAT demonstrated specificity at a level that was as good as for the best-known "secondary" banks. At the same time, its pattern content and variety of included proteins were significantly richer, and its search speed was 3–10 times higher than those of any other protein family bank used for comparison.
Keywords: protein families, patterns, motifs, similarity search, data banks, amino acid sequences, protein comparison
Abstract: The long-term objective of our work is the computational construction of complex cellular systems. Therefore, input data must be prepared systematically first. In the field of biological data, information is spread out over hundreds of molecular databases. Different approaches to data integration have already been applied in the past to provide homogeneous access to these databases. What is now required is a system that stores integrated data of different types locally and searches for networks found in the integrated dataset. Finally, the analysis of these networks can answer different questions, e.g. the role of biochemical substances in regulatory systems or the causes of metabolic diseases. In this article a novel object-oriented modelling approach in the field of biochemical networks is presented. Molecular objects are modelled conceptually using object classes, internally based on the object model OMG IDL. Specific object services are implemented automatically for each model and object instances are built using data integration. In combination with that, a specific view concept based on access paths has been implemented to model biochemical reaction rules automatically. Together with the application of graphical methods, pathways and cliques are computed by the system. They provide the topology of cellular process networks. The system has been implemented using Java and CORBA.
Keywords: object-oriented modelling, data integration, network analysis, Java, CORBA
Abstract: The IGMS is a comprehensive information system that combines the knowledge from genomic sequence, genetic map and genetic disorders databases. This system is updated weekly and focuses on the analysis of EST data. The IGMS identifies UniGene clusters that are differentially expressed in different types of cancer with respect to different reference tissues. The results can be combined with clinical data to assess the potential relevance of specific genes for patient survival or metastatic spread. The second application maps ESTs with a specific expression profile. Our third application generates a database of alternative splice forms for nine organisms from EST and mRNA sequence data. The results can be used to find splicing patterns specific for certain tissues or tumour types. Availability: http://www.bioinf.mdc-berlin.de/igms/.
Keywords: alternative splicing, ESTs, database, gene expression profiles, colon cancer