ISSN 1386-6338 (P)
ISSN 1434-3207 (E)
In Silico Biology is a scientific research journal for the advancement of computational models and simulations applied to complex biological phenomena. We publish peer-reviewed leading-edge biological, biomedical and biotechnological research in which computer-based (i.e., in silico) modeling and analysis tools are developed and utilized to predict and elucidate dynamics of biological systems, their design and control, and their evolution. Experimental evidence may also be provided to support the computational analyses.
In Silico Biology aims to advance the knowledge of the principles of organization of living systems. We strive to provide computational frameworks for understanding how observable biological properties arise from complex systems. In particular, we seek integrative formalisms to decipher the cross-talk underlying systems-level properties, the ultimate aim of multi-scale models.
Studies published in In Silico Biology generally use theoretical models and computational analysis to gain quantitative insights into regulatory processes and networks, cell physiology and morphology, tissue dynamics and organ systems. Special areas of interest include signal transduction and information processing, gene expression and gene regulatory networks, metabolism, proliferation, differentiation and morphogenesis, among others, and the use of multi-scale modeling to connect molecular and cellular systems to the level of organisms and populations.
In Silico Biology also publishes foundational research in which novel algorithms are developed to facilitate modeling and simulations. Such research must demonstrate application to a concrete biological problem.
In Silico Biology frequently publishes special issues on seminal topics and trends. Special issues are handled by Special Issue Editors appointed by the Editor-in-Chief. Proposals for special issues should be sent to the Editor-in-Chief.
About In Silico Biology
In silico is a counterpart to in vivo (in the living system) and in vitro (in the test tube) biological experiments, and implies the gain of insights by computer-based simulations and model analyses.
In Silico Biology (ISB) was founded in 1998 as a purely online journal. IOS Press became the publisher of the printed journal shortly after. Today, ISB is dedicated exclusively to biological systems modeling and multi-scale simulations and is published solely by IOS Press. The previous online publisher, Bioinformation Systems, maintains a website containing studies published between 1998 and 2010 for archival purposes.
We strongly support open communication and encourage researchers to share results and preliminary data with the community. Therefore, results and preliminary data made public through conference presentations, conference proceedings or posting of unrefereed manuscripts on preprint servers will not prohibit publication in ISB. However, upon publication authors are required to modify a preprint to include the journal reference (including DOI) and a link to the published article on the ISB website.
Abstract: About five years ago, ontology was almost unknown in bioinformatics, even more so in molecular biology. Nowadays, many bioinformatics articles mention it in connection with text mining, data integration or as a metaphysical cure for problems in standardisation of nomenclature and other applications. This article attempts to give an account of what concept ontologies in the domain of biology and bioinformatics are; what they are not; how they can be constructed; how they can be used; and some fallacies and pitfalls creators and users should be aware of.
Abstract: Comparative sequence analysis is a powerful approach to identify functional elements in genomic sequences. Herein, we describe AGenDA (Alignment-based GENe Detection Algorithm), a novel method for gene prediction that is based on long-range alignment of syntenic regions in eukaryotic genome sequences. Local sequence homologies identified by the DIALIGN program are searched for conserved splice signals to define potential protein-coding exons; these candidate exons are then used to assemble complete gene structures. The performance of our method was tested on a set of 105 human-mouse sequence pairs. These test runs showed that sensitivity and specificity of AGenDA are comparable with the best gene-prediction program that is currently available. However, since our method is based on a completely different type of input information, it can detect genes that are not detectable by standard methods and vice versa. Thus, our approach seems to be a useful addition to existing gene-prediction programs. Availability: DIALIGN is available through the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/. The gene-prediction program AGenDA described in this paper will be available through the BiBiServ or MIPS web server at http://mips.gsf.de.
Abstract: MOTIVATION: Most diseases are caused by a set of gene defects that occur in complex association. The association scheme of expressed genes can be modelled by genetic networks. Genetic networks are an efficient means of understanding the dynamics of pathogenic processes by modelling the molecular reality of cell conditions. In this sense a genetic network consists of, first, a set of genes of specified cells, tissues or species and, second, causal relations between these genes determining the functional condition of the biological system, i.e., under disease. A relation between two genes exists if they are both directly or indirectly associated with disease. Our goal is to characterize diseases (especially autoimmune diseases such as chronic pancreatitis (CP), multiple sclerosis (MS) and rheumatoid arthritis (RA)) by genetic networks generated by a computer system. We want to introduce this practice as a bioinformatic approach for finding targets.
Abstract: GOBASE is a relational database that integrates data associated with mitochondria and chloroplasts. The most important data in GOBASE, i.e., molecular sequences and taxonomic information, are obtained from the public sequence data repository at the National Center for Biotechnology Information (NCBI), and are validated by our experts. Maintaining a curated genomic database comes with a towering labor cost, due to the sheer volume of available genomic sequences and the plethora of annotation errors and omissions in records retrieved from public repositories. Here we describe our approach to increase automation of the database population process, thereby reducing manual intervention. As a first step, we used Unified Modeling Language (UML) to construct a list of potential errors. Each case was evaluated independently, and an expert solution was devised and represented as a diagram. Subsequently, the UML diagrams were used as templates for writing object-oriented automation programs in the Java programming language.
Abstract: A system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component that enables SQL access to integrated databases by database federation (MARGBench), an ontology-based semantic metadatabase (SEMEDA) and an ontology-based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and use of SEMEDA. Since SEMEDA is implemented as a three-tiered web application, database providers can enter all relevant semantic and technical information about their databases by themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments, such as the "Gene Ontology". SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements for ontologies for molecular biological database integration are discussed and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements for semantic database integration.
Abstract: A method has been developed for constructing a tree source model for genetic text generation. Model visualisation in the form of suffix (context) trees provides a new way of context analysis of symbol sequences. Estimation of the stochastic complexity of the data in the frame of the model serves as a criterion for the model's ascertainment. The model and complexity values are used for analysis of genetic texts. The software realisation of this algorithm makes it possible to reveal statistical properties of genetic sequences based on an information measure. The program developed is available via the Internet at http://wwwmgs.bionet.nsc.ru/mgs/programs/complexity/.
Keywords: complexity, information measure, suffix tree visualisation, variable memory Markov model, genetic texts, statistical modelling
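The idea of measuring how well contexts predict the next symbol can be illustrated with a much simpler stand-in than the tree source model of the abstract above. The sketch below assumes a fixed context length rather than a variable-memory context tree, and it reports the empirical conditional entropy rather than the stochastic complexity used by the authors; the function names `context_counts` and `conditional_entropy` are hypothetical and are not taken from the published program.

```python
import math
from collections import Counter, defaultdict


def context_counts(sequence: str, order: int) -> dict:
    """Count which symbol follows each context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = sequence[i - order:i]  # the `order` symbols preceding position i
        counts[context][sequence[i]] += 1
    return counts


def conditional_entropy(sequence: str, order: int) -> float:
    """Empirical entropy, in bits per symbol, of a symbol given its context.

    Assumption: a fixed-order model is used here purely for illustration;
    the abstract describes a variable-memory (context tree) model.
    """
    counts = context_counts(sequence, order)
    total = sum(sum(followers.values()) for followers in counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for followers in counts.values():
        n_context = sum(followers.values())
        for n in followers.values():
            # weight by the joint frequency, take the log of the conditional one
            entropy -= (n / total) * math.log2(n / n_context)
    return entropy
```

A lower value for a longer context suggests that the extra context carries information about the next symbol; a perfectly periodic sequence such as `"ACACACAC"` gives zero for `order=1`.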
Abstract: The combination of full-scale genomic sequencing with high throughput expression analysis provides a new and largely unexploited basis for in silico functional genomics. Recent breakthrough developments in locating and analyzing promoters now allow extending functional genomics in silico far beyond identification of protein sequences into the complex regulatory structures and mechanisms of the genome. However, only first examples of this new type of approach are emerging at present and intensive further developments of bioinformatics tools will be required before such analysis can become large-scale routine in genomic sequence analysis. Nevertheless, the door to a new dimension of functional analysis of the genomic sequence is open. Finally, only the tight integration of the enormous amount of knowledge gained from protein sequence analysis with the complementary information about gene regulation will afford us a more complete picture of the networks that constitute life.
Abstract: This paper presents an implementation of Data Mining and Knowledge Discovery techniques for searching for regularities in tables of context features of DNA sequences involved in regulation of transcription. The goal is to discover regularities that relate nucleotide sequences to the functional classes of these sequences. The search patterns for regularities have been constructed in first-order logic augmented by probabilistic estimates. To this aim, the PC software system Gene Discovery has been designed. This system accepts molecular genetic data retrieved from a database by using SQL queries. Nucleotide sequences of promoters of several functional systems were extracted from the TRRD database (http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/) and analysed. The data include nucleotide sequences of erythroid-specific gene promoters, endocrine system gene promoters, promoter regions of the genes controlling cell cycle, promoters of genes regulating lipid metabolism, and muscle-specific gene promoters. Several regularities that relate the nucleotide sequences in the regulatory DNA and their location relative to the transcription start with each functional class have been found.
Abstract: The availability of genome-wide gene expression data provides a unique set of genes from which we can decipher the mechanisms underlying the common transcriptional response. Transcription factors, which can bind to specific DNA sites, cooperatively regulate the transcription of genes. This study attempts to mine putative binding sites to investigate how combinations of the sites predicted from known sites and over-represented repetitive elements are distributed in the promoter regions of groups of functionally related genes. The over-represented repetitive elements appearing in the associations are possible transcription factor binding sites. The deduced association rules would facilitate the prediction of putative regulatory elements and the identification of genes that are potentially co-regulated by the putative regulatory elements. Our proposed approach is applied to Saccharomyces cerevisiae and the promoter regions of yeast ORFs.
Keywords: regulatory elements, repetitive oligonucleotide, data mining, promoter
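As a minimal sketch of what an association rule over promoter sites measures, the code below computes the two standard rule statistics, support and confidence, for a rule of the form "promoters containing these sites also contain those sites". It assumes each promoter has already been reduced to a set of site labels; the function name `rule_metrics` and the site labels in the usage note are hypothetical, and the abstract above does not state which mining algorithm or thresholds the authors used.

```python
def rule_metrics(promoters, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    promoters:  iterable of collections of site labels, one per promoter
                (assumed input format, not taken from the paper).
    support:    fraction of promoters containing antecedent and consequent.
    confidence: among promoters containing the antecedent, the fraction
                that also contain the consequent.
    """
    antecedent = set(antecedent)
    consequent = set(consequent)
    site_sets = [set(sites) for sites in promoters]

    n_total = len(site_sets)
    n_antecedent = sum(1 for sites in site_sets if antecedent <= sites)
    n_both = sum(1 for sites in site_sets if (antecedent | consequent) <= sites)

    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

For example, with four promoters carrying the made-up site sets `{"siteA", "siteB"}`, `{"siteA", "siteB"}`, `{"siteA"}` and `{"siteC"}`, the rule `siteA -> siteB` is found in two of four promoters and holds for two of the three promoters that contain `siteA`.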
Abstract: Selenocysteine is the 21st amino acid, which occurs in all kingdoms of life. Selenocysteine is encoded by the STOP codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA codon that forms a hairpin-like structure (called the Sec insertion sequence (SECIS)). We consider the computational problem of generating new amino acid sequences containing selenocysteine. This requires finding an mRNA sequence that is similar to the SECIS consensus, is able to form the secondary structure required for selenocysteine insertion, and whose translation is maximally similar to the original amino acid sequence. We show that the problem can be solved in linear time when considering the hairpin-like SECIS structure (and, more generally, when considering a structure that does not contain pseudoknots).
Keywords: selenocysteine, SECIS, protein engineering
Abstract: The present paper gives an overview of the problem of predicting the subcellular location of a protein. Five measures for extracting information from the global sequence based on the Bayes discriminant algorithm are reviewed: 1) the auto-correlation functions of amino acid indices along the sequence; 2) the quasi-sequence-order approach; 3) the pseudo-amino acid composition; 4) the unified attribute vector in Hilbert space; and 5) Zp parameters extracted from the Zp curve. The actual predictive accuracy is closely related to the degree of similarity between the training and testing sets, or to the average degree of pairwise similarity in the dataset in a cross-validated study. Many scholars consider that the currently reported high predictive accuracy still cannot ensure that available algorithms are effective in practical prediction, because of the high pairwise sequence identity of the datasets. Others hold that the construction of a dataset used for developing software should be based on the reality, determined by Mother Nature, that some subcellular locations really contain only a small number of proteins, some of which even have a high percentage of sequence similarity. Owing to the complexity of the problem itself, some very sophisticated and special programs are needed both for constructing datasets and for improving the prediction. In any case, finding the target information in the mature protein sequence and properly combining it with sorting signals in prediction may further improve the overall predictive accuracy and make the prediction practical.
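To make the first of the five measures concrete, the sketch below computes an auto-correlation value of an amino acid index along a sequence for a given lag. The abstract does not give the formula, so the averaged product r_n = (1/(L-n)) * sum of h(i)*h(i+n) used here is an assumption (one common form), and the Kyte-Doolittle hydropathy scale is chosen only as an example index; the function name `autocorrelation` is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example amino acid
# index; the reviewed methods may rely on other indices.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(sequence, lag, index=KYTE_DOOLITTLE):
    """Average product of index values of residues separated by `lag` positions.

    Assumed form: r_lag = (1 / (L - lag)) * sum_i h(i) * h(i + lag),
    where L is the sequence length and h(i) the index value of residue i.
    """
    values = [index[residue] for residue in sequence]
    n_pairs = len(values) - lag
    if lag < 0 or n_pairs <= 0:
        raise ValueError("lag must be non-negative and shorter than the sequence")
    return sum(values[i] * values[i + lag] for i in range(n_pairs)) / n_pairs
```

Evaluating the function for lags 1, 2, ..., m turns a sequence of any length into a fixed-length feature vector, which is what makes such measures usable as input to a discriminant algorithm.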
Abstract: Signal transduction events are often mediated by small protein domains such as SH2 (Src homology 2) domains that recognize phosphotyrosines (pY) and flanking sequences. In the case of the SHP-2 receptor tyrosine phosphatase, an N-terminal SH2 domain binds and inactivates the phosphatase (PTP) domain. The pY-peptide-binding site on the N-terminal SH2 domain does not overlap with the PTP binding region. Nevertheless, pY-peptide binding causes domain dissociation and phosphatase activation. Comparative multi-nanosecond molecular dynamics simulations on the N-SH2 domain in ligand-bound and free states have been performed to study the allosteric mechanism that leads to domain dissociation upon pY-peptide binding. Significant ligand-dependent differences in the conformational flexibility of regions that are involved in SH2-PTP domain association have been observed. The results support a mechanism of signal transduction where SH2-peptide binding modulates the domain flexibility and reduces its capacity to fit into the entrance of the PTP catalytic domain of SHP-2.
Abstract: We propose a specification language ProML for protein sequences, structures, and families based on the open XML standard. The language allows for portable, system-independent, machine-parsable and human-readable representation of essential features of proteins. The language is of immediate use for several bioinformatics applications: we discuss clustering of proteins into families and the representation of the specific shared features of the respective clusters. Moreover, we use ProML for specification of data used in fold recognition benchmarks exploiting experimentally derived distance constraints.
Keywords: Protein Markup Language, ProML, XML, protein properties, protein families, protein structures, distance constraints, protein clusters
Abstract: We present a comprehensive analysis of methods for improving the fold recognition rate of the threading approach to protein structure prediction by the utilization of a few additional distance constraints. The distance constraints between protein residues may be obtained by experiments such as mass spectrometry or NMR spectroscopy. We applied a post-filtering step with new scoring functions incorporating measures of constraint satisfaction to ranking lists of 123D threading alignments. The detailed analysis of the results on a small representative benchmark set shows that the fold recognition rate can be improved significantly by up to 30% from about 54%-65% to 77%-84%, approaching the maximal attainable performance of 90% estimated by structural superposition alignments. This gain in performance adds about 10% to the recognition rate already achieved in our previous study with cross-link constraints only. Additional recent results on a larger benchmark set involving a confidence function for threading predictions also indicate notable improvements by our combined approach, which should be particularly valuable for rapid structure determination and validation of protein models.
Keywords: protein threading, fold recognition, structure prediction, experimental data, distance constraints, cross-linking reagents, mass spectrometry, NOE restraints, NMR
Abstract: Classification of proteins is a major challenge in bioinformatics. Here an approach is presented that unifies different existing classifications of protein structures and sequences. Protein structural domains are represented as nodes in a hypergraph. Shared memberships in sequence families result in hyperedges in the graph. The presented method partitions the hypergraph into clusters of structural domains. Each computed cluster is based on a set of shared sequence family memberships. Thus, the clusters put existing protein sequence families into the context of structural family hierarchies. Conversely, structural domains are related to their sequence family memberships, which can be used to gain further knowledge about the respective structural families.
Keywords: sequence analysis, structure analysis, domain boundary delineation, protein databases, protein homology, protein structure prediction, threading, template selection, optimization, protein clustering
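The hypergraph representation in the abstract above can be sketched in a few lines: structural domains are nodes, and each sequence family forms one hyperedge containing all domains that belong to it. The abstract does not describe the partitioning algorithm, so the sketch below substitutes the simplest possible stand-in, grouping domains that are connected through any chain of shared families. The function names and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def build_hyperedges(memberships):
    """Map each sequence family to the set of structural domains in it.

    memberships: dict of domain id -> collection of sequence family ids
                 (assumed input format, not taken from the paper).
    """
    hyperedges = defaultdict(set)
    for domain, families in memberships.items():
        for family in families:
            hyperedges[family].add(domain)
    return dict(hyperedges)


def connected_clusters(memberships):
    """Group domains linked by shared family memberships (union-find).

    This is connected-component clustering, a simplification used for
    illustration; it is not the partitioning method of the paper.
    """
    parent = {domain: domain for domain in memberships}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for domains in build_hyperedges(memberships).values():
        members = sorted(domains)
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root  # merge every member into one component

    clusters = defaultdict(list)
    for domain in memberships:
        clusters[find(domain)].append(domain)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

With the made-up input `{"d1": {"F1"}, "d2": {"F1", "F2"}, "d3": {"F2"}, "d4": {"F3"}}`, domains `d1`, `d2` and `d3` end up in one cluster because `d2` bridges families `F1` and `F2`, while `d4` forms a cluster of its own.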
Abstract: Protein data in the PDB covers only a snapshot of a protein structure. For flexible docking, conformational changes need to be considered. Rotamer statistics provide the likelihood for side chain conformations, and further comparison of bound and unbound state yields differences in preferred positions. Furthermore, we do a full sampling of selected angles and apply the AMBER force field. Conformation of energy minima complies with the rotamer statistics. Both types of information target the reduction of search space for enumerative docking algorithms and provide parameters for elastic docking.
Keywords: Rotamer library, flexible protein-protein docking, energy calculations, AMBER force field, side chain flexibility, flexibility measure
Abstract: We have developed a complete statistical model for the analysis of tumor specific gene expression profiles. The approach provides investigators with a global overview on large scale gene expression data, indicating aspects of the data that relate to tumor phenotype, but also summarizing the uncertainties inherent in classification of tumor types. We demonstrate the use of this method in the context of a gene expression profiling study of 27 human breast cancers. The study is aimed at defining molecular characteristics of tumors that reflect estrogen receptor status. In addition to good predictive performance with respect to pure classification of the expression profiles, the model also uncovers conflicts in the data with respect to the classification of some of the tumors, highlighting them as critical cases for which additional investigations are appropriate.
Abstract: To assess the relevance of molecular markers, clinical and genetic information must be combined. For reliable assessment of parameters relevant to diagnostics and therapy, large patient collectives must be characterized with respect to both phenotype and genotype. Matching of genetic data like gene expression profiles, molecular genetics and cytogenetics with clinical data like follow-up, morphological findings and diagnoses involves integration of complex databases. In the context of a nationwide leukemia research network in Germany we designed an integrated database covering both genetic and clinical data of patients. The system contains follow-up data and relevant laboratory modalities, i.e., cytomorphology, cytogenetics, molecular genetics, FISH, immunophenotyping and gene expression profiling. So far 13541 cases from 7746 patients treated by 1225 physicians are documented. The data structure consists of up to 888 variables per case. From our experience, integration of clinical and genetic information requires significant efforts (including data protection issues), but is feasible and improves data quality, leading to faster and more reliable research results for the benefit of the patients.
Abstract: Pattern formation in multicellular spheroids is addressed with a hybrid lattice-gas cellular automaton model. Multicellular spheroids serve as an experimental model system for the study of avascular tumor growth. Typically, multicellular spheroids consist of a necrotic core surrounded by rings of quiescent and proliferating tumor cells, respectively. Furthermore, after an initial exponential growth phase further spheroid growth is significantly slowed down even if further nutrient is supplied. The cellular automaton model explicitly takes into account mitosis, apoptosis and necrosis as well as nutrient consumption and a diffusible signal that is emitted by cells becoming necrotic. All cells follow identical interaction rules. The necrotic signal induces a chemotactic migration of tumor cells towards maximal signal concentrations. Starting from a small number of tumor cells, automaton simulations exhibit the self-organized formation of a layered structure consisting of a necrotic core, a ring of quiescent tumor cells and a thin outer ring of proliferating tumor cells.
Abstract: To gain further knowledge about rare genetic diseases, a worldwide method for data collection via the Internet has been established. This new approach will improve the collection of valuable data from single case reports. Ramedis saves standardised patient data which will be usable for statistics, longitudinal examinations and cooperative studies in the future. Embedded in the scene of the German Human Genome Project, Ramedis will directly enable phenotype-genotype correlations. Besides the better characterisation of clinical heterogeneity of rare metabolic diseases, there may be a great benefit for the treatment of these patients, in whom prospective studies are otherwise expensive and difficult to perform. This contribution presents the motivation for this system and introduces the features, current state and future of the project. Additionally, first experiences of health professionals using Ramedis are explained.
Keywords: case study, database, genotype-phenotype correlation, information system, rare metabolic disease, remote data entry
Abstract: BioPath is a prototype system for the interactive exploration of biochemical pathways. It has been developed as an electronic version of the famous Boehringer Biochemical Pathways map and offers various ways to access information on substances and pathways and to navigate through pathways. This paper describes the main features and the software architecture of BioPath. The companion paper focuses on the advanced visualization incorporated into BioPath.
Abstract: Glycosylated proteins are ubiquitous components of extracellular matrices and cellular surfaces where their oligosaccharide moieties are implicated in a wide range of cell-cell and cell-matrix recognition events. Glycans constitute highly flexible molecules. Only a small number of glycan X-ray structures is available for which sufficient electron density for an entire oligosaccharide chain has been observed. An unambiguous structure determination based on NMR-derived geometric constraints alone is often not possible. Time-consuming computational approaches such as Monte Carlo calculations and molecular dynamics simulations have been widely used to explore the conformational space accessible to complex carbohydrates. The generation of a comprehensive database for N-glycan fragments based on long-time molecular dynamics simulations is presented. The fragments are chosen in such a way that the effects of branched N-glycan structures are taken into account. The prediction database constitutes the basis of a procedure to generate a complete set of all possible conformations for a given N-glycan. The constructed conformations are ranked according to their energy content. The resulting conformations are in reasonable agreement with experimental data. A web interface has been established (http://www.dkfz.de/spec/glydict/), which allows users to input any N-glycan of interest and to receive an ensemble of generated conformations within a few minutes.
Keywords: conformations of N-glycans, molecular dynamics simulations, database of N-glycan fragments, glycoproteins