In Silico Biology - Volume 5, issue 1

Third "Ontology Workshop on Ontology and Genomes"

Authors: Takai-Igarashi, Takako | Takagi, Toshihisa | Michael, Holger | Wingender, Edgar

Article Type: Other

Citation: In Silico Biology, vol. 5, no. 1, pp. 1-3, 2005

Who Tangos with GOA? – Use of Gene Ontology Annotation (GOA) for Biological Interpretation of '-omics' Data and for Validation of Automatic Annotation Tools

Authors: Lee, Vivian | Camon, Evelyn | Dimmer, Emily | Barrell, Daniel | Apweiler, Rolf

Article Type: Research Article

Abstract: The number of large-scale experimental datasets generated from high-throughput technologies has grown rapidly. Biological knowledge resources such as the Gene Ontology Annotation (GOA) database, which provides high-quality functional annotation to proteins within the UniProt Knowledgebase, can play an important role in the analysis of such data. The integration of GOA with analytical tools has proved to aid the clustering, annotation and biological interpretation of such large expression datasets. GOA is also useful in the development and validation of automated annotation tools, in particular text-mining systems. The increasing interest in GOA highlights the great potential of this freely available resource to assist both the biological research and bioinformatics communities.

Keywords: gene ontology, annotation, data analysis

Citation: In Silico Biology, vol. 5, no. 1, pp. 5-8, 2005

PRIME: Automatically Extracted PRotein Interactions and Molecular Information DatabasE

Authors: Koike, Asako | Takagi, Toshihisa

Article Type: Research Article

Abstract: With the exponentially increasing amount of information in the biomedical field, the significance of advanced information retrieval and information extraction, as well as the role of databases, has been increasing. PRIME is an integrated gene/protein informatics database based on natural language processing. It provides automatically extracted protein/family/gene/compound interaction information, including both physical and genetic interactions, Gene Ontology-based functions, and graphic pathway viewers. Gene/protein/family names and functional terms are recognized based on dictionaries developed in our laboratory. The interaction and functional information is extracted using syntactic dependencies and various phrase patterns. We have included about 920,000 (non-redundant) protein interactions and 360,000 annotated gene-function relationships for major eukaryotes. By combining sequence and text information, PRIME also supports pathway comparison between two organisms, simple pathway deduction based on interaction data from other organisms, and pathway filtering using tissue expression data. This database is accessible at http://prime.ontology.ims.u-tokyo.ac.jp:8081.

Keywords: protein interaction, biological process, pathway database, natural language processing

Citation: In Silico Biology, vol. 5, no. 1, pp. 9-20, 2005

Large-Scale Extraction of Gene Regulation for Model Organisms in an Ontological Context

Authors: Saric, Jasmin | Jensen, Lars J. | Rojas, Isabel

Article Type: Research Article

Abstract: This paper presents an approach using syntactosemantic rules for the extraction of relational information from biomedical abstracts. The results show that by overcoming the hurdle of technical terminology, high precision results can be achieved. From 58,664 abstracts related to baker's yeast, we extract a regulatory network comprising 441 pairwise relations with an accuracy of 83–90%. To achieve this, we made use of a resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. This list of names was included in the lexicon of our retrained part-of-speech tagger for use on molecular biology abstracts. For the domain in question, an accuracy of 93.6–97.7% was attained on part-of-speech tags. The method can be easily adapted to organisms other than yeast, allowing us to extract many more biologically relevant relations. The main reason for the comparable precision rates is the ontological model that was built beforehand and served as a guiding force for the manual coding of the syntactosemantic rules. Preliminary results on journal articles from PubMed Central suggest that our rule set performs with equal accuracy when applied to full text rather than abstracts.

Keywords: ontologies, information extraction, bionlp, natural language processing

Citation: In Silico Biology, vol. 5, no. 1, pp. 21-32, 2005

Linking Experimental Results, Biological Networks and Sequence Analysis Methods Using Ontologies and Generalised Data Structures

Article Type: Research Article

Abstract: The structure of a closely integrated data warehouse is described that is designed to link different types and varying numbers of biological networks, sequence analysis methods and experimental results such as those coming from microarrays. The data schema is inspired by a combination of graph-based methods and generalised data structures and makes use of ontologies and meta-data. The core idea is to consider and store biological networks as graphs, and to use generalised data structures (GDS) for the storage of further relevant information. This is possible because many biological networks can be stored as graphs: protein interactions, signal transduction networks, metabolic pathways, gene regulatory networks etc. Nodes in biological graphs represent entities such as promoters, proteins, genes and transcripts, whereas the edges of such graphs specify how the nodes are related. The semantics of the nodes and edges are defined using ontologies of node and relation types. Besides generic attributes that most biological entities possess (name, attribute description), further information is stored using generalised data structures. By directly linking to underlying sequences (exons, introns, promoters, amino acid sequences) in a systematic way, close interoperability with sequence analysis methods can be achieved. This approach allows us to store, query and update a wide variety of biological information in a way that is semantically compact without requiring changes at the database schema level when new kinds of biological information are added. We describe how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems. The system is developed under the GPL license and can be downloaded from http://sourceforge.net/projects/ondex/
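
As an illustration only, the idea of storing a biological network as a graph whose node and relation types are checked against an ontology, with extra information kept in a free-form structure, might be sketched as follows. This is not ONDEX code: the class names, the two type vocabularies, and the `gds` dictionary standing in for a generalised data structure are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical, tiny stand-ins for ontologies of node and relation types.
NODE_TYPES = {"gene", "protein", "promoter", "transcript"}
RELATION_TYPES = {"encodes", "interacts_with", "regulates"}


@dataclass
class Node:
    name: str                     # generic attribute shared by all entities
    node_type: str                # must be a term from NODE_TYPES
    description: str = ""         # generic attribute
    # Free-form store for anything else (assumed stand-in for a GDS).
    gds: Dict[str, Any] = field(default_factory=dict)


class BioGraph:
    """Directed graph with ontology-checked node and edge types."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Tuple[str, str, str]] = []

    def add_node(self, node: Node) -> None:
        if node.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node.node_type}")
        self.nodes[node.name] = node

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, relation, target))

    def neighbours(self, name: str, relation: Optional[str] = None) -> List[str]:
        """Targets reachable from `name`, optionally filtered by relation."""
        return [t for s, r, t in self.edges
                if s == name and (relation is None or r == relation)]


graph = BioGraph()
graph.add_node(Node("geneA", "gene"))
graph.add_node(Node("proteinA", "protein", gds={"sequence": "MKT"}))
graph.add_edge("geneA", "encodes", "proteinA")
# A new kind of information is attached without changing any class definition.
graph.nodes["proteinA"].gds["expression"] = [1.2, 3.4]
```

The point of the sketch is the last line: because additional data lives in the free-form store rather than in fixed fields, a new kind of information does not require a schema change.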

Keywords: graph database, ontology, Generalised Data Structures, semantic data integration

Citation: In Silico Biology, vol. 5, no. 1, pp. 33-44, 2005

IMGT-Choreography for Immunogenetics and Immunoinformatics

Article Type: Research Article

Abstract: IMGT, the international ImMunoGeneTics information system® (http://imgt.cines.fr), was created in 1989 at Montpellier, France. IMGT is a high quality integrated knowledge resource specialized in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) of human and other vertebrates, and related proteins of the immune system (RPI) which belong to the immunoglobulin superfamily (IgSF) and MHC superfamily (MhcSF). IMGT provides a common access to standardized data from genome, proteome, genetics and three-dimensional structures. The accuracy and the consistency of IMGT data are based on IMGT-ONTOLOGY, a semantic specification of terms to be used in immunogenetics and immunoinformatics. IMGT-ONTOLOGY has been formalized using XML Schema (IMGT-ML) for interoperability with other information systems. We are developing Web services to automatically query IMGT databases and tools. This is the first step towards IMGT-Choreography which will trigger and coordinate dynamic interactions between IMGT Web services to process complex significant biological and clinical requests. IMGT-Choreography will further increase the IMGT leadership in immunogenetics and immunoinformatics for medical research (repertoire analysis of the IG antibody recognition sites and of the TR recognition sites in autoimmune and infectious diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research (IG and TR repertoires in farm and wild life species), genome diversity and genome evolution studies of the adaptive immune responses, biotechnology related to antibody engineering (single chain Fragment variable (scFv), phage displays, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (detection and follow-up of residual diseases) and therapeutical approaches (grafts, immunotherapy, vaccinology). IMGT is freely available at http://imgt.cines.fr.

Keywords: IMGT, ontology, database, information system, knowledge resource, immunoinformatics, immunogenetics, antibody, immunoglobulin, T cell receptor, superfamily, MHC, HLA, Collier de Perles, three-dimensional, 3D structure, polymorphism, choreography, Web service, annotation

Citation: In Silico Biology, vol. 5, no. 1, pp. 45-60, 2005

Deriving an Ontology for Human Gene Expression Sources from the CYTOMER® Database on Human Organs and Cell Types

Article Type: Research Article

Abstract: CYTOMER® is a relational database of organs/tissues, cell types, physiological systems and developmental stages that currently focuses on the human system. From this database, we have derived an ontology for anatomical and morphological structures for the human organism which includes all embryonic stages and the cell types constituting these structures. The ontology has been transferred to the OWL format and is freely available for download at http://cytomer.bioinf.med.uni-goettingen.de.

Keywords: ontologies, human developmental stages, gene expression sources, relational database system, OWL, internet resource

Citation: In Silico Biology, vol. 5, no. 1, pp. 61-66, 2005

Utilizing Weakly Controlled Vocabulary for Sentence Segmentation in Biomedical Literature

Authors: Satou, Kenji | Yamamoto, Kaoru

Article Type: Research Article

Abstract: Since biomedical texts contain a wide variety of domain specific terms, building a large dictionary to perform term matching is of great relevance. However, due to the existence of null boundaries between adjacent terms, this matching is not a trivial problem. Moreover, it is known that generative words cannot be comprehensively included in a dictionary because their possible variations are infinite. In this study, we report our approach to dictionary building and term matching in biomedical texts. A large number of terms with/without part-of-speech (POS) and/or category information were gathered, and a completion program generated ∼1.36 million term variants to avoid stemming problems when matching terms. The dictionary was stored in a relational database management system (RDBMS) for quick lookup, and used by a matching program. Since the matching operation is not restricted to a substring surrounded by space characters, we can avoid the problem of null boundaries. This feature is also useful for generative words. Experimental results on the GENIA corpus are promising: nearly half of the possible terms were correctly recognized as a meaningful segment, and most of the remaining half could be correctly recognized by some post-processing step, like chunking and further decomposition. It should be remarked that although we have not used term cost, connectivity cost, or syntactic information, reasonable segmentation and dictionary lookup were performed in most cases.
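
To make the null-boundary point concrete, here is a minimal sketch of dictionary matching that is not tied to space-delimited tokens. It is not the authors' program: the greedy longest-match strategy, the in-memory set standing in for the RDBMS-backed dictionary, and the sample terms are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def segment(text: str, dictionary: Set[str]) -> List[Tuple[str, bool]]:
    """Split `text` into (segment, is_dictionary_term) pairs.

    At each position the longest dictionary term starting there is taken,
    whether or not it is surrounded by spaces. Characters that start no
    term are collected into runs of unmatched text.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    segments: List[Tuple[str, bool]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer terms win.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is not None:
            segments.append((match, True))
            i += len(match)
        elif segments and not segments[-1][1]:
            # Extend the current run of unmatched characters.
            segments[-1] = (segments[-1][0] + text[i], False)
            i += 1
        else:
            segments.append((text[i], False))
            i += 1
    return segments


# Hypothetical sample dictionary; a real one would hold many term variants.
terms = {"anti", "CD4", "NF-kappaB"}
print(segment("antiCD4", terms))
```

In `antiCD4` there is no space between `anti` and `CD4`, so a tokenizer that splits on whitespace would see a single unknown word, whereas the position-based lookup above recovers both terms.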

Keywords: text processing, dictionary building, dictionary lookup and matching, sentence segmentation, term boundary

Citation: In Silico Biology, vol. 5, no. 1, pp. 67-79, 2005
