
Knowledge-based biomedical Data Science

Abstract

Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled “Big Data to Knowledge (BD2K).” The main emphasis of the more than $200M allocated to that program has been on “Big Data;” the “Knowledge” component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science.

Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, whether in a Bayesian or some other framework. These are all examples of automated reasoning that acts on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion and for broader application.

1. Representations of biomedical knowledge

All computational approaches to knowledge require specification of how the computer system represents knowledge internally, and how it might compute with those representations to produce outputs (often called, perhaps metaphorically, reasoning). Classic descriptions of knowledge representation and reasoning systems, e.g. [16], focus on what ontological commitments a knowledge representation makes, what inferences are possible with it, and, sometimes, which of those inferences can be made efficiently. These issues remain useful in thinking about how knowledge representation and reasoning play a role in today’s data science environment.

As [16] pointed out, knowledge representations entail ontological commitments. Adopting existing ontologies, rather than creating idiosyncratic or single-use ones, provides significant advantages for reproducibility in scientific research, for interoperability, and in avoiding pitfalls in the modeling of knowledge. A great deal of work has been done in biomedical ontology (e.g. [2,36,39,41,45] and many others), and these increasingly mature ontological resources form an important basis for knowledge-based data science. Community-curated ontologies (such as those meeting the Open Biomedical Ontologies (OBO) Foundry criteria [42]) capture a consensus view of the entities and processes involved in biology, medicine and biomedical research, analogous to how nomenclature committees systematize naming conventions. While not meeting all of the criteria of the OBO Foundry, terminological resources such as UMLS [30], SNOMED CT [7] and the NCI thesaurus [19] have also been used to provide useful pseudo-ontological foundations for knowledge representations.

While ontologies identify the basic elements from which a knowledge representation is constructed, they are agnostic about the mechanisms by which ontological units are assembled into representations of knowledge. Building on decades of work in artificial intelligence research, the W3C produced a collection of international standards for assembling ontological entities into assertions and managing collections of assertions, together referred to as the Semantic Web. The focus of the Semantic Web standards is to make it possible to link web elements with shared meaning; this approach is sometimes described as the Linked Data paradigm. The Semantic Web builds on the standard Resource Description Framework (RDF), which provides a way to link three uniform resource identifiers (URIs) to specify a pair of entities and a relationship between them (forming an RDF “triple”). Collections of triples form a graph, and a computational mechanism for managing such collections is called a triple store. The Semantic Web standards also define RDF Schema (RDFS) and the Web Ontology Language (OWL), which facilitate richer knowledge representations; SPARQL, which provides a query language for interrogating RDF graphs or triple stores; and the Simple Knowledge Organization System (SKOS), which provides a basic ontology, including simple semantic relationships. While the Semantic Web standards are intended to be general representation tools for all knowledge (e.g. RDF for facilitating exchange of research data), the combination of Semantic Web standards and biomedical ontologies is the basis of most current biomedical knowledge representation systems.
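To make the triple and triple-store ideas concrete, the following is a minimal sketch in plain Python rather than an RDF library. The `TripleStore` class, the `http://example.org/` namespace and the gene and relation names are all hypothetical, chosen only for illustration; the `match` method is a stand-in for the kind of pattern query that SPARQL expresses, not an implementation of SPARQL.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject URI, predicate URI, object URI)

EX = "http://example.org/"  # hypothetical namespace, for illustration only


class TripleStore:
    """A toy in-memory triple store: a set of triples plus pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every stored triple that fits the pattern; None is a wildcard."""
        for s, p, o in self._triples:
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


# Three hypothetical assertions forming a small graph.
store = TripleStore()
store.add(EX + "geneA", EX + "participates_in", EX + "apoptosis")
store.add(EX + "geneB", EX + "participates_in", EX + "apoptosis")
store.add(EX + "geneA", EX + "located_in", EX + "nucleus")

# "Which entities participate in apoptosis?" as a triple pattern.
apoptosis_genes = sorted(
    s for s, _, _ in store.match(predicate=EX + "participates_in",
                                 obj=EX + "apoptosis")
)
```

A production system would instead rely on a standards-compliant RDF store and SPARQL endpoint; the sketch only shows why a collection of triples can be treated as a queryable graph.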

2. Knowledge-based inference

Representations of knowledge are sterile without use. Although human visualization of computationally represented knowledge (e.g. [32]) can be useful, the primary use of computationally represented knowledge is inference. There are many forms of inference, and thousands of publications describing computational methods of reasoning. Although too broad to survey here, a brief introduction to the types of knowledge-based inference common in biomedical applications gives some idea of its potential.

2.1. Logical inference

Computational logical inference is a mapping from a base set of assertions to create additional assertions that are entailed by the base. While deductive reasoning is the classic form of logical inference, it is, in general, computationally intractable. Various restricted forms of deductive inference, such as those based on description logics, have better computational performance, at the cost of greatly restricting the utility of the inferences. Description logics, for example, are limited to inferring subsumption relationships based on necessary and sufficient class definitions. Contemporary applications of description logic inference in biomedical knowledge representation have been successful primarily in checking for modeling errors (e.g. [8,26]), although some other applications have been attempted (e.g. [9,22,23]).
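The idea of inferring subsumption from necessary and sufficient class definitions can be illustrated with a deliberately simplified sketch. It assumes each class is defined by a plain conjunction of atomic conditions, so that one class subsumes another whenever its conditions are a subset of the other's. Real description logic reasoners handle far richer constructs; the class names and conditions below are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical definitions: each class is the conjunction of its conditions.
DEFINITIONS: Dict[str, FrozenSet[str]] = {
    "Cell": frozenset({"is_cell"}),
    "NucleatedCell": frozenset({"is_cell", "has_nucleus"}),
    "Neuron": frozenset({"is_cell", "has_nucleus", "has_axon"}),
    "Erythrocyte": frozenset({"is_cell", "carries_oxygen"}),
}


def inferred_superclasses(
    definitions: Dict[str, FrozenSet[str]]
) -> Dict[str, Set[str]]:
    """For each class, return the other classes whose conditions it satisfies.

    Under the conjunction-only assumption, A subsumes B exactly when
    every condition of A is also a condition of B.
    """
    result: Dict[str, Set[str]] = {}
    for name, conditions in definitions.items():
        result[name] = {
            other
            for other, other_conditions in definitions.items()
            if other != name and other_conditions <= conditions
        }
    return result


superclasses = inferred_superclasses(DEFINITIONS)
# Neuron is inferred to fall under both Cell and NucleatedCell,
# although no subclass assertion was stated explicitly.
```

Two classes with identical condition sets would each appear as a superclass of the other in this sketch, which is one simple signal of the kind of modeling error that such checks can surface.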

Deductive retrieval is a special case of deductive inference, where the inference is to compute whether a set of logical axioms and base assertions can be combined to satisfy a query; the programming language Prolog and the W3C standard for the Semantic Web Rule Language (SWRL) are examples of approaches to deductive retrieval. Triple stores extended with deductive retrieval are much more valuable than those that can return only assertions that match a query exactly. Several knowledge-bases of biomedicine based on these technologies have been developed (e.g. [3,6,31,48]), and their uses extend beyond deductive retrieval alone.
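The difference between exact-match retrieval and deductive retrieval can be sketched with a single rule. The example below assumes a hypothetical `part_of` relation that is declared transitive, and it uses naive forward chaining to a fixpoint. That is a swapped-in technique chosen for brevity: Prolog, for instance, answers queries by backward chaining, and neither Prolog nor SWRL syntax is shown here.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


def close_transitively(facts: Iterable[Fact], predicate: str) -> Set[Fact]:
    """Apply the rule (x p y) and (y p z) => (x p z) until nothing new appears."""
    closed: Set[Fact] = set(facts)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in closed if p == predicate]
        for s1, o1 in links:
            for s2, o2 in links:
                derived = (s1, predicate, o2)
                if o1 == s2 and derived not in closed:
                    closed.add(derived)
                    changed = True
    return closed


# Hypothetical base assertions.
base_facts: Set[Fact] = {
    ("crista", "part_of", "mitochondrion"),
    ("mitochondrion", "part_of", "cytoplasm"),
    ("cytoplasm", "part_of", "cell"),
}

query: Fact = ("crista", "part_of", "cell")

exact_match = query in base_facts                           # stated directly?
deduced = query in close_transitively(base_facts, "part_of")  # entailed?
```

The query fails against the stated assertions but succeeds once the rule has been applied, which is the added value the paragraph above describes.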

2.2. Inference from ontology annotation

In addition to the creation of biomedical ontologies, a great deal of effort has gone into annotating genes and other biological entities with ontological categories. Gene Ontology annotations of genes and gene products figure prominently in major databases such as UniProt and Mouse Genome Informatics. These annotations provide a quick summary of knowledge about gene function, subcellular localization and biological processes. By far the most common application of computational representations of knowledge to problems in biomedicine is enrichment analysis (see e.g. [24,43,46]). Enrichment analysis generates hypotheses about the concerted functions of collections of genes by testing for annotations that occur more frequently in the collection than would be expected by chance. Ontology annotation directly supports other sorts of knowledge-based inference as well. For example, phenotype annotations play a major role in mapping between human disease and animal models (e.g. [28,34,35]). Formal representations of metabolic pathways (e.g. [18,27]) have been used to analyze metabolomic data and support metabolic engineering.
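One common way to test whether an annotation occurs more often than expected by chance is a one-sided hypergeometric test. The sketch below assumes that formulation; the function name and the example counts are hypothetical, and real enrichment tools differ in their choice of statistic, background set and multiple-testing correction, none of which is shown here.

```python
from math import comb


def enrichment_p_value(
    population_size: int,          # genes in the background set
    annotated_in_population: int,  # background genes carrying the annotation
    sample_size: int,              # genes in the collection of interest
    annotated_in_sample: int,      # collection genes carrying the annotation
) -> float:
    """P(X >= annotated_in_sample) when sampling without replacement."""
    largest_possible = min(annotated_in_population, sample_size)
    unannotated = population_size - annotated_in_population
    favourable = sum(
        comb(annotated_in_population, k) * comb(unannotated, sample_size - k)
        for k in range(annotated_in_sample, largest_possible + 1)
    )
    return favourable / comb(population_size, sample_size)


# Hypothetical counts: 10 background genes, 3 of them annotated to a term;
# the collection of interest has 3 genes and all 3 carry the annotation.
p_small_example = enrichment_p_value(10, 3, 3, 3)

# A larger hypothetical case: 40 of 200 collection genes carry an annotation
# that only 300 of 20000 background genes have.
p_large_example = enrichment_p_value(20000, 300, 200, 40)
```

In the small example the only way to observe three annotated genes is to draw exactly the three annotated ones, so the p-value is 1 divided by the number of possible three-gene collections.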

2.3. Inference from the biomedical literature

Despite the rapid growth of databases with ontological annotation, the main and by far the largest repository of biomedical knowledge remains the published literature. An important domain of knowledge-based data science involves natural language processing with the goal of producing computational representations of the knowledge in the literature. The most basic of these approaches involves tagging passages in the literature with ontological terms (e.g. EuroPMC’s SciLite annotations, or [20]). Computational methods to identify semantically well-defined entities in the literature support further analysis that identifies links both among different documents in the literature (e.g. [52]) and between entities in the literature and database entries about them (e.g. [37]). More ambitious literature mining goals involve producing more complex knowledge representations directly by processing natural language documents, e.g. [15,49], although significant improvements in performance are likely to be necessary before the results of such processing find widespread use in biomedical research. Text mining approaches applied to clinical records and social media, e.g. for pharmacovigilance applications, have also made significant strides recently [17]. The best performing text mining systems themselves often use representations of prior knowledge to drive understanding of text.
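The most basic form of tagging described above can be approximated by dictionary lookup. The sketch below is an assumption-laden illustration: the lexicon, the `EX:` term identifiers and the sentence are hypothetical, and real concept recognizers deal with spelling variants, abbreviations and ambiguity that simple string matching ignores.

```python
import re
from typing import Dict, List, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, term identifier)

# Hypothetical lexicon mapping labels to made-up ontology term identifiers.
LEXICON: Dict[str, str] = {
    "apoptosis": "EX:0001",
    "mitochondrion": "EX:0002",
}


def tag_terms(text: str, lexicon: Dict[str, str]) -> List[Annotation]:
    """Return character spans where a lexicon label occurs as a whole word."""
    annotations: List[Annotation] = []
    for label, term_id in lexicon.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), term_id))
    return sorted(annotations)


sentence = "Apoptosis is regulated by p53 in the mitochondrion."
annotations = tag_terms(sentence, LEXICON)
tagged_strings = [sentence[start:end] for start, end, _ in annotations]
```

Keeping character offsets alongside the term identifier is what allows tagged passages to be linked back to the source text and onward to database entries.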

Natural language processing systems have also been used to support automated question answering. Perhaps the best known of these efforts is IBM’s Watson system [12], which has found significant biomedical application. Many other computational systems for question answering, targeted to biomedical researchers and clinicians, have been fielded, e.g. as reviewed in [1,4]. Computational approaches to building systems that can answer biomedical exam questions have also been developed, e.g. [14].

2.4. Hypothesis generation, evaluation and modification

Perhaps the oldest method of computing with knowledge is Bayesian inference [21]. By providing a quantitative framework for the idea that observations consistent with prior knowledge are more likely than ones that contradict it, Bayesian reasoning has provided a basis for knowledge-based computation long before computation was automated. Contemporary computers provide the power necessary to support more elaborate Bayesian inference, including model selection as well as estimating model parameters [13].
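The quantitative idea behind Bayesian reasoning, that prior knowledge is combined with how well each hypothesis predicts an observation, can be sketched for a discrete set of hypotheses. The hypothesis names, priors and likelihoods below are hypothetical numbers chosen only to show the arithmetic of Bayes' rule; they do not come from any dataset.

```python
from typing import Dict


def posterior(
    priors: Dict[str, float], likelihoods: Dict[str, float]
) -> Dict[str, float]:
    """Bayes' rule over discrete hypotheses: P(H | D) is proportional to P(D | H) * P(H)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(D), the normalizing constant
    if evidence == 0:
        raise ValueError("The observation is impossible under every hypothesis.")
    return {h: value / evidence for h, value in unnormalized.items()}


# Prior knowledge favours the established mechanism (hypothetical values).
priors = {"established_mechanism": 0.9, "novel_mechanism": 0.1}

# The new observation is much better predicted by the novel mechanism.
likelihoods = {"established_mechanism": 0.05, "novel_mechanism": 0.6}

updated = posterior(priors, likelihoods)
```

With these numbers the unnormalized weights are 0.045 and 0.06, so the novel mechanism ends up with a posterior of 4/7 even though its prior was low. The same calculation underlies ranking of competing hypotheses.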

Network-based inference, such as link prediction or community finding, has been successfully applied to generate significant biomedical hypotheses. Systems that compute over representations of knowledge of biomedicine have been used to propose as yet unobserved relationships among biological entities, e.g. for drugs [33], microRNAs [51], diseases [44] and proteins [47]; some of these predictions have been empirically validated, e.g. [25].
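Link prediction can be sketched with one of the simplest similarity indices, the Jaccard coefficient over shared neighbors: two unconnected nodes are scored by the overlap of their neighborhoods. The network below is hypothetical, and this index is only one of many used in practice; it is not presented as the method of any of the cited systems.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def rank_candidate_links(
    edges: Iterable[Tuple[str, str]]
) -> List[Tuple[float, str, str]]:
    """Score each unconnected node pair by Jaccard overlap of its neighbors."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scored: List[Tuple[float, str, str]] = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to predict
        shared = neighbors[a] & neighbors[b]
        combined = neighbors[a] | neighbors[b]  # never empty: each node has an edge
        score = len(shared) / len(combined)
        if score > 0:
            scored.append((score, a, b))

    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored


# Hypothetical interaction network among five entities.
known_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P2", "P4"), ("P3", "P4"), ("P4", "P5"),
]
ranked = rank_candidate_links(known_edges)
best_score, best_a, best_b = ranked[0]
```

Here P1 and P4 share both of P1's neighbors, so the pair receives the highest score and would be the first candidate proposed for experimental follow-up.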

Perhaps the most exciting potential for knowledge-based computational systems is in the development and refinement of mechanistic explanations of biomedical phenomena. The vast scope and rapid evolution of the biomedical literature, combined with the breakdown of disciplinary boundaries driven by genome-scale research, has made it increasingly difficult for researchers to effectively assimilate all the knowledge potentially relevant to interpreting the results of their own experiments. Although most computational approaches aim to provide material for the Results section of a paper, a few are beginning to target the Discussion section as well. While no knowledge-based computer system has repeatedly generated important biomedical hypotheses de novo, promising proof-of-concept systems include systems to generate hypotheses from the literature [40] and those aimed at hypothesis generation or refinement from data [11,38], as well as mixed-initiative human-computer hypothesis generation [29]. Although it remains aspirational, the synthesis of computational simulation with knowledge-based generation and refinement of hypotheses has received substantial interest from funding agencies [50].

3. Open challenges in knowledge-based Data Science

As is clear from the NIH BD2K experience, computation over knowledge is a less widespread research focus than analysis of big data, and to date has had less impact in biomedicine. Certain applications, such as enrichment analysis and link prediction, have found widespread use in biomedical research. Text mining systems are increasingly deployed in areas such as helping clinicians keep up with rapidly changing clinical data [10] and pharmacovigilance. However, there are significant challenges to realizing the potential for knowledge-based data science. Perhaps the foremost among these is the knowledge acquisition bottleneck: human curation, even for the relatively simple task of annotating genes with Gene Ontology terms, is difficult to scale [5]. Alternatives to manual curation, including applications of text mining and machine learning, have shown promise, but are still far short of human-like performance. Another important understudied question is how to represent what is not known: any scientist can describe gaps, ambiguities and uncertainties in existing knowledge, yet there are few computational methods capable of representing, let alone reasoning about, such ignorance.

Even more challenging than developing representations of what is already known is the application of that knowledge to the pressing problems of biomedical research. Existing inference methods are far short of the range and creativity of human experts in developing potential explanations, generating significant hypotheses, and generally interpreting results in light of previous knowledge. Many promising inference methods scale poorly, and are constrained in their ability to harness large knowledge-bases by the extremely large computational loads involved. Even deductive retrieval systems can be computationally intractable over large knowledge-bases; more complex forms of inference hit the limits of current hardware with even smaller knowledge-bases. The Semantic Web standards were developed largely with description logic inference in mind; while they provide a solid foundation for knowledge representation systems, representational transformations may improve the efficiency of other sorts of inference.

Perhaps the biggest challenges in knowledge-based data science are in developing the vision for what such a system could effectively contribute to biomedical research. Is it possible to build computational systems that bring to bear disparate yet relevant facts from across all biomedical disciplines and scales, exploiting their ability to process far more information than any individual human being? Could such a system make sound judgements ranking alternative hypotheses based on an exhaustive comprehension of the literature? Is it possible for computational systems to generate significant and novel mechanistic and pathomechanistic hypotheses about open questions in biomedicine? It is positive answers to questions like these that will drive knowledge-based data science into the mainstream of biomedical research.

References

[1] S. Athenikos and H. Han, Biomedical question answering: A survey, Comput Methods Programs Biomed 99 (2010), 1–24. doi:10.1016/j.cmpb.2009.10.003.

[2] A. Bandrowski, R. Brinkman, M. Brochhausen, M. Brush, B. Bug, M. Chibucos, K. Clancy, M. Courtot, D. Derom, M. Dumontier, L. Fan, J. Fostel, G. Fragoso, F. Gibson, A. Gonzalez-Beltran, M. Haendel, Y. He, M. Heiskanen, T. Hernandez-Boussard, M. Jensen, Y. Lin, A. Lister, P. Lord, J. Malone, E. Manduchi, M. McGee, N. Morrison, J. Overton, H. Parkinson, B. Peters, P. Rocca-Serra, A. Ruttenberg, S. Sansone, R. Scheuermann, D. Schober, B. Smith, L. Soldatova, C.J. Stoeckert, C. Taylor, C. Torniai, J. Turner, R. Vita, P. Whetzel and J. Zheng, The ontology for biomedical investigations, PLoS One 11 (2016), 0154556. doi:10.1371/journal.pone.0154556.

[3] M. Barros and F. Couto, Knowledge representation and management: A linked data perspective, Yearb Med Inform 10 (2016), 178–183. doi:10.15265/IY-2016-022.

[4] M. Bauer and D. Berleant, Usability survey of biomedical question answering systems, Hum Genomics 6 (2012), 17. doi:10.1186/1479-7364-6-17.

[5] W.J. Baumgartner, K. Cohen, L. Fox, G. Acquaah-Mensah and L. Hunter, Manual curation is not sufficient for annotation of genomic databases, Bioinformatics 23 (2007), 41–48. doi:10.1093/bioinformatics/btm229.

[6] F. Belleau, M. Nolin, N. Tourigny, P. Rigault and J. Morissette, Bio2RDF: Towards a mashup to build bioinformatics knowledge systems, J Biomed Inform 41 (2008), 706–716. doi:10.1016/j.jbi.2008.03.004.

[7] S.B. Bhattacharyya, Overview of SNOMED CT, in: Introduction to SNOMED CT, Springer Nature, 2015, pp. 1–2. doi:10.1007/978-981-287-895-3_1.

[8] O. Bodenreider, B. Smith, A. Kumar and A. Burgun, Investigating subsumption in SNOMED CT: An exploration into large description logic-based biomedical terminologies, Artificial Intelligence in Medicine 39(3) (2007), 183–195. doi:10.1016/j.artmed.2006.12.003.

[9] M. Boeker, F. França, P. Bronsert and S. Schulz, TNM-O: Ontology support for staging of malignant tumours, J Biomed Semantics 7 (2016), 64. doi:10.1186/s13326-016-0106-9.

[10] Bringing Precision Medicine to Community Oncologists, Cancer Discov 7 (2017), 6–7. doi:10.1158/2159-8290.CD-NB2016-147.

[11] A. Callahan, M. Dumontier and N. Shah, HyQue: Evaluating hypotheses using semantic web technologies, J Biomed Semantics 2(Suppl 2) (2011), 3. doi:10.1186/2041-1480-2-S2-S3.

[12] Y. Chen, A.J. Elenee and G. Weber, IBM Watson: How cognitive computing can be applied to big data challenges in life sciences research, Clin Ther 38 (2016), 688–701. doi:10.1016/j.clinthera.2015.12.001.

[13] H. Chipman, E.I. George, R.E. McCulloch, M. Clyde, D.P. Foster and R.A. Stine, The practical implementation of Bayesian model selection, Lecture Notes – Monograph Series 38 (2001), 65–134. doi:10.1214/lnms/1215540964.

[14] P. Clark, O. Etzioni, T. Khot, A. Sabharwal, O. Tafjord, P.D. Turney and D. Khashabi, Combining retrieval, statistics, and inference to answer elementary science questions, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12–17, 2016, Phoenix, Arizona, USA, 2016, pp. 2580–2586, http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11963.

[15] K. Cohen, K. Verspoor, H. Johnson, C. Roeder, P. Ogren, W.J. Baumgartner, E. White, H. Tipney and L. Hunter, High-precision biological event extraction: Effects of system and of data, Comput Intell 27 (2011), 681–701. doi:10.1111/j.1467-8640.2011.00405.x.

[16] R. Davis, H. Shrobe and P. Szolovits, What is a knowledge representation?, AI Magazine 14(1) (1993), 17–33, http://www.aaai.org/ojs/index.php/aimagazine/article/view/1029/947.

[17] D. Demner-Fushman and N. Elhadad, Aspiring to unintended consequences of natural language processing: A review of recent developments in clinical and consumer-generated text processing, Yearb Med Inform 10 (2016), 224–233. doi:10.15265/IY-2016-017.

[18] A. Fabregat, K. Sidiropoulos, P. Garapati, M. Gillespie, K. Hausmann, R. Haw, B. Jassal, S. Jupe, F. Korninger, S. McKay, L. Matthews, B. May, M. Milacic, K. Rothfels, V. Shamovsky, M. Webber, J. Weiser, M. Williams, G. Wu, L. Stein, H. Hermjakob and P. D’Eustachio, The Reactome pathway Knowledgebase, Nucleic Acids Res 44 (2016), 481–487. doi:10.1093/nar/gkv1351.

[19] G. Fragoso, S. de Coronado, M. Haber, F. Hartel and L. Wright, Overview and utilization of the NCI thesaurus, Comparative and Functional Genomics 5(8) (2004), 648–654. doi:10.1002/cfg.445.

[20] C. Funk, W.J. Baumgartner, B. Garcia, C. Roeder, M. Bada, K. Cohen, L. Hunter and K. Verspoor, Large-scale biomedical concept recognition: An evaluation of current automatic annotators and their parameters, BMC Bioinformatics 15 (2014), 59. doi:10.1186/1471-2105-15-59.

[21] D. Heckerman, A tutorial on learning with Bayesian networks, in: Learning in Graphical Models, Springer, 1998, pp. 301–354. doi:10.1007/978-94-011-5014-9_11.

[22] H. Hochheiser, M. Castine, D. Harris, G. Savova and R. Jacobson, An information model for computable cancer phenotypes, BMC Med Inform Decis Mak 16 (2016), 121. doi:10.1186/s12911-016-0358-4.

[23] M. Holford and M. Krauthammer, Mutadelic: Mutation analysis using description logic inferencing capabilities, Bioinformatics 31 (2015), 3742–3747. doi:10.1093/bioinformatics/btv467.

[24] D.W. Huang, B. Sherman and R. Lempicki, Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res 37 (2009), 1–13. doi:10.1093/nar/gkn923.

[25] N. Jahchan, J. Dudley, P. Mazur, N. Flores, D. Yang, A. Palmerton, A. Zmoos, D. Vaka, K. Tran, M. Zhou, K. Krasinska, J. Riess, J. Neal, P. Khatri, K. Park, A. Butte and J. Sage, A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors, Cancer Discov 3 (2013), 1364–1377. doi:10.1158/2159-8290.CD-13-0183.

[26] K. Jansen, T. Kim, A. Coenen, V. Saba and N. Hardiker, Harmonising nursing terminologies using a conceptual framework, Stud Health Technol Inform 225 (2016), 471–475. https://www.ncbi.nlm.nih.gov/pubmed/27332245.

[27] I. Keseler, A. Mackie, A. Santos-Zavaleta, R. Billington, C. Bonavides-Martínez, R. Caspi, C. Fulcher, S. Gama-Castro, A. Kothari, M. Krummenacker, M. Latendresse, L. Muñiz-Rascado, Q. Ong, S. Paley, M. Peralta-Gil, P. Subhraveti, D. Velázquez-Ramírez, D. Weaver, J. Collado-Vides, I. Paulsen and P. Karp, The EcoCyc database: Reflecting new knowledge about Escherichia coli K-12, Nucleic Acids Res 45 (2017), 543–550. doi:10.1093/nar/gkw1003.

[28] W. Kibbe, C. Arze, V. Felix, E. Mitraka, E. Bolton, G. Fu, C. Mungall, J. Binder, J. Malone, D. Vasant, H. Parkinson and L. Schriml, Disease ontology 2015 update: An expanded and updated database of human diseases for linking biomedical knowledge through disease data, Nucleic Acids Res 43 (2015), 1071–1078. doi:10.1093/nar/gku1011.

[29] S. Leach, H. Tipney, W. Feng, W. Baumgartner, P. Kasliwal, R. Schuyler, T. Williams, R. Spritz and L. Hunter, Biomedical discovery acceleration, with applications to craniofacial development, PLoS Comput Biol 5 (2009), 1000215. doi:10.1371/journal.pcbi.1000215.

[30] C. Lindberg, The unified medical language system (UMLS) of the national library of medicine, J Am Med Rec Assoc 61 (1990), 40–42. https://www.ncbi.nlm.nih.gov/pubmed/10104531.

[31] K. Livingston, M. Bada, W.J. Baumgartner and L. Hunter, KaBOB: Ontology-based semantic integration of biomedical databases, BMC Bioinformatics 16 (2015), 126. doi:10.1186/s12859-015-0559-3.

[32] S. Lohmann, S. Negru, F. Haag and T. Ertl, VOWL 2: User-oriented visualization of ontologies, in: Lecture Notes in Computer Science, Springer Nature, 2014, pp. 266–281. doi:10.1007/978-3-319-13704-9_21.

[33] Y. Lu, Y. Guo and A. Korhonen, Link prediction in drug-target interactions network using similarity indices, BMC Bioinformatics 18 (2017), 39. doi:10.1186/s12859-017-1460-z.

[34] C. Mungall, J. McMurry, S. Köhler, J. Balhoff, C. Borromeo, M. Brush, S. Carbon, T. Conlin, N. Dunn, M. Engelstad, E. Foster, J. Gourdine, J. Jacobsen, D. Keith, B. Laraway, S. Lewis, J. NguyenXuan, K. Shefchek, N. Vasilevsky, Z. Yuan, N. Washington, H. Hochheiser, T. Groza, D. Smedley, P. Robinson and M. Haendel, The Monarch Initiative: An integrative data and analytic platform connecting phenotypes to genotypes across species, Nucleic Acids Res 45 (2017), 712–722. doi:10.1093/nar/gkw1128.

[35] C. Mungall, N. Washington, J. Nguyen-Xuan, C. Condit, D. Smedley, S. Köhler, T. Groza, K. Shefchek, H. Hochheiser, P. Robinson, S. Lewis and M. Haendel, Use of model organism and disease databases to support matchmaking for human disease gene discovery, Hum Mutat 36 (2015), 979–984. doi:10.1002/humu.22857.

[36] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, C. Mungall, M. Courtot, A. Ruttenberg and Y. He, Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration, Nucleic Acids Res 45 (2017), 347–352. doi:10.1093/nar/gkw918.

[37] E. Pafilis, S.I. O’Donoghue, L.J. Jensen, H. Horn, M. Kuhn, N.P. Brown and R. Schneider, Reflect: Augmented browsing for the life scientist, Nature Biotechnology 27(6) (2009), 508–510. doi:10.1038/nbt0609-508.

[38] S. Racunas, N. Shah, I. Albert and N. Fedoroff, HyBrow: A prototype system for computer-aided hypothesis evaluation, Bioinformatics 20(Suppl 1) (2004), 257–264. doi:10.1093/bioinformatics/bth905.

[39] M. Sharp, Toward a comprehensive drug ontology: Extraction of drug-indication relations from diverse information sources, J Biomed Semantics 8 (2017), 2. doi:10.1186/s13326-016-0110-0.

[40] N. Smalheiser, V. Torvik and W. Zhou, Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE, Comput Methods Programs Biomed 94 (2009), 190–197. doi:10.1016/j.cmpb.2008.12.006.

[41] D. Smedley, Faculty of 1000 evaluation for The human phenotype ontology: Semantic unification of common and rare disease, Faculty of 1000 Ltd., 2017. doi:10.3410/f.725602763.793528156.

[42] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. Goldberg, K. Eilbeck, A. Ireland, C. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S. Sansone, R. Scheuermann, N. Shah, P. Whetzel and S. Lewis, The OBO foundry: Coordinated evolution of ontologies to support biomedical data integration, Nat Biotechnol 25 (2007), 1251–1255. doi:10.1038/nbt1346.

[43] T. Soldatos, N. Perdigão, N. Brown, K. Sabir and S. O’Donoghue, How to learn about gene function: Text-mining or ontologies?, Methods 74 (2015), 3–15. doi:10.1016/j.ymeth.2014.07.004.

[44] S. Suthram, J.T. Dudley, A.P. Chiang, R. Chen, T.J. Hastie and A.J. Butte, Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets, PLoS Comput Biol 6(2) (2010), 1000662. doi:10.1371/journal.pcbi.1000662.

[45] The Gene Ontology Consortium, Expansion of the gene ontology knowledgebase and resources, Nucleic Acids Res 45 (2017), 331–338. doi:10.1093/nar/gkw1108.

[46] H. Tipney and L. Hunter, An introduction to effective use of enrichment analysis software, Hum Genomics 4 (2010), 202–206. doi:10.1186/1479-7364-4-3-202.

[47] S. Tripathi, S. Moutari, M. Dehmer and F. Emmert-Streib, Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules, BMC Bioinformatics 17 (2016), 129. doi:10.1186/s12859-016-0979-8.

[48] E. Willighagen, A. Waagmeester, O. Spjuth, P. Ansell, A. Williams, V. Tkachenko, J. Hastings, B. Chen and D. Wild, The ChEMBL database as linked open data, J Cheminform 5 (2013), 23. doi:10.1186/1758-2946-5-23.

[49] J. Xia, A. Fang and X. Zhang, A novel feature selection strategy for enhanced biomedical event extraction using the Turku system, Biomed Res Int 2014 (2014), 205239. doi:10.1155/2014/205239.

[50] J. You, Artificial intelligence. DARPA sets out to automate research, Science 347 (2015), 465. doi:10.1126/science.347.6221.465.

[51] X. Zeng, X. Zhang, Y. Liao and L. Pan, Prediction and validation of association between microRNAs and diseases by multipath methods, Biochim Biophys Acta 1860 (2016), 2735–2739. doi:10.1016/j.bbagen.2016.03.016.

[52] J. Zheng, D. Howsmon, B. Zhang, J. Hahn, D. McGuinness, J. Hendler and H. Ji, Entity linking for biomedical literature, BMC Med Inform Decis Mak 15(Suppl 1) (2015), 4. doi:10.1186/1472-6947-15-S1-S4.