Affiliations: Stanford Medical Informatics, Stanford University, Stanford, CA 94305, USA
Note: Corresponding author: Nigam Shah, Stanford Medical Informatics, Stanford University, 251 Campus Drive, X-219, Stanford, CA 94305, USA. Tel.: +650 725 6236; Fax: +650 725 7944; E-mail: [email protected]
Abstract: The role of proteins and their function in pathways is crucial to understanding complex biological processes and the failures that lead to disease. With over 200 pathway databases in existence, it is not possible for biologists to examine a pathway in all of them. The emergence and adoption of Biological Pathways Exchange (BioPAX), a standardized format for exchanging pathway information, provides a unique opportunity to integrate knowledge from multiple pathway databases. We conducted a case study integrating multiple pathway databases using BioPAX and Oracle's Resource Description Framework (RDF) data repository. This integration enables querying across different species and across multiple pathway resources simultaneously. It also enables comparison of the degree of complementarity across different pathway sources. We find that BioPAX and RDF are powerful mechanisms for data exchange and integration and are instrumental in enabling an integrated resource. The integrated datasets and code for our implementation in this case study are available as a resource we named the Pathway Knowledge Base (PKB, http://pkb.stanford.edu).