Affiliations: Laboratory for Bioinformatics, Wageningen University
and Research Centre, Wageningen, The Netherlands | Netherlands Bioinformatics Centre (NBIC), Nijmegen,
The Netherlands
Note: Corresponding author. Laboratory for Bioinformatics, Wageningen
University and Research Centre, P.O. Box 569, 6700 AN Wageningen, The
Netherlands.
Abstract: Background: In the field of bioinformatics, interchangeable data
formats based on XML are widely used. XML-type data is also at the core of most
web services. As the amount of data stored in XML grows, so does the need for
efficient ways to store and access that data. In this paper we analyse the suitability of
different database systems for storing and querying large XML datasets in general
and Medline in particular. Results: All reviewed database systems perform
well when tested with small to medium-sized datasets; however, when the full
Medline dataset is queried, query times vary widely between systems. Conclusions: No single system is vastly superior to the others in
this comparison; depending on the database size and the query requirements,
different systems are most suitable. The best all-round solution is the Oracle
11g database system using the new binary XML storage option. Alias-i's LingPipe is
a more lightweight, customizable and sufficiently fast solution, although it
requires more initial configuration. For data whose XML structure changes,
the native XML database systems Sedna and BaseX, or MySQL with an
XML-type column, are suitable.
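To make the notion of querying Medline XML more concrete, the following is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. The element names (MedlineCitationSet, MedlineCitation, PMID, Article, ArticleTitle, AuthorList, Author, LastName) are modelled on the Medline citation format, but the two records, their titles and the helper titles_by_author are hypothetical. They are not the queries or systems benchmarked in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data loosely modelled on Medline citation XML.
# Real Medline files contain a very large number of <MedlineCitation> elements.
SAMPLE = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article>
      <ArticleTitle>Example article one</ArticleTitle>
      <AuthorList>
        <Author><LastName>Smith</LastName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article>
      <ArticleTitle>Example article two</ArticleTitle>
      <AuthorList>
        <Author><LastName>Jones</LastName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""


def titles_by_author(xml_text: str, last_name: str) -> list[str]:
    """Return the ArticleTitle of every citation listing an author with this last name.

    Hypothetical helper for illustration; assumes last_name contains no single quote,
    since it is interpolated directly into the path expression.
    """
    root = ET.fromstring(xml_text)  # parses the whole document into memory
    results = []
    for citation in root.iter("MedlineCitation"):
        # ElementTree supports a limited XPath subset, including [tag='text'] predicates.
        match = citation.find(f"Article/AuthorList/Author[LastName='{last_name}']")
        if match is not None:
            results.append(citation.findtext("Article/ArticleTitle"))
    return results


if __name__ == "__main__":
    print(titles_by_author(SAMPLE, "Smith"))
```

Parsing the whole document into memory, as this sketch does, is only practical for small files. For a dataset the size of the full Medline set, a streaming parser (for example ET.iterparse) or one of the database systems compared in this paper would likely be needed.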