Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Issue title: Machine Learning in Bioinformatics
Article type: Research Article
Authors: Hämäläinen, Wilhelmiina
Affiliations: Department of Biosciences, University of Eastern Finland, Kuopio, Finland. [email protected]
Note: [] Address for correspondence: Department of Biosciences, University of Eastern Finland, Finland Emil Aaltonen Fund (Emil Alltosen säätiö) has supported this research with a personal grant
Abstract: Dependency analysis is one of the central problems in bioinformatics and all empirical science. In genetics, for example, an important problem is to find which gene alleles are mutually dependent or which alleles and diseases are dependent. In ecology, a similar problem is to find dependencies between different species or groups of species. In both cases a classical solution is to consider all pairwise dependencies between single attributes and evaluate the relationships with some statistical measure like the χ2-measure. It is known that the actual dependency structures can involve more attributes, but the existing computational methods are too inefficient for such an exhaustive search. In this paper, we introduce efficient search methods for positive dependencies of the form X → A with typical statistical measures. The efficiency is achieved by a special kind of a branch-and-bound search which also prunes out redundant rules. Redundant attributes are especially harmful in dependency analysis, because they can blur the actual dependencies and even lead to erroneous conclusions. We consider two alternative definitions of redundancy: the classical one and a stricter one. We improve our previous algorithm for searching for the best strictly non-redundant dependency rules and introduce a totally new algorithm for searching for the best classically non-redundant rules. According to our experiments, both algorithms can prune the search space very efficiently, and in practice no minimum frequency thresholds are needed. This is an important benefit, because biological data sets are typically dense, and the alternative search methods would require too large minimum frequency thresholds for any practical purpose.
Keywords: statistical dependence, redundancy, χ$^{2}$-measure, z-score, search algorithms
DOI: 10.3233/FI-2011-603
Journal: Fundamenta Informaticae, vol. 113, no. 2, pp. 117-150, 2011
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
如果您在出版方面需要帮助或有任何建, 件至: [email protected]