Fundamenta Informaticae - Volume 89, issue 1

Progress on Multi-Relational Data Mining

Article Type: Other

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. i-iii, 2008

An Experimental Comparison of Different Inclusion Relations in Frequent Tree Mining

Authors: De Knijf, Jeroen | Feelders, Ad

Article Type: Research Article

Abstract: In recent years, a variety of mining algorithms to derive all frequent subtrees from a database of labeled ordered rooted trees has been developed. These algorithms share properties such as enumeration strategies and pruning techniques. They differ, however, in the tree inclusion relation used and the way attribute values are dealt with. In this work we investigate the different approaches with respect to 'usefulness' of the derived patterns, in particular, the performance of classifiers that use the derived patterns as features. In order to find a good trade-off between expressiveness and runtime performance of the different approaches, we also take the complexity of the different classifiers into account, as well as the run time and memory usage of the different approaches. The experiments are performed on two real data sets and two synthetic data sets. The results show that significant improvement in both predictive performance and computational efficiency can be gained by choosing the right tree mining approach.

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 1-22, 2008

Price: EUR 27.50

Multi-Dimensional Relational Sequence Mining

Authors: Esposito, Floriana | Di Mauro, Nicola | Basile, Teresa M.A. | Ferilli, Stefano

Article Type: Research Article

Abstract: The issue addressed in this paper concerns the discovery of frequent multi-dimensional patterns from relational sequences. The great variety of applications of sequential pattern mining, such as user profiling, medicine, local weather forecast and bioinformatics, makes this problem one of the central topics in data mining. Nevertheless, sequential information may concern data on multiple dimensions and, hence, the mining of sequential patterns from multi-dimensional information is very important. In a multi-dimensional sequence each event depends on more than one dimension, such as in spatio-temporal sequences where an event may be spatially or temporally related to other events. In the literature, the multi-relational data mining approach has been successfully applied to knowledge discovery from complex data. However, there exists no contribution that handles the general case of multi-dimensional data in which, for example, spatial and temporal information may co-exist. This work considers the possibility of mining complex patterns, expressed in a first-order language, in which events may occur along different dimensions. Specifically, multi-dimensional patterns are defined as a set of atomic first-order formulae in which events are explicitly represented by a variable and the relations between events are represented by a set of dimensional predicates. A complete framework and an Inductive Logic Programming algorithm to tackle this problem are presented, along with some experiments on artificial and real multi-dimensional sequences proving its effectiveness.

Keywords: Multi-relational sequence mining, Inductive Logic Programming, Sequence analysis

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 23-43, 2008

Price: EUR 27.50

Compile the Hypothesis Space: Do it Once, Use it Often

Authors: Fonseca, Nuno A. | Camacho, Rui | Rocha, Ricardo | Costa, Vítor Santos

Article Type: Research Article

Abstract: Inductive Logic Programming (ILP) is a powerful and well-developed abstraction for multi-relational data mining techniques. Despite the considerable success of ILP, deployed ILP systems still have efficiency problems when applied to complex problems. In this paper we propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique is based on the Mode Directed Inverse Entailment approach to ILP, where a bottom clause is generated for each example and the generated clauses are subsets of the literals of that bottom clause. We propose to store in a prefix-tree all clauses that can be generated from all bottom clauses together with some extra information. We show that this information is sufficient to estimate the number of examples that can be deduced from a clause and present an ILP algorithm that exploits this representation. We also present an extension of the algorithm where each prefix-tree is computed only once (compiled) per example. The evaluation of hypotheses requires only basic and efficient operations on trees. This proposal avoids re-computation of a hypothesis' value in theory-level search, in cross-validation evaluation procedures and in parameter tuning. Both proposals are empirically evaluated on real applications and considerable speedups were observed.

Keywords: Mode Directed Inverse Entailment, Efficiency, Data Structures, Compilation

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 45-67, 2008

Price: EUR 27.50

Learning from Skewed Class Multi-relational Databases

Authors: Guo, Hongyu | Viktor, Herna L.

Article Type: Research Article

Abstract: Relational databases, with vast amounts of data – from financial transactions, marketing surveys, medical records, to health informatics observations – and complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for relevant features both from a target relation (in which each tuple is associated with a class label) and relations related to the target, in order to better classify target relation tuples. However, in many practical database applications, such as credit card fraud detection and disease diagnosis, the target tuples are highly imbalanced. That is, the number of examples of one class (majority class) in the target relation is much higher than the others (minority classes). Many existing methods thus tend to produce poor predictive performance over the underrepresented class in the data. This paper presents a strategy to deal with such imbalanced multirelational data. The method learns from multiple views (feature sets) of relational data in order to construct view learners with different awareness of the imbalance problem. These different observations possessed by multiple view learners are then combined, in order to yield a model which has better knowledge on both the majority and minority classes in a relational database. Experiments performed on six benchmarking data sets show that the proposed method achieves promising results when compared with other popular relational data mining algorithms, in terms of the ROC curve and AUC value obtained. In particular, an important result indicates that the method is superior when the class imbalance is very high.

Keywords: Multirelational Data Mining, Classification, Multi-view Learning, Relational Database, Imbalanced Classes, Ensemble

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 69-94, 2008

Price: EUR 27.50

A Restarted Strategy for Efficient Subsumption Testing

Authors: Kuželka, Ondřej | Železný, Filip

Article Type: Research Article

Abstract: We study runtime distributions of subsumption testing. On graph data randomly sampled from two different generative models we observe a gradual growth of the tails of the distributions as a function of the problem instance location in the phase transition space. To avoid the heavy tails, we design a randomized restarted subsumption testing algorithm RESUMER2. The algorithm is complete in that it correctly decides both subsumption and non-subsumption in finite time. A basic restarted strategy is augmented by allowing certain communication between odd and even restarts without losing the exponential runtime distribution decay guarantee resulting from mutual independence of restart pairs. We empirically test RESUMER2 against the state-of-the-art subsumption algorithm Django on generated graph data as well as on the predictive toxicology challenge (PTC) data set. RESUMER2 performs comparably with Django for relatively small examples (tens to hundreds of literals), while for further growing example sizes, RESUMER2 becomes vastly superior.

Keywords: Relational learning, Graph Mining, Subsumption, Homomorphism, Randomized Complete Algorithm

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 95-109, 2008

Price: EUR 27.50

Relational Transformation-based Tagging for Activity Recognition

Authors: Landwehr, Niels | Gutmann, Bernd | Thon, Ingo | De Raedt, Luc | Philipose, Matthai

Article Type: Research Article

Abstract: The ability to recognize human activities from sensory information is essential for developing the next generation of smart devices. Many human activity recognition tasks are – from a machine learning perspective – quite similar to tagging tasks in natural language processing. Motivated by this similarity, we develop a relational transformation-based tagging system based on inductive logic programming principles, which is able to cope with expressive relational representations as well as a background theory. The approach is experimentally evaluated on two activity recognition tasks and an information extraction task, and compared to Hidden Markov Models, one of the most popular and successful approaches for tagging.

Keywords: relational learning, sequence tagging, activity recognition, information extraction

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 111-129, 2008

Price: EUR 27.50

Learning Ground CP-Logic Theories by Leveraging Bayesian Network Learning Techniques

Authors: Meert, Wannes | Struyf, Jan | Blockeel, Hendrik

Article Type: Research Article

Abstract: Causal relations are present in many application domains. Causal Probabilistic Logic (CP-logic) is a probabilistic modeling language that is especially designed to express such relations. This paper investigates the learning of CP-logic theories (CP-theories) from training data. Its first contribution is SEM-CP-logic, an algorithm that learns CP-theories by leveraging Bayesian network (BN) learning techniques. SEM-CP-logic is based on a transformation between CP-theories and BNs. That is, the method applies BN learning techniques to learn a CP-theory in the form of an equivalent BN. To this end, certain modifications are required to the BN parameter learning and structure search, the most important one being that the refinement operator used by the search must guarantee that the constructed BNs represent valid CP-theories. The paper's second contribution is a theoretical and experimental comparison between CP-theory and BN learning. We show that the simplest CP-theories can be represented with BNs consisting of noisy-OR nodes, while more complex theories require close to fully connected networks (unless additional unobserved nodes are introduced in the network). Experiments in a controlled artificial domain show that in the latter cases CP-theory learning with SEM-CP-logic requires fewer training data than BN learning. We also apply SEM-CP-logic in a medical application in the context of HIV research, and show that it can compete with state-of-the-art methods in this domain.

Keywords: Statistical Relational Learning, CP-logic

Citation: Fundamenta Informaticae, vol. 89, no. 1, pp. 131-160, 2008

Price: EUR 27.50