Constructing a Decision Tree for Graph-Structured Data and its Applications

Geamsakul, Warodom; Yoshida, Tetsuya; Ohara, Kouzou; Motoda, Hiroshi; Yokoi, Hideto; Takabayashi, Katsuhiko

Issue title: Advances in Mining Graphs, Trees and Sequences

Article type: Research Article

Affiliations: Institute of Scientific and Industrial Research, Osaka University, Japan. {warodom;yoshida;ohara;motoda}@ar.sanken.osaka-u.ac.jp | Division for Medical Informatics, Chiba University Hospital, Japan.

Note: Address for correspondence: Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan

Abstract: A machine learning technique called Graph-Based Induction (GBI) extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). It is very efficient because it uses a greedy search. Meanwhile, a decision tree is an effective means of data classification from which easily understood rules can be obtained. However, a decision tree cannot be constructed for data that is not explicitly expressed with attribute-value pairs. This paper proposes a method called Decision Tree Graph-Based Induction (DT-GBI), which constructs a classifier (decision tree) for graph-structured data while simultaneously constructing attributes for classification using GBI. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion in GBI and used as attributes for testing. Since attributes (features) are constructed while the classifier is being constructed, DT-GBI can be regarded as a method for feature construction. The predictive accuracy of a decision tree is affected by which attributes (patterns) are used and how they are constructed. A beam search is employed to extract sufficiently discriminative patterns within the greedy search framework. Pessimistic pruning is incorporated to avoid overfitting to the training data. Experiments using a DNA dataset were conducted to examine the effect of the beam width and of the number of chunkings at each node of a decision tree. The results indicate that DT-GBI, which uses very little prior domain knowledge, can construct a decision tree that is comparable to other classifiers constructed using domain knowledge. DT-GBI was also applied to the analysis of a real-world hepatitis dataset as part of evidence-based medicine. Four classification tasks on the hepatitis data were conducted using only the time-series data of blood inspection and urinalysis. The preliminary experimental results, comprising the constructed decision trees, their predictive accuracies, and the extracted patterns, are reported in this paper. Some of the patterns match domain experts' experience, and the overall results are encouraging.
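
To make the idea of stepwise pair expansion more concrete, the following is a minimal Python sketch of a single chunking step on one small directed labeled graph. It is an illustration under simplifying assumptions, not the authors' implementation: the function names `most_frequent_pair` and `chunk_pair` are hypothetical, pairs are chosen by raw frequency rather than by a discriminative measure, no beam search is performed, and the sketch does not record which node inside a chunk an outside edge was attached to.

```python
from collections import Counter


def most_frequent_pair(labels, edges):
    """Return ((label_u, label_v), count) for the most frequent pair of
    labels joined by a directed edge. Frequency is a stand-in criterion
    chosen for this sketch."""
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    return counts.most_common(1)[0]


def chunk_pair(labels, edges, pair):
    """Merge non-overlapping occurrences of `pair` into single chunk nodes.

    labels: dict mapping node id -> label
    edges:  set of directed (u, v) node-id pairs
    Returns a new (labels, edges) for the rewritten graph.
    """
    used = set()      # nodes already consumed by some chunk
    absorbed = {}     # absorbed node id -> surviving node id
    for u, v in sorted(edges):
        if u != v and (labels[u], labels[v]) == pair \
                and u not in used and v not in used:
            used.update((u, v))
            absorbed[v] = u

    # Surviving nodes keep their ids; chunk nodes get a composite label.
    new_labels = {n: lab for n, lab in labels.items() if n not in absorbed}
    for u in absorbed.values():
        new_labels[u] = f"({pair[0]}->{pair[1]})"

    # Redirect edges of absorbed nodes to the chunk; drop edges that now
    # lie inside a chunk.
    new_edges = set()
    for u, v in edges:
        a, b = absorbed.get(u, u), absorbed.get(v, v)
        if a != b:
            new_edges.add((a, b))
    return new_labels, new_edges


# Toy graph: two occurrences of a->b, one of b->c.
labels = {1: "a", 2: "b", 3: "a", 4: "b", 5: "c"}
edges = {(1, 2), (3, 4), (2, 5)}
pair, count = most_frequent_pair(labels, edges)
chunked_labels, chunked_edges = chunk_pair(labels, edges, pair)
```

Repeating the two calls on the rewritten graph grows chunks one pair at a time, which is the sense in which the expansion is stepwise. In the setting the abstract describes, the patterns obtained this way would serve as candidate attributes at a decision tree node.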

Keywords: graph mining, graph-based induction, decision tree, beam search, evidence-based medicine

Journal: Fundamenta Informaticae, vol. 66, no. 1-2, pp. 131-160, 2005

Received 17 June 2005

Accepted 17 June 2005

Published: 2005
