Obtaining accurate and comprehensible classifiers using oracle coaching

Johansson, Ulf; Sönströd, Cecilia; Löfström, Tuve; Boström, Henrik

doi:10.3233/IDA-2012-0522

Article type: Research Article

Authors: Johansson, Ulf^{a,*} | Sönströd, Cecilia^a | Löfström, Tuve^a | Boström, Henrik^b

Affiliations: [a] School of Business and IT, University of Borås, Borås, Sweden | [b] Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden

Correspondence: [*] Corresponding author: Ulf Johansson, School of Business and IT, University of Borås, Borås, Sweden. E-mail: [email protected]

Abstract: While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.
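The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a bagged ensemble of decision stumps stands in for the oracle (the paper uses random forests and bagged RBF networks), and a single decision stump stands in for the transparent model (the paper uses J48 and JRip). The function names, the `oracle_weight` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, weights=None):
    """Fit a weighted one-split decision stump (stand-in for a transparent model)."""
    weights = weights or [1.0] * len(X)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left, right = Counter(), Counter()
            for row, label, w in zip(X, y, weights):
                (left if row[f] <= t else right)[label] += w
            # Weighted count of instances matched by per-side majority labels.
            score = max(left.values()) + (max(right.values()) if right else 0.0)
            if best is None or score > best[0]:
                l_lab = left.most_common(1)[0][0]
                r_lab = right.most_common(1)[0][0] if right else l_lab
                best = (score, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def fit_bagged_oracle(X, y, n_models=25, seed=0):
    """Bagged stumps with majority voting (stand-in for a strong, opaque oracle)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]

def oracle_coach(X_train, y_train, X_prod, oracle, use_training=True, oracle_weight=1.0):
    """Label production data with the oracle, then fit the transparent model on it.

    use_training: also include the original training set (the abstract reports
                  that this combination gives significantly better results).
    oracle_weight: hypothetical knob for the relative weight of oracle data.
    """
    y_oracle = [oracle(row) for row in X_prod]
    X, y = list(X_prod), list(y_oracle)
    w = [oracle_weight] * len(X_prod)
    if use_training:
        X, y, w = X + X_train, y + y_train, w + [1.0] * len(X_train)
    return fit_stump(X, y, w), y_oracle

def fidelity(model, y_oracle, X_prod):
    """Fraction of production instances where the model agrees with the oracle."""
    return sum(model(r) == lab for r, lab in zip(X_prod, y_oracle)) / len(X_prod)

# Toy data (hypothetical): two features, binary class.
X_train = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
           [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_prod = [[0.15, 0.5], [0.45, 0.5], [0.55, 0.5], [0.85, 0.5]]  # unlabeled

oracle = fit_bagged_oracle(X_train, y_train)
model, y_oracle = oracle_coach(X_train, y_train, X_prod, oracle)
print("oracle labels:", y_oracle)
print("fidelity to oracle:", fidelity(model, y_oracle, X_prod))
```

Note that only the unlabeled production inputs are used, never their true labels. Setting `use_training=False` corresponds to the oracle-data-only variant the abstract compares against, and varying `oracle_weight` mimics the weighting adjustment mentioned in its final sentence.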

Keywords: Classification, comprehensibility, decision trees, decision lists, oracle coaching

Journal: Intelligent Data Analysis, vol. 16, no. 2, pp. 247-263, 2012

Published: 1 March 2012
