Affiliations: [a] Institute of Mathematics, Warsaw University, Banacha 2, Warsaw 02-097, Poland | [b] Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warszawa, Poland
Abstract: We present an efficient method for constructing decision trees from large data sets that are assumed to be stored on database servers and accessible through SQL queries. The proposed method minimizes the number of simple queries needed to search for the best splits (cut points) by employing a “divide and conquer” search strategy. To make this possible, we develop novel evaluation measures defined on intervals of attribute domains. These measures are needed to estimate the quality of the best cut in a given interval. We also propose applications of the presented approach to discretization and to the construction of “soft decision trees”, a novel classifier model.
Keywords: data mining, decision tree, large databases, discernibility measure
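
The search strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical table `data(a, d)` with one numeric attribute `a` and a decision class `d`, and it uses the standard discernibility measure of a cut (the number of object pairs from different classes that the cut separates). The interval score used here, the sum of the discernibility values at the two interval endpoints, is only a stand-in for the interval measures developed in the paper, which the abstract does not specify. The function names, the parameters `k` and `eps`, and the demo data are likewise assumptions.

```python
import sqlite3


def make_demo_db():
    """Hypothetical demo table: a = 1..10, class 0 for a <= 5, class 1 otherwise."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a REAL, d INTEGER)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [(float(v), 0 if v <= 5 else 1) for v in range(1, 11)],
    )
    return conn


def counts_below(conn, cut):
    """One simple counting query: class distribution of objects with a < cut."""
    rows = conn.execute(
        "SELECT d, COUNT(*) FROM data WHERE a < ? GROUP BY d", (cut,)
    ).fetchall()
    return dict(rows)


def discernibility(left, total):
    """Number of pairs of objects from different classes separated by the cut."""
    right = {c: n - left.get(c, 0) for c, n in total.items()}
    n_left = sum(left.values())
    n_right = sum(right.values())
    same_class_pairs = sum(left.get(c, 0) * right[c] for c in total)
    return n_left * n_right - same_class_pairs


def search_cut(conn, lo, hi, k=4, eps=1e-3):
    """Divide-and-conquer search: split [lo, hi] into k parts, keep the best one."""
    total = dict(conn.execute("SELECT d, COUNT(*) FROM data GROUP BY d").fetchall())
    while hi - lo > eps:
        step = (hi - lo) / k
        bounds = [lo + i * step for i in range(k + 1)]
        w = [discernibility(counts_below(conn, b), total) for b in bounds]
        # Stand-in interval score (assumption): sum of the endpoint qualities.
        best = max(range(k), key=lambda i: w[i] + w[i + 1])
        lo, hi = bounds[best], bounds[best + 1]
    cut = (lo + hi) / 2
    return cut, discernibility(counts_below(conn, cut), total)


if __name__ == "__main__":
    conn = make_demo_db()
    print(search_cut(conn, 0.0, 11.0, k=4, eps=0.5))
```

Each round issues k + 1 counting queries and shrinks the search interval by a factor of k, so the number of queries grows logarithmically with the number of candidate cuts rather than linearly. Because the endpoint-based score is only a heuristic, this sketch is not guaranteed to find the globally best cut on arbitrary data.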