Affiliations: University of Pretoria, Pretoria, South Africa
Corresponding author: Catherine Halsey, University of Pretoria, Pretoria, South Africa. E-mail: [email protected]
Abstract: Classifier accuracy can often be improved by enlarging the training data set. However, in experimental studies it may be very costly to survey additional cases, so keeping the sample size to a minimum is essential. Moreover, very large data sets sometimes contain little additional information, and the extra computational resources they demand do not improve accuracy. A sequential method of training classifiers is therefore useful: stopping at the optimal iteration means that the minimum number of observations is used, which can save both computational time and sampling costs. This paper proposes a sequential method that samples the minimum number of observations necessary to train a classifier so that it estimates the feasible minimum rate of misclassification, the Bayes error. Implemented in SAS/IML® Studio, this method of classifier training gives the researcher greater control over the process by allowing the stopping point of the sequential procedure to be specified. It is not restricted to any single classification method, and it does not attempt to attain an unfeasibly low misclassification rate.
Keywords: Bayes error, fixed-width confidence interval, classifier training