Article type: Research Article
Authors: Cardoso, Margarida G.M.S.[a] | Themido, Isabel H.[a] | Moura Pires, Fernando[b],[*]
Affiliations: [a] CESUR, Inst. Superior Técnico, Univ. Técnica de Lisboa, Lisboa, Portugal | [b] Dep. Informática, Fac. Ciências e Tecnologia, Univ. Nova de Lisboa, Quinta da Torre, 2825-114 Monte de Caparica, Portugal
Correspondence: [*] Corresponding author. E-mail addresses: [email protected] (M.G.M.S. Cardoso), [email protected] (I.H. Themido), [email protected] (F.M. Pires)
Abstract: This paper discusses the evaluation of a clustering solution. The evaluation uses criteria based on the number of clusters and on discrimination and classification processes. The proposed approach draws on two paradigms: Statistics and Machine Learning. A multimethodological approach is advocated for building models of the association between properties and clusters, because it provides a wider and richer set of analysis perspectives and better knowledge discovery. Specifically, building logical classification and discrimination models to complement quantitative statistical models is particularly useful when most of the available information is qualitative (nominal or ordinal variables). Both the global precision of the classification and the insight that the discriminant model adds into the association between variables and clusters are essential to evaluating a clustering solution. Depending on the sample size, the descriptive analysis can be validated by splitting the total sample in two (one subsample for model building and a holdout subsample for validation) or by other cross-validation procedures. The proposed evaluation approach is applied to a Marketing Tourism case study. The clustering solution is built on a sample of more than 2500 Portuguese clients of the Pousadas de Portugal hotels. The database includes each client's evaluation of their stay at the Pousadas and profiles of the surveyed clients covering holidays and demographic and psychographic aspects. Measures of association, Chi-square tests, ANOVA, Discriminant Analysis, Logistic Regression and Rule Induction (based on the CN2 and C4.5 algorithms) are applied to evaluate the clustering solution obtained through a K-Means process.
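The evaluation idea in the abstract (measure how strongly qualitative variables are associated with the clusters, then check how precisely a classification model recovers cluster membership on a holdout subsample) can be illustrated with a minimal sketch. It assumes the cluster labels already come from a K-Means run and that the client attributes are nominal. As a deliberately simple stand-in for the CN2 and C4.5 rule inducers used in the paper, it uses the one-attribute rule learner 1R. The function names, the 70/30 split and the toy data are hypothetical, and this is not the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def chi_square(xs, clusters):
    """Pearson chi-square statistic for the contingency table of a
    nominal variable `xs` against cluster labels."""
    n = len(xs)
    observed = Counter(zip(xs, clusters))
    row_totals = Counter(xs)
    col_totals = Counter(clusters)
    stat = 0.0
    for x in row_totals:
        for c in col_totals:
            expected = row_totals[x] * col_totals[c] / n
            stat += (observed[(x, c)] - expected) ** 2 / expected
    return stat


def train_one_rule(rows, clusters):
    """1R rule induction (a stand-in for CN2/C4.5): for each attribute,
    map every value to its majority cluster, then keep the attribute
    whose rules make the fewest training errors."""
    best = None
    for j in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, c in zip(rows, clusters):
            counts[row[j]][c] += 1
        rules = {v: cnt.most_common(1)[0][0] for v, cnt in counts.items()}
        errors = sum(sum(cnt.values()) - cnt[rules[v]] for v, cnt in counts.items())
        if best is None or errors < best[0]:
            best = (errors, j, rules)
    _, j, rules = best
    # Unseen attribute values fall back to the most frequent cluster.
    default = Counter(clusters).most_common(1)[0][0]
    return lambda row: rules.get(row[j], default)


def holdout_precision(rows, clusters, frac=0.7, seed=0):
    """Split the sample into a build-up subsample and a holdout subsample,
    train the rule model on the first and return its global precision
    (share of correctly classified cases) on the second."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    model = train_one_rule([rows[i] for i in train], [clusters[i] for i in train])
    hits = sum(model(rows[i]) == clusters[i] for i in test)
    return hits / len(test)


# Hypothetical toy data: (travel party, age band) per client, with cluster
# labels assumed to come from an earlier K-Means run.
rows = [("couple", "50+")] * 10 + [("family", "30-49")] * 10
clusters = [0] * 10 + [1] * 10
print(chi_square([r[0] for r in rows], clusters))  # 20.0
print(holdout_precision(rows, clusters))  # 1.0
```

In a real study the chi-square statistic would be compared against its distribution with (rows - 1)(columns - 1) degrees of freedom, and k-fold cross-validation could replace the single holdout split when the sample is small, as the abstract notes.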
Keywords: Clustering, Multivariate statistics, Machine learning, Marketing and tourism
DOI: 10.3233/IDA-1999-3606
Journal: Intelligent Data Analysis, vol. 3, no. 6, pp. 491-510, 1999