Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Issue title: Special section: Selected papers of LKE 2019
Guest editors: David Pinto, Vivek Singh and Fernando Perez
Article type: Research Article
Authors: Morales, Valentin | Gomez, Juan Carlos*; | Van Amerongen, Saskia
Affiliations: Departamento de Ingeniería Electrónica, División de Ingenierías Campus Irapuato-Salamanca, Universidad de Guanajuato, Salamanca, México
Correspondence: [*] Corresponding authors. Juan Carlos Gomez, Departamento de Ingeniería Electrónica, División de Ingenierías Campus Irapuato-Salamanca, Universidad de Guanajuato, Salamanca, México, Tel. +524646479940. E-mail: [email protected].
Abstract: Email is one of the most popular ways of communication. Nevertheless, it is also a potential tool to deceive and fill users with unwanted publicity, which reduces productivity. To alleviate such fact, a common solution has been building machine learning models based on the content of emails to automatically separate emails (spam vs ham). In this work, a study of a set of machine learning models and content-based features for the problem of cross-dataset email classification is presented. This problem consists in training and testing the models using different datasets; considering the fact that the datasets were collected under different independent setups. This has the purpose of simulating future variable or unpredictable conditions in the emails content distributions as could happen in a real setting, where models are trained using emails from a certain period of time, group of users or accounts, but tested with emails from other users or accounts. Experiments were conducted with the models and features using different datasets and two setups, same-dataset, and cross-dataset, to show the complexity of the later. The performance was evaluated using the Area Under the ROC Curve, a common metric in email classification. The results show interesting insights for the problem.
Keywords: Email classification, data mining, machine learning, cross-dataset classification
DOI: 10.3233/JIFS-179890
Journal: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2279-2290, 2020
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
如果您在出版方面需要帮助或有任何建, 件至: [email protected]