Issue title: Collective intelligent information and database systems
Guest editors: Ngoc-Thanh Nguyen, Manuel Núñez and Bogdan Trawiński
Article type: Research Article
Authors: Plata, Diego Rueda [a] | Ramos-Pollán, Raúl [a]; [*] | González, Fabio A. [b]
Affiliations: [a] Universidad Industrial de Santander, Bucaramanga, Colombia | [b] MindLab Research Group, Universidad Nacional de Colombia, Bogotá, Colombia
Correspondence: [*] Corresponding author. Raúl Ramos-Pollán, Universidad Industrial de Santander, Cra 27 Calle 9, Bucaramanga, Colombia. Tel./Fax: +57 7 634 4000; E-mail: [email protected].
Abstract: This work proposes a supervised layer-wise strategy to train deep convolutional neural networks (DCNs) that is particularly suited to small, specialized image datasets. DCNs are increasingly used, with considerable success, in image classification tasks over large datasets (more than 1M images and 10K classes). Successful pre-trained DCNs can then be reused on new, smaller datasets (10K to 100K images) through transfer learning, which cannot guarantee competitive performance a priori if the new data is of a different or specialized nature (medical imaging, plant recognition, etc.). We therefore seek competitive techniques to train DCNs for such small datasets, and describe a supervised greedy layer-wise method analogous to that used in unsupervised deep networks. Our method consistently outperforms the traditional approach of training a full DCN architecture in a single stage, yielding an average increase of over 20% in classification performance across all DCN architectures and datasets used in this work. Furthermore, we obtain more interpretable and cleaner visual features. Our method is better suited to small, specialized datasets because it requires a training cycle for each DCN layer, which increases computing time almost linearly with the number of layers. Nevertheless, this remains a fraction of the computing time required to generate pre-trained models with large generic datasets, and imposes no additional hardware requirements. This constitutes a solid alternative for training DCNs when transfer learning is not possible and, furthermore, suggests that state-of-the-art DCN performance with large datasets might yet be improved at the expense of higher computing time.
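The abstract does not give implementation details, but the general idea of supervised greedy layer-wise training can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' code: all names (ConvBlock, make_head, greedy_layerwise_train) and design choices such as letting earlier blocks keep training rather than freezing them are assumptions made only for the sake of the example.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # One convolutional stage: conv -> ReLU -> max-pool (assumed structure).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
    def forward(self, x):
        return self.net(x)

def make_head(in_ch, num_classes):
    # Temporary supervised classifier attached after the newest block.
    return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(in_ch, num_classes))

def train_supervised(model, loader, epochs=5, lr=1e-3):
    # Ordinary supervised training with a cross-entropy objective.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def greedy_layerwise_train(channels, num_classes, loader):
    # One training cycle per layer: grow the network block by block,
    # re-attaching a classifier head and retraining after each addition.
    blocks, in_ch, model = [], 3, None
    for out_ch in channels:
        blocks.append(ConvBlock(in_ch, out_ch))
        model = nn.Sequential(*blocks, make_head(out_ch, num_classes))
        train_supervised(model, loader)
        in_ch = out_ch
    return model

A call such as greedy_layerwise_train([32, 64, 128], num_classes=10, loader=train_loader) would run three training cycles, one per added convolutional layer, which is consistent with the abstract's remark that computing time grows almost linearly with the number of layers.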
Keywords: Convolutional networks, deep learning, greedy layer-wise training
DOI: 10.3233/JIFS-169131
Journal: Journal of Intelligent & Fuzzy Systems, vol. 32, no. 2, pp. 1333-1342, 2017