Article type: Research Article
Authors: Zhao, Yiming | Zhao, Hongdong* | Zhang, Xuezhi | Liu, Weina
Affiliations: School of Electronic Information and Engineering, Hebei University of Technology, Tianjin, P.R. China
Correspondence: [*] Corresponding author. Hongdong Zhao, School of Electronic Information and Engineering, Hebei University of Technology, Tianjin, 300401, P.R. China. E-mail: [email protected].
Abstract: In Intelligent Transport Systems (ITS), vision is the primary mode of perception. However, vehicle images captured by low-cost traffic cameras under challenging weather conditions often suffer from poor resolution and insufficient detail. Vehicle noise, by contrast, provides complementary auditory features with advantages such as environmental adaptability and a long recognition range. To address these limitations and improve the accuracy of classification and identification in low-quality traffic surveillance, an effective audio-visual feature fusion method is crucial. This paper establishes an Urban Road Vehicle Audio-visual (URVAV) dataset specifically designed for low-quality images and vehicle noise recorded under complex weather conditions. For low-quality vehicle image classification, the paper proposes a simple Convolutional Neural Network (CNN)-based model called Low-quality Vehicle Images Net (LVINet). To further improve classification accuracy, a spatial-channel attention-based audio-visual feature fusion method is introduced. This method converts one-dimensional acoustic features into a two-dimensional audio Mel-spectrogram, allowing auditory and visual features to be fused. By leveraging the high correlation between these features, the representation of vehicle characteristics is effectively enhanced. Experimental results demonstrate that LVINet achieves a classification accuracy of 93.62% with a reduced parameter count compared to existing CNN models, and that the proposed audio-visual feature fusion method improves classification accuracy by 7.02% and 4.33% compared to using audio or visual features alone, respectively.
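The sketch below is a minimal illustration of the two ideas summarized in the abstract: converting a one-dimensional vehicle-noise signal into a two-dimensional log-Mel spectrogram, and fusing audio and visual feature maps with channel and spatial attention. It is not the paper's LVINet or its exact fusion block; the names audio_to_mel and SpatialChannelFusion and all hyper-parameters (sampling rate, number of Mel bands, reduction ratio) are assumptions for demonstration only.

```python
# Illustrative sketch only: audio-to-Mel-spectrogram conversion and a
# channel + spatial attention fusion of audio/visual feature maps.
# Names and hyper-parameters are assumed, not taken from the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn


def audio_to_mel(path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    """Load a mono audio clip and return a log-scaled Mel-spectrogram."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, time_frames)


class SpatialChannelFusion(nn.Module):
    """Fuse concatenated audio/visual feature maps with channel, then spatial, attention."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: global average pooling followed by a bottleneck MLP.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        # audio_feat, visual_feat: (N, C/2, H, W) feature maps from the two branches.
        x = torch.cat([audio_feat, visual_feat], dim=1)   # (N, C, H, W)
        x = x * self.channel_gate(x)                      # re-weight channels
        avg_map = x.mean(dim=1, keepdim=True)             # (N, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)           # (N, 1, H, W)
        x = x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return x


if __name__ == "__main__":
    fusion = SpatialChannelFusion(channels=128)
    a = torch.randn(2, 64, 14, 14)   # dummy audio-branch feature map
    v = torch.randn(2, 64, 14, 14)   # dummy visual-branch feature map
    print(fusion(a, v).shape)        # torch.Size([2, 128, 14, 14])
```

In this sketch the channel gate is a squeeze-and-excitation-style bottleneck and the spatial gate is a CBAM-style convolution over pooled channel statistics; the paper's actual attention design may differ.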
Keywords: Vehicle classification, feature fusion, convolutional neural network, low-quality images
DOI: 10.3233/JIFS-232812
Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 5, pp. 8931-8944, 2023