Urban street scene analysis using lightweight multi-level multi-path feature aggregation network

Singha, Tanmay; Pham, Duc-Son; Krishna, Aneesh

doi:10.3233/MGS-210353

Article type: Research Article

Authors: Singha, Tanmay^* | Pham, Duc-Son | Krishna, Aneesh

Affiliations: School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Perth, Western Australia, Australia

Correspondence: [*] Corresponding author: Tanmay Singha, School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Perth, Western Australia, Australia. E-mail: [email protected].

Abstract: Urban street scene analysis is an important problem in computer vision, with many off-line models achieving outstanding semantic segmentation results. However, developing and optimising deep neural architectures that run in real time with low computing requirements whilst maintaining good performance remains an ongoing challenge for the research community. Balancing model complexity against performance has been a major hurdle: many models lose too much accuracy for a slight reduction in model size and cannot handle high-resolution input images. This study aims to address the issue with a novel model, named M2FANet, that provides a much better balance between efficiency and accuracy for scene segmentation than the alternatives. The proposed optimised backbone increases the model's efficiency, whereas the proposed Multi-level Multi-path (M2) feature aggregation approach enhances the model's performance in a real-time environment. By exploiting a multi-feature scaling technique, M2FANet produces state-of-the-art results in resource-constrained situations while handling full-resolution input. On the Cityscapes benchmark data set, the proposed model achieves 68.5% and 68.3% class accuracy on the validation and test sets respectively, whilst having only 1.3 million parameters. Compared with all real-time models of fewer than 5 million parameters, the proposed model is the most competitive in both performance and real-time capability.
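
The abstract reports class-level accuracy on Cityscapes without stating how the figure is computed. As an illustration only, the following minimal sketch assumes the figure is a class-averaged intersection-over-union (IoU), which is the usual way segmentation quality is summarised on this benchmark. The function names, the flattened-label input format and the ignore label of 255 are assumptions made for this example and are not taken from the article.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Count (true class, predicted class) pairs over flattened pixel labels.

    Pixels whose ground-truth label equals ignore_label are skipped
    (255 is an assumed convention for this sketch).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """Return IoU = TP / (TP + FP + FN) per class, or None if a class never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp    # other classes predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm):
    """Average the per-class IoU over classes that appear at least once."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


if __name__ == "__main__":
    # Toy two-class example with one ignored pixel.
    y_true = [0, 0, 1, 1, 255]
    y_pred = [0, 1, 1, 1, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=2)
    print(per_class_iou(cm), mean_iou(cm))
```

Averaging over classes rather than over pixels keeps small classes such as poles or traffic signs from being swamped by large ones such as road and sky, which is why a class-averaged score is commonly preferred for street scenes.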

Keywords: DCNN, semantic segmentation, encoder-decoder, feature map, dilated convolution

Journal: Multiagent and Grid Systems, vol. 17, no. 3, pp. 249-271, 2021

Received 26 May 2021

Accepted 18 October 2021

Published: 20 December 2021
