Article type: Research Article
Authors: Zhao, Zhaolin [a] | Bo, Kaiming [a],* | Hsu, Chih-Yu [a] | Liao, Lyuchao [a,b]
Affiliations: [a] Fujian Provincial Universities Engineering Research Center for Intelligent Driving Technology, Fujian University of Technology, Fuzhou, China | [b] Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, China
Correspondence: [*] Corresponding author: Kaiming Bo, Fujian Provincial Universities Engineering Research Center for Intelligent Driving Technology, Fujian University of Technology, Fuzhou, China. E-mail: [email protected].
Abstract: With the rapid development of unmanned aerial vehicle (UAV) technology and computer vision, real-time object detection in UAV aerial images has become a current research hotspot. However, detection tasks in UAV aerial images face challenges such as disparate object scales, numerous small objects, and mutual occlusion. To address these issues, this paper proposes the ASM-YOLO model, which enhances the original model by replacing the Neck part of YOLOv8 with a feature pyramid featuring efficient bidirectional cross-scale connections and adaptive feature fusion (ABiFPN). Additionally, a Structural Feature Enhancement Module (SFE) is introduced to inject features extracted by the backbone network into the Neck part, enhancing inter-network information exchange. Furthermore, the MPDIoU bounding box loss function is employed to replace the original CIoU bounding box loss function. A series of experiments was conducted on the VisDrone-DET dataset, with comparisons made against the baseline network YOLOv8s. The experimental results demonstrate that the proposed model achieved reductions of 26.1% and 24.7% in parameter count and model size, respectively. Additionally, during testing on the evaluation set, the proposed model exhibited improvements of 7.4% and 4.6% in the AP50 and mAP metrics, respectively, compared to the YOLOv8s baseline model, validating its practicality and effectiveness. The generalizability of the algorithm was subsequently validated on the DOTA and DIOR datasets, which share similarities with aerial images captured by drones. The experimental results indicate significant enhancements on both datasets.
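The MPDIoU loss mentioned in the abstract can be sketched as follows. This is not the paper's code; it is a minimal Python illustration of the published MPDIoU formulation (IoU penalized by the squared distances between the two boxes' top-left and bottom-right corners, normalized by the squared input-image diagonal), assuming boxes in (x1, y1, x2, y2) format.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """Minimal sketch of the MPDIoU bounding-box loss.

    pred, gt: boxes as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    img_w, img_h: width and height of the input image, used to
    normalize the corner-point distances.
    """
    # Intersection rectangle and standard IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)

    # Squared distances between matching top-left and bottom-right corners
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2

    # Normalize by the squared image diagonal
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou
```

For perfectly overlapping boxes the loss is 0; unlike plain IoU loss, the corner-distance terms still provide a gradient signal when predicted and ground-truth boxes do not overlap, which is helpful for the many small objects typical of UAV imagery.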
Keywords: Computer vision, drone aerial images, multi-scale object detection, real-time object detection, feature fusion
DOI: 10.3233/IDA-230929
Journal: Intelligent Data Analysis, vol. Pre-press, no. Pre-press, pp. 1-22, 2024