Human action recognition with transformer based on convolutional features

Shi, Chengcheng; Liu, Shuxin

doi:10.3233/IDT-240159

Human action recognition with transformer based on convolutional features

Article type: Research Article

Authors: Shi, Chengcheng | Liu, Shuxin^{1; *}

Affiliations: School of Electrical Engineering, Shanghai Dianji University, Shanghai, China

Correspondence: [*] Corresponding author: Shuxin Liu, School of Electrical Engineering, Shanghai Dianji University, 300 Shuihua Road, Pudong New Area, Shanghai 201306, China. E-mail: [email protected].

Note: [1] These authors contributed equally to this work.

Abstract: As one of the key research directions in the field of computer vision, human action recognition has a wide range of practical application values and prospects. In the fields of video surveillance, human-computer interaction, sports analysis, and healthcare, human action recognition technology shows a broad application prospect and potential. However, the diversity and complexity of human actions bring many challenges, such as handling complex actions, distinguishing similar actions, coping with changes in viewing angle, and overcoming occlusion problems. To address the challenges, this paper proposes an innovative framework for human action recognition. The framework combines the latest pose estimation algorithms, pre-trained CNN models, and a Vision Transformer to build an efficient system. The first step involves utilizing the latest pose estimation algorithm to accurately extract human pose information from real RGB image frames. Then, a pre-trained CNN model is used to perform feature extraction on the extracted pose information. Finally, the Vision Transformer model is applied for fusion and classification operations on the extracted features. Experimental validation is conducted on two benchmark datasets, UCF 50 and UCF 101, to demonstrate the effectiveness and efficiency of the proposed framework. The applicability and limitations of the framework in different scenarios are further explored through quantitative and qualitative experiments, providing valuable insights and inspiration for future research.

Keywords: Human action recognition, convolutional features, pose estimation, transformer

DOI: 10.3233/IDT-240159

Journal: Intelligent Decision Technologies, vol. 18, no. 2, pp. 881-896, 2024

Published: 7 June 2024

Price: EUR 27.50

North America

IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA

Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

Europe

IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands

Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]

For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]

Asia

Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China

Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]

For editorial issues, like the status of your submitted paper or proposals, write to [email protected]

如果您在出版方面需要帮助或有任何建, 件至: [email protected]

Share this:

North America

Europe

Asia