Article type: Research Article
Authors: Pan, Yiming | Cheng, Hua* | Fang, Yiquan | Liu, Yufei
Affiliations: School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
Correspondence: [*] Corresponding author. Hua Cheng. E-mail: [email protected].
Abstract: Pre-trained Visual Language Models (VLMs) such as CLIP have shown great potential in the multimodal domain. In this setting, constructing prompts from the contexts of different modalities and their interaction features can activate the model’s prior knowledge more accurately and thus produce better outputs. In CLIP, however, the form of the textual descriptions differs between the pre-training and inference phases, which leaves the prompt with a suboptimal representation ability and is detrimental to alignment learning. We therefore propose the Region-Attention Prompt (RAP), which introduces region features to enrich the semantic representation of the prompt. RAP is obtained through a Cross-Attention mechanism between images and texts, and it is essentially a region-level prompt with category-sensitive properties. For each category, RAP adaptively assigns greater attention weight to the image regions that are more semantically relevant to that category. We further equip CLIP with RAP (the resulting model is called RA-CLIP) to improve image classification performance in generalization scenarios. Extensive experiments across 7 datasets demonstrate that RA-CLIP outperforms the current state-of-the-art method, CoCoOp, by 0.4% to 4.16% on base classes and by 0.25% to 11.34% on new classes. In addition, we show that focusing on category-related regions when constructing the prompt can further improve the model’s alignment ability.
Keywords: Prompt learning, CLIP, Cross-Attention mechanism, image classification
DOI: 10.3233/JIFS-230879
Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 5, pp. 7221-7235, 2023
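The region-level attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify learned projections, the number of attention heads, or how the resulting prompt is fed into CLIP, so the code below assumes plain single-head scaled dot-product cross-attention in which each category's text feature is the query and the image region features are the keys and values. The names `region_attention_prompt`, `text_feats`, and `region_feats` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def region_attention_prompt(text_feats, region_feats):
    """Hypothetical sketch of a category-sensitive, region-level prompt.

    text_feats:   (C, d) array, one text feature per category (queries).
    region_feats: (R, d) array, one feature per image region (keys/values).

    Returns:
        prompts: (C, d) array, one attention-weighted region summary per category.
        weights: (C, R) array, attention of each category over the regions.
    """
    d = text_feats.shape[-1]
    # Similarity of every category to every region, scaled as in
    # standard dot-product attention.
    scores = text_feats @ region_feats.T / np.sqrt(d)
    # Each row sums to 1: regions more similar to a category get more weight.
    weights = softmax(scores, axis=-1)
    # Weighted sum of region features, computed separately per category.
    prompts = weights @ region_feats
    return prompts, weights


# Toy inputs (random, for shape checking only; not data from the paper).
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 8))    # 3 categories, feature dim 8
region_feats = rng.normal(size=(5, 8))  # 5 image regions
prompts, weights = region_attention_prompt(text_feats, region_feats)
```

In this sketch, `weights[c]` plays the role of the per-category attention over regions, and `prompts[c]` is the corresponding region-aware vector; a real model would presumably add learned query, key, and value projections and combine the result with the text prompt.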