Article type: Research Article
Authors: Vaikunta Pai, T.[a] | Nethravathi, P.S.[b] | Birau, Ramona[c],[*] | Popescu, Virgil[d] | Karthik Pai, B.H.[e] | Naik, Pramod Vishnu[f]
Affiliations: [a] Department of Information Science and Engineering, NMAM Institute of Technology (NMAMIT), Nitte (Deemed to be University), Nitte, India | [b] Department of Computer Science and Engineering, Shree Devi Institute of Technology, Mangalore, India | [c] "Constantin Brâncuşi" University of Târgu Jiu, Faculty of Economic Science, Tg-Jiu, Romania | [d] University of Craiova, Faculty of Economics and Business Administration, Craiova, Romania | [e] Department of Information Science and Engineering, NMAM Institute of Technology (NMAMIT), Nitte (Deemed to be University), Nitte, India | [f] Software Development Engineer, MResult Services Private Limited, Mangalore, India
Correspondence: [*] Corresponding author. Ramona Birau, "Constantin Brâncuşi" University of Târgu Jiu, Faculty of Economic Science, Tg-Jiu, Romania. E-mail: [email protected].
Abstract: Multimodal conversational AI systems have gained significant attention due to their potential to enhance user experience and enable more interactive and engaging interactions. This vital and complex research field seeks to integrate diverse modalities, including text, images, and speech, to develop conversational AI systems capable of comprehending, perceiving, and generating responses within a multimodal framework. By seamlessly incorporating various modalities, these systems can provide a more comprehensive and immersive conversational experience, enabling users to communicate more naturally and intuitively. This research presents a novel multimodal architecture empowered by Deep Neural Networks (DNNs) for the simultaneous integration and processing of diverse modalities. Multimodal data encompasses various sources such as text, images, audio, video, or sensor data. The objective is to merge and harness information from these modalities to amplify learning and enhance performance across a spectrum of tasks. This research explores the extension of ChatGPT, a state-of-the-art conversational AI model, to handle multimodal inputs, including text and images or text and speech. We present a comprehensive analysis of the benefits and challenges of integrating these modalities into ChatGPT, examining their impact on understanding, interaction, and overall system performance. Through extensive experimentation and evaluation, we demonstrate the potential of multimodal ChatGPT to provide richer, more context-aware conversations, while also highlighting the existing limitations and open research questions in this evolving field. Multimodal ChatGPT outperforms the current GPT-3.5 by 16.51%, demonstrating that multimodal ChatGPT is capable of better performance and offers a pathway for further progress in the field of language models.
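The abstract describes a DNN that merges information from several modalities but does not specify the fusion architecture. As a point of reference, the sketch below shows one common approach, late fusion by concatenation of per-modality embeddings, in PyTorch. The class name MultimodalFusionNet, the feature dimensions, and the linear placeholder encoders are illustrative assumptions, not the authors' implementation.

# A minimal, hypothetical sketch of late fusion of text and image features.
# The real paper's encoders, layer sizes, and training setup are not given
# in the abstract; everything below is an illustrative assumption.
import torch
import torch.nn as nn

class MultimodalFusionNet(nn.Module):
    """Encode each modality separately, then fuse by concatenation."""
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_classes=10):
        super().__init__()
        # Per-modality projection heads (stand-ins for real encoders,
        # e.g. a transformer for text and a CNN for images).
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        self.image_proj = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        # Joint head over the fused (concatenated) representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([self.text_proj(text_feat), self.image_proj(image_feat)], dim=-1)
        return self.classifier(fused)

# Example forward pass with random tensors standing in for encoder outputs.
model = MultimodalFusionNet()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 10])

Concatenation is only the simplest fusion strategy; cross-modal attention or gated fusion are common alternatives, and the paper itself should be consulted for the architecture actually used.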
Keywords: Large language model, generative pre-trained transformer, deep learning, state-of-the-art (SOTA), artificial intelligence (AI), reinforcement learning from human feedback, natural language processing (NLP), convolutional neural networks (CNN), recurrent neural networks (RNN)
DOI: 10.3233/JIFS-239465
Journal: Journal of Intelligent & Fuzzy Systems, vol. Pre-press, no. Pre-press, pp. 1-17, 2024