Deteriorated image classification model for Malayalam palm leaf manuscripts

Bipin Nair, B.J.; Shobha Rani, N.; Khan, Mustaqeem

doi:10.3233/JIFS-223713

Article type: Research Article

Authors: Bipin Nair, B.J.^a | Shobha Rani, N.^{a; *} | Khan, Mustaqeem^b

Affiliations: [a] Department of Computer Science, School of Computing, Mysuru Campus, Amrita Vishwa Vidyapeetham, India | [b] Department of Computer Vision, Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE

Correspondence: [*] Corresponding author: N. Shobha Rani, Department of Computer Science, School of Computing, Mysuru Campus, Amrita Vishwa Vidyapeetham, India. E-mail: [email protected].

Abstract: This paper presents a document image classification method for six categories of Malayalam palm leaf manuscripts. The proposed approach consists of three phases: dataset analysis, construction of a Bag of Words repository, and recognition followed by classification using a voting approach. In the dataset analysis phase, the palm leaf manuscripts are pre-processed and subjected to subjective analysis in order to create the Bag of Words repository. Next, the textual components of the manuscripts are extracted and recognized using Tesseract 4 OCR, with both default and self-adapted training sets, and a deep learning algorithm. In the third phase, the Bag of Words approach categorizes each manuscript from the textual components recognized by OCR through a voting process. Experiments evaluate the proposed approach with and without voting, varying the size of the Bag of Words, with default and self-adapted training datasets, using Tesseract OCR and a deep learning model. The results indicate that the approach with the Bag of Words technique performs comparably with and without voting when using Tesseract OCR. For document classification, an overall accuracy of 83% without voting and 84.5% with voting is achieved, with an F-score of 0.90 in both cases, using Tesseract OCR. Overall, trial-wise experiments with the Bag of Words suggest that the proposed approach is highly generalizable and offers a reliable way of classifying deteriorated handwritten Malayalam palm leaf manuscripts.
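The voting step described in the abstract can be illustrated with a minimal sketch. It assumes that each word recognized by OCR casts one vote for every category whose Bag of Words contains it, and that the category with the most votes is chosen; the abstract does not specify the exact voting rule, so this is only one plausible reading. The category names, the keywords, and the function `classify_by_voting` are hypothetical placeholders, not taken from the paper.

```python
from collections import Counter

# Hypothetical Bag of Words repository: category -> set of keywords.
# Names and keywords are placeholders, not the paper's actual data.
BAG_OF_WORDS = {
    "category_a": {"alpha", "beta", "gamma"},
    "category_b": {"delta", "epsilon", "beta"},
    "category_c": {"zeta", "eta"},
}


def classify_by_voting(recognized_words, bag_of_words):
    """Assumed voting rule: each recognized word votes for every category
    whose bag contains it. Returns (best_category, votes); best_category
    is None when no word matches any bag."""
    votes = Counter()
    for word in recognized_words:
        for category, vocabulary in bag_of_words.items():
            if word in vocabulary:
                votes[category] += 1
    if not votes:
        return None, votes
    # Highest vote count wins; ties are broken by category name.
    best = min(votes, key=lambda c: (-votes[c], c))
    return best, votes


# Stand-in for the text that an OCR engine would return for one manuscript.
ocr_output = "alpha beta unknown gamma delta".split()
label, votes = classify_by_voting(ocr_output, BAG_OF_WORDS)
print(label, dict(votes))
```

In this sketch, unrecognized or misrecognized words simply cast no vote, which is one way a voting scheme could tolerate OCR errors on deteriorated manuscripts.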

Keywords: Document image classification, palm leaf manuscripts, handwritten document analysis, Tesseract OCR, deep learning, ancient document images

Journal: Journal of Intelligent & Fuzzy Systems, vol. 45, no. 3, pp. 4031-4049, 2023

Published: 24 August 2023
