Issue title: Special issue on Intelligent Biomedical Data Analysis and Processing
Guest editors: Deepak Gupta, Oscar Castillo and Ashish Khanna
Article type: Research Article
Authors: Agrawal, Ankit [a],* | Tripathi, Sarsij [b] | Vardhan, Manu [c]
Affiliations: [a] Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India | [b] Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, Uttar Pradesh, India | [c] Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India
Correspondence: [*] Corresponding author: Ankit Agrawal, Research Scholar, Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, India. Tel.: +90 3963 5153; E-mail: [email protected].
Abstract: Active learning is a well-known method for labeling large unannotated datasets with minimal effort and at low cost. It iteratively selects the most informative instances and adds them to the training set so that the learner's performance improves with each iteration. Named entity recognition (NER) is a key information extraction task in which the entities present in a sequence are labeled with the correct class. Traditional query sampling strategies for active learning consider only the model's final probability value when selecting the most informative instances. In this paper, we propose a new active learning algorithm based on a hybrid query sampling strategy that considers sentence similarity along with the model's final probability value. We compare it with active learning approaches for NER based on four other well-known pool-based uncertainty query sampling strategies: least confident sampling, margin of confidence sampling, ratio of confidence sampling, and entropy sampling. The experiments were performed on three biomedical NER datasets from different domains and a Spanish-language NER dataset. We found that all of these approaches reach the performance of the supervised learning approach while requiring much less annotated training data. The proposed active learning algorithm performs well and, in most cases, further reduces the annotation cost compared with the algorithms based on the other sampling strategies.
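As an illustration of the four baseline uncertainty strategies named in the abstract, the following is a minimal sketch using their common textbook definitions. It is not the authors' implementation, and it does not cover the proposed hybrid strategy, whose details are not given here. It assumes each pool instance is summarized by a single probability distribution over classes; the function names, the `select_most_informative` helper, and the example pool are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def least_confident(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - max(probs)


def margin_of_confidence(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - (gap between the two most likely classes)."""
    first, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (first - second)


def ratio_of_confidence(probs: Sequence[float]) -> float:
    """Uncertainty = second most likely probability / most likely probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return second / first


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the distribution; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(
    pool: Sequence[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    k: int,
) -> List[int]:
    """Return indices of the k pool instances with the highest uncertainty score."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabeled instances (assumed data).
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain prediction
    [0.60, 0.30, 0.10],  # moderately uncertain prediction
]

for strategy in (least_confident, margin_of_confidence, ratio_of_confidence, entropy):
    print(strategy.__name__, select_most_informative(pool, strategy, k=2))
```

In a pool-based loop, the selected indices would be sent for annotation, added to the training set, and the model retrained before the next round of scoring. For sequence labeling such as NER, the per-instance distribution would have to be derived from the model's sequence-level or token-level probabilities, which this sketch leaves out.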
Keywords: Active learning, named entity recognition, uncertainty query sampling
DOI: 10.3233/IDT-200048
Journal: Intelligent Decision Technologies, vol. 15, no. 1, pp. 99-114, 2021