Automatic Chinese character similarity measurement

Liu, Ming; Rus, Vasile; Li, Yue; Sheng, Chuqian; Liu, Li

doi:10.3233/WEB-180387

Article type: Research Article

Authors: Liu, Ming^{a; *} | Rus, Vasile^b | Li, Yue^a | Sheng, Chuqian^a | Liu, Li^c

Affiliations: [a] The School of Computer and Information Science, Southwest University, China | [b] Department of Computer Science, University of Memphis, Memphis, TN, USA | [c] School of Software Engineering, Chongqing University, China

Correspondence: [*] Corresponding author.

Abstract: Automatically identifying Chinese characters that are similar in glyph, pronunciation and meaning is important for building smart question generation tools in a computer-assisted language-learning environment. Previous research on Chinese character similarity measurement focused on the character glyph (e.g. structures, strokes and radicals), using heuristic algorithms whose parameters have preset values. This article presents a machine learning (regression) approach to measuring the similarity between two Chinese characters, based on information that includes not only the glyph but also the pronunciation (pinyin) and the semantic meaning derived from HowNet. We evaluated various regression models using a testing set consisting of 2586 pairs of characters selected from elementary Chinese textbooks. The results showed that four regression models (M5, Support Vector Machine, Gaussian Process and Linear Regression) performed similarly (0.617 ⩽ Mean Absolute Error ⩽ 0.641, 0.772 ⩽ Root Mean Square Error ⩽ 0.790). In addition, the study suggested that the performance of the regression model could be influenced by character frequency. Moreover, we evaluated the regression model on a well-known Chinese language learning resource, the 100 pairs of the most confusing Chinese characters. The experimental results indicated that this approach has potential for the recognition and generation of confusing Chinese character pairs.
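
The two error measures reported in the abstract can be illustrated with a minimal sketch. It assumes that each character pair has a human-assigned similarity rating and a model-predicted score on the same numeric scale. The function names and the sample ratings below are hypothetical and are not taken from the article.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalises large errors more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical similarity ratings for four character pairs (illustrative only).
human_ratings = [1.0, 3.0, 4.0, 2.0]
model_scores = [1.5, 2.0, 4.0, 3.0]

mae = mean_absolute_error(human_ratings, model_scores)
rmse = root_mean_square_error(human_ratings, model_scores)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")  # MAE = 0.625, RMSE = 0.750
```

Because the squared term weights large deviations more heavily, RMSE is never smaller than MAE on the same data, which is consistent with the ranges quoted in the abstract.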

Keywords: Natural language processing, Chinese character similarity measurement, intelligent authoring tools

Journal: Web Intelligence, vol. 16, no. 3, pp. 195-202, 2018

Published: 11 September 2018
