Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Jeon, Sung Hwana | Lee, Hye Jina | Park, Jihyea | Cho, Sungzoona; b; *
Affiliations: [a] Department of Industrial Engineering, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, Korea | [b] Institute for Industrial Systems Innovation, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, Korea
Correspondence: [*] Corresponding author: Sungzoon Cho, Department of Industrial Engineering and Institute for Industrial Systems Innovation, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, Korea. E-mail: [email protected].
Abstract: Attempts to express information from various documents in graph form are rapidly increasing. The speed and volume in which these documents are being generated call for an automated process, based on machine learning techniques, for cost-effective and timely analysis. Past studies responded to such needs by building knowledge graphs or technology trees from the bibliographic information of documents, or by relying on text mining techniques in order to extract keywords and/or phrases. While these approaches provide an intuitive glance into the technological hotspots or the key features of the select field, there still is room for improvement, especially in terms of recognizing the same entities appearing in different forms so as to interconnect closely related technological concepts properly. In this paper, we propose to build a patent knowledge network using the United States Patent and Trademark Office (USPTO) patent filings for the semiconductor device sector by fine-tuning Huggingface’s named entity recognition (NER) model with our novel edge weight updating neural network. For the named entity normalization, we employ edge weight updating neural network with positive and negative candidates that are chosen by substring matching techniques. Experiment results show that our proposed approach performs very competitively against the conventional keyword extraction models frequently employed in patent analysis, especially for the named entity normalization (NEN) and document retrieval tasks. By grouping entities with named entity normalization model, the resulting knowledge graph achieves higher scores in retrieval tasks. We also show that our model is robust to the out-of-vocabulary problem by employing the fine-tuned BERT NER model.
Keywords: Knowledge graph, named entity normalization, information extraction, keyword extraction
DOI: 10.3233/IDA-227129
Journal: Intelligent Data Analysis, vol. 28, no. 1, pp. 331-355, 2024
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
如果您在出版方面需要帮助或有任何建, 件至: [email protected]