Abstract: Consumers today have the option to purchase products from thousands of e-shops. However, the completeness of the product specifications and the taxonomies used for organizing the products differ across different e-shops. To improve the consumer experience, e.g., by allowing for easily comparing offers by different vendors, approaches for product integration on the Web are needed. In this paper, we present an approach that leverages neural language models and deep learning techniques in combination with standard classification approaches for product matching and categorization. In our approach we use structured product data as supervision for training feature extraction models able to extract attribute-value pairs from textual product descriptions. To minimize the need for lots of data for supervision, we use neural language models to produce word embeddings from large quantities of publicly available product data marked up with Microdata, which boost the performance of the feature extraction model, thus leading to better product matching and categorization performances. Furthermore, we use a deep Convolutional Neural Network to produce image embeddings from product images, which further improve the results on both tasks.
Keywords: Product data, data integration, vector space embeddings, deep learning, microdata, schema.org
Journal: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-22, 2018