Affiliations: [a] Fondazione Bruno Kessler, Via Sommarive, Trento, Italy
[b] Department of Information Engineering and Computer Science, University of Trento, Via Sommarive, Trento, Italy
Corresponding author: Ivan Donadello. Tel.: +390461314084; E-mail: firstname.lastname@example.org.
Abstract: Semantic image interpretation (SII) is the process of generating meaningful descriptions of the content of images. Background knowledge (BK), in the form of logical theories, is extremely useful for SII. State-of-the-art algorithms for SII mainly adopt a bottom-up approach, which generates semantic interpretations of images starting from their low-level features. In these approaches, BK is used only at a late stage, both to enrich the semantic descriptions and to improve image retrieval. In this paper, we show that BK also plays an important role during the early phase of SII. To this aim, we propose: (i) a reference framework in which a semantic image description is a partial model of the BK, whose elements are each grounded (linked) to one or more image segments; (ii) a loss function that evaluates how well this partial model fits the picture; and (iii) a clustering-based optimization process that searches for the partial model that best fits a picture, using BK to prune the branches of the search space that correspond to partial models inconsistent with BK. To evaluate our approach, we built a gold standard dataset of 203 pictures annotated with complex objects and their parts. We also evaluated our method on a reference dataset in Computer Vision, namely the PASCAL-Part dataset. The results are positive. This evaluation, however, assumes perfect detection of parts. To understand the impact of realistic (and noisy) part detection on our algorithm, we carried out a preliminary evaluation by implementing the entire SII pipeline, in which part detection is performed by a recent deep learning architecture trained for detecting parts. A qualitative analysis shows that, in some cases, recognizing complex objects starting from their parts yields better results than detecting complex objects directly.
Keywords: Information extraction, computer vision, semantic image interpretation, ontologies, clustering