Visual grounding of abstract and concrete words: A response to Günther et al. (2020)


Current computational models that capture word meanings mostly rely on textual corpora. While these approaches have been successful over the last decades, their lack of grounding in the real world remains an ongoing problem. In this paper, we focus on visual grounding of word embeddings and target two important questions. First, how can language benefit from vision in the process of visual grounding? And second, is there a link between visual grounding and abstract concepts? We investigate these questions by proposing a simple yet effective approach in which language benefits from vision, specifically with respect to the modeling of both concrete and abstract words. Our model aligns word embeddings with their corresponding visual representations without deteriorating the knowledge captured by textual distributional information. We apply our model to a behavioral experiment reported by Günther et al. (2020), which addresses the plausibility of having visual mental representations for abstract words. Our evaluation results show that (1) it is possible to predict human behavior to a large degree using purely textual embeddings; (2) our grounded embeddings model human behavior better than their textual counterparts; and (3) abstract concepts benefit from visual grounding implicitly, through their connections to concrete concepts, rather than from having corresponding visual representations.
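The alignment idea described above can be sketched as a small toy example. This is an illustrative sketch only, not the paper's actual model: all names, dimensions, the linear map, and the loss weighting are assumptions. A grounding map is trained to pull concrete words toward their image vectors, while a preservation term keeps every word, abstract ones included, close to its textual embedding; abstract words are thus grounded only indirectly, through the map they share with concrete words.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative, not the paper's setup): textual embeddings for
# 5 words, of which the first 3 are concrete and have image vectors.
d = 4
text_emb = rng.normal(size=(5, d))
img_emb = rng.normal(size=(3, d))
concrete_idx = np.array([0, 1, 2])

# Linear grounding map M, trained with plain gradient descent to minimize
#   0.5 * ||grounded[concrete] - img||^2        (alignment, concrete only)
# + 0.5 * lam * ||grounded - text||^2           (preservation, all words)
M = np.eye(d)
lam = 0.5   # preservation weight (assumed hyperparameter)
lr = 0.05

for _ in range(500):
    grounded = text_emb @ M
    align_err = grounded[concrete_idx] - img_emb   # concrete words only
    keep_err = grounded - text_emb                 # all words, incl. abstract
    grad = text_emb[concrete_idx].T @ align_err + lam * text_emb.T @ keep_err
    M -= lr * grad / len(text_emb)

grounded = text_emb @ M
# Rows 3-4 (abstract words) move only via the shared map M, never via a
# direct image target: grounding reaches them indirectly.
```

Because the preservation term is zero at the identity initialization, any decrease in the combined loss guarantees that the concrete words end up strictly closer to their image vectors than the raw textual embeddings were, while all words stay anchored near their distributional representations.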

Hassan Shahmohammadi
PhD candidate at University of Tuebingen & IMPRS-IS

My research interests include multi-modal learning with deep learning, bridging NLP and computer vision.