A Word Embeddings Approach to Predicting the Compositionality of Idiomatic Expressions
Publisher: Saudi Digital Library
Abstract
A significant part of every natural language consists of Multiword Expressions (MWEs), which must be handled appropriately in many Natural Language Processing (NLP) applications. In WordNet 1.7, one of the largest lexical databases of the English language, 41% of the records are MWEs (Fellbaum, 1998).
An idiomatic expression is a type of MWE that can be defined as a phrase whose idiosyncratic meaning cannot be derived from its component words. Sag et al. (2002) argue that the non-compositionality of idiomatic expressions poses problems for various NLP tasks, as basic grammatical rules cannot be used to identify these expressions directly.
Various NLP studies have used word embedding models as a statistical method for evaluating the compositionality of MWEs, because the vector space of these models captures word meaning: words that occur in similar contexts are represented by similar vectors. Therefore, in this project, word embedding models are used as a semantic measure for predicting the compositionality of MWEs that are semantically non-compositional (i.e., idiomatic expressions) in a corpus.
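One common way to operationalise this idea (offered here as an illustrative sketch, not the thesis's exact implementation) is to score an MWE by the cosine similarity between the vector learned for the whole phrase and the composition, e.g. the sum, of its component word vectors; a low score suggests a non-compositional, idiomatic expression. The toy vectors below are hypothetical stand-ins for trained embeddings:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compositionality_score(phrase_vec, word_vecs):
    # Compose the component word vectors by summation, then compare
    # the composition with the vector learned for the whole phrase.
    composed = np.sum(word_vecs, axis=0)
    return cosine(phrase_vec, composed)

# Hypothetical 50-dimensional embeddings (random stand-ins for
# vectors a real Word2vec model would learn from a corpus).
rng = np.random.default_rng(0)
red, car = rng.normal(size=50), rng.normal(size=50)
kick, bucket = rng.normal(size=50), rng.normal(size=50)

literal_phrase = red + car          # "red car": behaves compositionally
idiom_phrase = rng.normal(size=50)  # "kick the bucket": unrelated to its parts

print(compositionality_score(literal_phrase, [red, car]))    # ~1.0 (compositional)
print(compositionality_score(idiom_phrase, [kick, bucket]))  # near 0 (idiomatic)
```

In practice the phrase vector comes from training the embedding model with the MWE treated as a single token, so both the phrase and its components have vectors in the same space.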
Two word embedding models, Word2vec and Context2vec, were trained to predict the compositionality of idiomatic expressions. After training, the models' automatic MWE compositionality scores were compared. An intrinsic evaluation was then carried out to assess the performance of Word2vec and Context2vec against an MWE dataset rated by human annotators.
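An intrinsic evaluation of this kind is typically reported as a rank correlation between the model's compositionality scores and the human ratings. A minimal sketch using Spearman's rho on hypothetical scores (the dataset and correlation measure here are illustrative assumptions, not the thesis's actual data):

```python
import numpy as np

def spearman_rho(x, y):
    # Convert each score list to ranks, then take the Pearson correlation
    # of the ranks; this equals Spearman's rho when there are no ties.
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical data: human compositionality ratings for four MWEs
# and the scores a trained embedding model assigned to the same MWEs.
human_scores = [0.10, 0.90, 0.50, 0.30]
model_scores = [0.20, 0.80, 0.60, 0.25]

print(spearman_rho(human_scores, model_scores))  # 1.0: identical ranking
```

A rho close to 1 indicates the model ranks the expressions by compositionality in the same order as the human annotators, even if the absolute scores differ.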
Our findings show that Word2vec outperforms Context2vec on this project's task: Word2vec achieved compositionality scores in line with the human annotators' judgements and required less training time than Context2vec.