Non-Parameterized Sentence Embeddings

Publisher: Saudi Digital Library

Abstract

The field of Natural Language Processing (NLP) has progressed rapidly in recent years due to the evolution of deep neural models and their core constituent, word embeddings. While word embeddings are often used as a primary component, the ultimate goal of any NLP system is to capture the underlying linguistic characteristics of word sequences, such as phrases, sentences, or paragraphs, in a form suited to an end task. To generate these embeddings, most NLP models rely on a mathematical operation, such as averaging or another pooling mechanism, over smaller units such as words, morphemes, or even characters. However, these representations tend to be task-specific and typically perform poorly when transferred to other tasks. Consequently, different models have been proposed to generate general-purpose sentence embeddings for use in a pretraining protocol, with the most notable differences between them lying in their efficiency and performance trade-offs. To date, however, most proposed embedding models remain indifferent to the underlying syntactic and semantic characteristics of the text.

To this end, in this dissertation we develop an efficient sentence embedding model that captures both syntactic and semantic properties. First, we use the Discrete Cosine Transform (DCT) to compress word sequences in an order-preserving manner. The lower-order DCT coefficients represent the overall feature patterns in sentences, which yields embeddings suited to tasks that benefit from syntactic features. Our results on probing tasks demonstrate that DCT embeddings indeed preserve more syntactic information than the most commonly used approach, vector averaging. With practically equivalent complexity, the DCT model yields better overall performance in downstream classification tasks that correlate with syntactic features, illustrating the capacity of DCT to preserve word-order information.

We further validate the efficiency of DCT embeddings in multilingual and cross-lingual settings. Specifically, we investigate the generality of the representations across languages that exhibit different linguistic properties, both as a language-independent model and as a cross-lingual model. We empirically show that the performance of the DCT embeddings is comparable across languages for all examined tasks. Moreover, in the cross-lingual setting, DCT embeddings yield superior performance in sentence translation retrieval compared to other state-of-the-art models across all language pairs. These results reaffirm the power of the structural properties encoded in the lower-order DCT coefficients, which are used to generate the final fixed-length sentence embeddings.

A major weakness of DCT, however, is the loss of linguistic information: for example, "man bitten by dog" embeddings rendered from the lower-order DCT coefficients are more similar to "man bites dog" embeddings than to those of the semantically similar "dog bites man". To address this deficiency, we propose to explicitly model linguistic information in the DCT framework using a block-based representation protocol. The blocks reflect various levels of linguistic representation, such as n-gram chunks, syntactic dependencies, and shallow semantic representations. Overall, our results show that augmenting the DCT encoding with block-based representations improves performance relative to the vanilla baseline (sentence-only encoding) on both probing and downstream classification tasks.
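To make the DCT encoding concrete, the following is a minimal Python sketch of the idea described above, assuming pretrained word vectors arranged in a (sequence length x dimension) matrix; the function name, the choice of k, and the zero-padding of short sentences are illustrative assumptions rather than the dissertation's exact implementation.

import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_vectors: np.ndarray, k: int = 4) -> np.ndarray:
    """Compress a (seq_len, dim) word-embedding matrix into a fixed-length
    sentence embedding of size k * dim by keeping the k lowest-order DCT
    coefficients along the sequence (word-position) axis."""
    seq_len, dim = word_vectors.shape
    if seq_len < k:
        # Zero-pad short sentences so at least k coefficients exist
        # (a padding choice assumed for this sketch).
        word_vectors = np.vstack([word_vectors, np.zeros((k - seq_len, dim))])
    # Type-II DCT over word positions, applied independently per dimension.
    coeffs = dct(word_vectors, type=2, axis=0, norm="ortho")
    # Low-order coefficients capture coarse, order-sensitive feature patterns.
    return coeffs[:k].reshape(-1)

Note that the zeroth DCT coefficient is proportional to the per-dimension mean, so keeping only k = 1 recovers a scaled version of vector averaging; larger k adds increasingly fine-grained word-order information at the cost of a longer embedding.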
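The block-based protocol can be sketched in the same style. Here each block is given as a list of token indices (e.g., an n-gram chunk or the tokens of a dependency subtree); mean-pooling the per-block encodings before concatenating them with the sentence-level encoding is an assumption made here to keep the output fixed-length, not necessarily the pooling used in the dissertation.

def block_dct_embedding(word_vectors: np.ndarray, blocks: list, k: int = 4) -> np.ndarray:
    """Concatenate the sentence-level DCT encoding with a pooled encoding
    of linguistically motivated blocks (lists of token indices)."""
    sent = dct_sentence_embedding(word_vectors, k)
    # Encode each block with the same DCT scheme used for the full sentence.
    block_embs = [dct_sentence_embedding(word_vectors[idx], k) for idx in blocks]
    # Mean-pool the block encodings so the result stays fixed-length.
    pooled = np.mean(block_embs, axis=0) if block_embs else np.zeros_like(sent)
    return np.concatenate([sent, pooled])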
