Retrieval and Labeling of Documents Using Ontologies: Aided by a Collaborative Filtering

Date

2023

Abstract

Information retrieval is a common task today, and retrieval systems are aided by a variety of text mining and analysis methods. The objective of retrieval is to obtain, from a collection, the information resources that are relevant to a specified query. The process begins with a query provided by a user, after which a search engine looks for the relevant resources. Typically, queries are formed from the same terms (words) that occur within the resources. This work addresses the case in which a document should match query terms that do not occur in it, as in the following situations. First, we want to retrieve documents relevant to query terms that do not explicitly occur in the documents but are relevant to their contents. Second, we want to retrieve documents using queries that contain labels from an ontology tree, and these labels may not explicitly occur in the documents. Third, an organization may hold a large collection of documents that various user communities want to refer to using their own community-specific ontologies. Several information retrieval methods cluster the documents and then determine a signature for each cluster that describes the terms predominantly present in it. We have designed and implemented a clustering algorithm that partitions the data space in a step-wise manner and seeks clusters whose signatures are of good quality and represent the documents in them. The algorithm follows a bi-clustering strategy: it applies the spectral co-clustering method at each step and then optimizes towards clusters that have strong representative signatures. We have shown that this clustering algorithm performs better than other known methods such as K-Means and Latent Dirichlet Allocation (LDA).
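As an illustration of the bi-clustering step mentioned above, the following is a minimal sketch of a single spectral co-clustering bipartition of a document-term matrix. It is not the algorithm developed in this work: the function name `spectral_bipartition`, the toy matrix, and the use of a simple sign split on the second singular vectors are all assumptions made for the example, and the signature-optimization step is not shown.

```python
import numpy as np

def spectral_bipartition(A):
    """Split documents (rows) and terms (columns) into two co-clusters.

    Assumes A is non-negative with no all-zero row or column.
    """
    A = np.asarray(A, dtype=float)
    # Scale by inverse square roots of row and column sums.
    d1 = 1.0 / np.sqrt(A.sum(axis=1))
    d2 = 1.0 / np.sqrt(A.sum(axis=0))
    An = d1[:, None] * A * d2[None, :]
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    # The first singular pair is trivial; the second carries the split.
    z_docs = d1 * U[:, 1]
    z_terms = d2 * Vt[1, :]
    # Simplification: split by sign instead of running k-means on z.
    return (z_docs > 0).astype(int), (z_terms > 0).astype(int)

# Hypothetical term counts: documents 0-2 mostly use terms 0-2,
# documents 3-5 mostly use terms 3-5, with a little overlap.
A = np.array([
    [5, 4, 3, 0, 0, 1],
    [4, 5, 4, 0, 0, 0],
    [3, 4, 5, 1, 0, 0],
    [0, 0, 1, 5, 4, 3],
    [0, 0, 0, 4, 5, 4],
    [1, 0, 0, 3, 4, 5],
])
doc_labels, term_labels = spectral_bipartition(A)
for c in (0, 1):
    docs = np.flatnonzero(doc_labels == c).tolist()
    terms = np.flatnonzero(term_labels == c).tolist()
    print("co-cluster", c, "documents", docs, "terms", terms)
```

The terms that land in the same co-cluster as a group of documents can be read as a rough signature for that group. A step-wise scheme could apply the same split again to each resulting block, which is one possible reading of the partitioning described above.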
We have accomplished our goal of improving the capabilities and performance of information retrieval systems by presenting a new method that generates predicted terms for documents using collaborative filtering based on Singular Value Decomposition (SVD). We have shown that retrievals made using such recommended terms return the correct documents with reasonably high accuracy. In addition, including the predicted terms in the clustering process improves both the purity of the clusters and the quality of retrieval. We have achieved our goal of integrating ontological labels with information retrieval by adding terms from ontologies to a document and using a collaborative filtering approach to associate the ontology labels with other relevant documents. We have tested the performance of our method on several cases of ontology integration: a single ontology label, a single large ontology with all the complexities of an ontology tree, and multiple ontology trees. We have tested this method on our document collections and obtained promising results. Our method performs better than other existing methods.
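The idea of predicting terms for a document with SVD-based collaborative filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in this work: the function name `predict_terms`, the parameters `rank`, `top_n`, and `min_score`, the vocabulary, and the toy matrix are all hypothetical. Documents play the role of users and terms the role of items; a low-rank reconstruction of the document-term matrix assigns scores to terms that a document does not contain.

```python
import numpy as np

def predict_terms(X, vocab, rank=2, top_n=1, min_score=0.1):
    """Suggest absent terms for each document from a rank-`rank` SVD."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Low-rank reconstruction: scores for every (document, term) pair.
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    predictions = []
    for d in range(X.shape[0]):
        # Only consider terms the document does not already contain.
        scores = np.where(X[d] > 0, -np.inf, X_hat[d])
        best = np.argsort(scores)[::-1][:top_n]
        predictions.append([vocab[j] for j in best if scores[j] > min_score])
    return predictions

# Hypothetical vocabulary and binary document-term matrix.
vocab = ["cluster", "spectral", "signature", "ontology", "label", "tree"]
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
])
preds = predict_terms(X, vocab)
for d, terms in enumerate(preds):
    print("document", d, "predicted terms:", terms)
```

In this toy setting, a query for a predicted term could then match a document that never contains it. Ontology labels could be handled in the same way by appending them as extra columns of the matrix, although how the labels are actually attached in this work is not specified here.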

Keywords

Machine Learning, Data Mining, Document Clustering, Spectral Co-clustering, Ontologies, Retrieve, Information Retrieval

Copyright owned by the Saudi Digital Library (SDL) © 2025