Retrieval and Labeling of Documents Using Ontologies: Aided by a Collaborative Filtering

Alshammari, Asma Abdulkarim

dc.contributor.advisor	Bhatnagar, Raj
dc.contributor.author	Alshammari, Asma Abdulkarim
dc.date.accessioned	2023-07-20T07:42:47Z
dc.date.available	2023-07-20T07:42:47Z
dc.date.issued	2023
dc.description.abstract	Information retrieval is a common task in today’s world, and retrieval systems are aided by various text mining and analysis methods. The objective of retrieval is to obtain information resources from a collection that are relevant to a specified query. The retrieval process begins with a query provided by a user, and a search engine then looks for the relevant resources. Typically, queries are formed from the same terms (words) that occur within the resources. However, a document can also be relevant to terms that do not occur in it, as the following situations illustrate. First, we may want to retrieve documents relevant to query terms that do not explicitly occur in the documents but are relevant to their contents. Second, we may want to retrieve documents using queries that contain labels from an ontology tree, and these labels may not explicitly occur in the documents. Third, an organization may hold a large collection of documents, and various user communities may want to refer to those documents using their community-specific ontologies. Several information retrieval methods cluster the documents and then determine a signature for each cluster that describes the terms predominantly present in it. We have designed and implemented a clustering algorithm that partitions the data space in a step-wise manner and seeks clusters whose good-quality signatures represent the documents they contain. The algorithm is modeled on a bi-clustering strategy: it applies the spectral co-clustering method at each step and then optimizes towards clusters that have strong representative signatures. We have shown that this clustering algorithm performs better than other known clustering algorithms such as K-Means and Latent Dirichlet Allocation (LDA).
We have accomplished our goal of improving the capabilities and performance of information retrieval systems by presenting a new method that generates predicted terms for documents using Singular Value Decomposition (SVD) based collaborative filtering. We have shown that retrievals made using these recommended terms return the correct documents with reasonably high accuracy. In addition, including predicted terms in the clustering process improves the purity of the clusters and the quality of retrieval. We have achieved our goal of integrating ontological labels with information retrieval by adding terms from ontologies to a document and using a collaborative filtering approach to associate the ontology labels with other relevant documents. We have tested the performance of our method on several cases of ontology integration: a single ontology label, a single large ontology with all the complexities of an ontology tree, and multiple ontology trees. We have tested this method on our document collections and have obtained promising results. Our method performs better than other existing methods.
dc.format.extent	95
dc.identifier.uri	https://hdl.handle.net/20.500.14154/68667
dc.language.iso	en_US
dc.subject	Machine Learning
dc.subject	Data Mining
dc.subject	Document Clustering
dc.subject	Spectral Co-clustering
dc.subject	Ontologies
dc.subject	Retrieve
dc.subject	Information Retrieval
dc.title	Retrieval and Labeling of Documents Using Ontologies: Aided by a Collaborative Filtering
dc.type	Thesis
sdl.degree.department	College of Engineering and Applied Sciences / Computer Science
sdl.degree.discipline	Machine Learning / Data Mining
sdl.degree.grantor	University of Cincinnati
sdl.degree.name	Doctor of Philosophy

Collections

SACM - United States of America
