Retrieval and Labeling of Documents Using Ontologies: Aided by a Collaborative Filtering

dc.contributor.advisorBhatnagar, Raj
dc.contributor.authorAlshammari, Asma Abdulkarim
dc.date.accessioned2023-07-20T07:42:47Z
dc.date.available2023-07-20T07:42:47Z
dc.date.issued2023
dc.description.abstractInformation retrieval is one of the common tasks in today’s world, and retrieval systems are aided by various text mining and analysis methods. The objective of retrieval is to obtain information resources from a collection that are relevant to a specified query. The retrieval process begins with a query provided by a user, and a search engine then looks for the relevant resources. Typically, queries are formed using the same terms (words) that occur within the resources. However, a document may also need to match terms that do not occur in it, as the following situations illustrate. First, we may want to retrieve documents relevant to query terms that do not explicitly occur in the documents but are relevant to their contents. Second, we may want to retrieve documents using queries that contain labels from an ontology tree, and these labels may not explicitly occur in the documents. Third, an organization may hold a large collection of documents that various user communities want to refer to using their community-specific ontologies. Several information retrieval methods cluster the documents and then determine a signature for each cluster that describes the terms predominantly present in it. We have designed and implemented a clustering algorithm that partitions the data space in a step-wise manner and seeks clusters with good-quality signatures representing the documents they contain. The clustering algorithm is modeled on a bi-clustering strategy: it applies the spectral co-clustering method at each step and then optimizes towards clusters that have strong representative signatures. We have shown that this clustering algorithm performs better than other known clustering algorithms such as K-Means and Latent Dirichlet Allocation (LDA).
We have accomplished our goal of improving the capabilities and performance of information retrieval systems by presenting a new method that generates predicted terms for documents using Singular Value Decomposition (SVD) based collaborative filtering. We have shown that retrievals made using such recommended terms retrieve the correct documents with reasonably high accuracy. In addition, including predicted terms in the clustering process improves the purity of clusters and the quality of retrieval. We have achieved our goal of integrating ontological labels with information retrieval by adding terms from ontologies to a document and using a collaborative filtering approach to associate ontology labels with other relevant documents. We have tested the performance of our method on several cases of integrating ontologies: a single ontology label, a single large ontology with all the complexities of an ontology tree, and multiple ontology trees. We have tested this method on our document collections and have obtained promising results. Our method has higher performance than other existing methods.
dc.format.extent95
dc.identifier.urihttps://hdl.handle.net/20.500.14154/68667
dc.language.isoen_US
dc.subjectMachine Learning
dc.subjectData Mining
dc.subjectDocument Clustering
dc.subjectSpectral Co-clustering
dc.subjectOntologies
dc.subjectRetrieve
dc.subjectInformation Retrieval
dc.titleRetrieval and Labeling of Documents Using Ontologies: Aided by a Collaborative Filtering
dc.typeThesis
sdl.degree.departmentCollege of Engineering and Applied Sciences / Computer Science
sdl.degree.disciplineMachine Learning / Data Mining
sdl.degree.grantorUniversity of Cincinnati
sdl.degree.nameDoctor of Philosophy
