Landmarks Retrieval Using Deep Learning
Date
2023-11-14
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Saudi Digital Library
Abstract
In today’s digital age, the complexity and diversity of multimedia data are increasing rapidly, creating an urgent need for high-performing image retrieval systems that meet human demands. Retrieving similar images efficiently and effectively remains difficult for several reasons, including variations in object appearance, partial occlusion, and changes in viewpoint and scale. Image retrieval has attracted research attention due to its contribution to a variety of applications, such as mobile commerce, tourism, surveillance, and robotics. Although extensive studies have been conducted to enhance the performance of image retrieval systems, they have yet to achieve the desired outcomes. Furthermore, most studies have concentrated on learning either local or global feature representations to handle the retrieval problem.
This project tackles the retrieval problem by utilizing a range of advanced techniques to develop and analyze two models. The first model focuses on designing a lightweight yet high-performing architecture by using EfficientNet and ResNet as base CNNs, together with a fusion component that integrates local and global features after their extraction, producing a single descriptor that describes the image well. In contrast, the second model focuses on learning deep local feature representation and aggregation through spatial attention, self-attention, and cross-attention. To train these models, we utilized the Google Landmarks v2 dataset, currently the largest dataset available for image retrieval, and we used the ROxford and RParis datasets as benchmarks to evaluate the models’ effectiveness. We assessed both models against state-of-the-art approaches, and the evaluation results show promising performance compared to existing solutions. Furthermore, this study offers deeper insight through Grad-CAM visualizations, which reveal the image regions the model focuses on when making decisions, thereby enhancing its interpretability. Finally, conclusions and potential directions for future research are presented.
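The abstract does not give implementation details for the fusion component, but the general idea of combining a pooled global descriptor with attention-aggregated local features can be sketched as follows. This is a minimal NumPy illustration only: GeM pooling for the global branch and a softmax spatial-attention weighting (using feature-norm scores as a stand-in for a learned attention head) are assumed choices, not the thesis's exact method.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.
    feature_map: (C, H, W) CNN activations -> (C,) global descriptor."""
    clipped = np.clip(feature_map, eps, None)
    return np.power(np.mean(np.power(clipped, p), axis=(1, 2)), 1.0 / p)

def spatial_attention_pool(feature_map):
    """Aggregate local features with a softmax spatial-attention map.
    Here the attention score of each location is the L2 norm of its
    feature vector -- an illustrative stand-in for a learned head."""
    c, h, w = feature_map.shape
    locals_ = feature_map.reshape(c, h * w)      # (C, HW) local features
    scores = np.linalg.norm(locals_, axis=0)     # (HW,) attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over locations
    return locals_ @ weights                     # (C,) weighted sum

def fused_descriptor(feature_map):
    """Concatenate global (GeM) and attention-aggregated local features,
    then L2-normalise into a single image descriptor."""
    d = np.concatenate([gem_pool(feature_map),
                        spatial_attention_pool(feature_map)])
    return d / (np.linalg.norm(d) + 1e-12)

# Toy 256-channel, 7x7 feature map standing in for a CNN backbone output.
fmap = np.random.default_rng(0).random((256, 7, 7)).astype(np.float32)
desc = fused_descriptor(fmap)                    # (512,) unit-norm descriptor
```

With unit-norm descriptors like this, retrieval reduces to ranking database images by the dot product (cosine similarity) between descriptors.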
Description
Keywords
Image retrieval, computer vision