Landmarks Retrieval Using Deep Learning
Date
2023-11-14
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Saudi Digital Library
Abstract
In today’s digital age, the complexity and diversity of multimedia data are increasing rapidly, creating an urgent need for high-performing image retrieval systems that meet human demands. Retrieving similar images efficiently and effectively remains difficult for several reasons, including variations in object appearance, partial occlusion, and changes in viewpoint and scale. Image retrieval has attracted research attention due to its contribution to a variety of applications, such as mobile commerce, tourism, surveillance, and robotics. Although extensive studies have been conducted to enhance the performance of image retrieval systems, they have yet to achieve the desired outcomes. Furthermore, most studies have concentrated on learning either local or global feature representations to handle the retrieval problem.
This project tackles the retrieval problem by utilizing a range of advanced techniques to develop and analyze two models. The first model focuses on designing a lightweight yet high-performing architecture by using EfficientNet and ResNet as base CNNs, together with a fusion component that integrates local and global features after their extraction, producing a single descriptor that describes the image well. In contrast, the second model focuses on learning deep local feature representation and aggregation through spatial attention, self-attention, and cross-attention. To train these models, we utilized the Google Landmarks v2 dataset, currently the largest dataset available for image retrieval, and we used the ROxford and RParis datasets as benchmarks to evaluate the models’ effectiveness. We assessed both models against state-of-the-art approaches, and the evaluation results show promising performance compared to existing solutions. Furthermore, this study offers deeper insight through Grad-CAM visualizations, which reveal the image regions the model focuses on when making decisions, thereby enhancing its interpretability. Finally, conclusions and potential directions for future research are presented.
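The abstract does not give implementation details for the fusion component, but the general idea of combining a pooled global descriptor with attention-aggregated local features can be sketched as follows. This is a minimal NumPy illustration only: GeM pooling for the global branch and a softmax spatial-attention weighting (using feature-norm scores as a stand-in for a learned attention head) are assumed choices, not the thesis's exact method.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.
    feature_map: (C, H, W) CNN activations -> (C,) global descriptor."""
    clipped = np.clip(feature_map, eps, None)
    return np.power(np.mean(np.power(clipped, p), axis=(1, 2)), 1.0 / p)

def spatial_attention_pool(feature_map):
    """Aggregate local features with a softmax spatial-attention map.
    Here the attention score of each location is the L2 norm of its
    feature vector -- an illustrative stand-in for a learned head."""
    c, h, w = feature_map.shape
    locals_ = feature_map.reshape(c, h * w)      # (C, HW) local features
    scores = np.linalg.norm(locals_, axis=0)     # (HW,) attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over locations
    return locals_ @ weights                     # (C,) weighted sum

def fused_descriptor(feature_map):
    """Concatenate global (GeM) and attention-aggregated local features,
    then L2-normalise into a single image descriptor."""
    d = np.concatenate([gem_pool(feature_map),
                        spatial_attention_pool(feature_map)])
    return d / (np.linalg.norm(d) + 1e-12)

# Toy 256-channel, 7x7 feature map standing in for a CNN backbone output.
fmap = np.random.default_rng(0).random((256, 7, 7)).astype(np.float32)
desc = fused_descriptor(fmap)                    # (512,) unit-norm descriptor
```

With unit-norm descriptors like this, retrieval reduces to ranking database images by the dot product (cosine similarity) between descriptors.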
Description
Keywords
Image retrieval, computer vision