Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

  • Item (Restricted)
    Beyond Sight: Generating Clothing Graphic Captions for Visually Impaired Users
    (University of Sheffield, 2024) Alluqmani, Amnah; Harvey, Morgan
    People with visual impairment (VI) often face challenges when performing everyday tasks, of which clothes shopping is the most challenging. Many people with VI shop online to avoid some of the challenges associated with physical shopping. However, buying clothes online presents many other limitations and barriers. More research is needed to address these challenges and propose solutions that enhance the shopping experiences of people with VI. Most research studies have relied on interviews alone, which provide only subjective, recall-biased information, and only a few studies have offered a solution. Thus, we conducted two complementary studies, using both observational and interview approaches, to fill the gap in understanding the behaviours of people with VI when selecting and purchasing clothes online. In addition, we built a VI-friendly clothing graphic captioning model, adopting a large language model (LLM) approach to simulate the needs of people with VI and a region-of-interest (ROI) approach to generate focused, informative captions. We conducted multiple assessments, including a manual analysis of ground-truth (GT) clothing image-text datasets and automatic and human evaluations of our proposed model. Our findings show that shopping websites have inaccurate, misleading and contradictory clothing descriptions and that people with VI rely mainly on (unreliable) search tools and check product descriptions by reviewing customer comments. Our findings also indicate that people with VI are hesitant to accept assistance from automated systems. Trust in such systems could be improved if researchers develop systems that better accommodate users' needs and preferences. The evaluation of the GT captions showed that they have several limitations that disqualify them from consideration as gold-standard captions.
The evaluation of our model revealed that adopting LLMs and the ROI approach generates the most informative captions, considering the needs of people with VI, compared with BLIP-2 (which adopts a grid-based approach) and the GT captions. Our model achieved average scores of 1.73 and 1.774 from evaluators with VI and sighted evaluators, respectively. In comparison, BLIP-2 and the GT data obtained average scores of 1.460 and 1.440 from evaluators with VI and 1.565 and 1.671 from sighted evaluators, respectively. However, although our model was the most informative of those compared, its accuracy is difficult to maintain and is lower than that of the GT captions and the BLIP-2 model. This thesis investigates the factors that affect caption accuracy, which will be addressed further in future work.

Copyright owned by the Saudi Digital Library (SDL) © 2025