Evaluating and Fine-Tuning Large Language Model-Powered Mental Health Chatbots in Arabic
Date
2024
Publisher
University of Birmingham
Abstract
Recent advances in AI tools have transformed the health sector,
particularly patient assessment, appointments and follow-ups. They also
play a growing role in evaluating and providing mental health support. Despite
the many mental health chatbots available in English, mental health
remains a challenging subject, especially among Arabic speakers, for whom
few, if any, effective chatbots currently exist. This project evaluates
and fine-tunes existing large language models (LLMs) to help provide
accurate mental health counselling to Arabic speakers. It used a total
of 6917 question-answer pairs collected from the CounselChat
platform, covering various common mental health topics, to fine-tune
the BLOOMz 3b and Llama2 7b LLMs. We found
that both models perform very poorly on statistical metrics.
However, model-based metrics showed good results. In inference testing,
BLOOMz showed promising results, reflecting the model's ability to
construct coherent, clear and direct answers. With more
careful and accurate data curation and use of the LLM-based
evaluation framework, both BLOOMz and Llama2 can be used
to develop real-world mental health chatbots that
provide accurate mental health counselling to Arabic speakers.
Keywords
Mental health chatbots, Large language models, Arabic LLMs, Arabic chatbots, Llama2, BLOOMz