Exploring Malnutrition in Residential Aged Care: A Study on Nursing Notes using Natural Language Processing and Large Language Models
Date
2024-03-21
Publisher
University of Wollongong
Abstract
Population ageing has led to an increasing demand for services for older people.
Residential aged care facilities (RACFs) in Australia provide a range of services for older
people who can no longer live independently at home. These include accommodation,
personal care, health care services and social and emotional support. Despite efforts to
provide comprehensive care, managing nutrition for older people in RACFs remains complex.
Malnutrition has emerged as a prevalent issue within these facilities, raising serious health
concerns. Understanding and addressing malnutrition has therefore become a critical
concern for the Australian government. To date, there has been a reliance on nutrition
screening tools to assess older people’s nutritional care needs. Conducting these
assessments requires adequate healthcare training and is time-consuming, so they are not
carried out frequently enough to uncover the risk of malnutrition among older people in a
timely manner.
In Australia, the majority of RACFs have established electronic health record
(EHR) systems to capture and record care recipients’ information. These records include
medical diagnoses, regular nursing assessments, weight charts, care plans, periodic reviews,
incident and infection reviews, and nursing progress reports. RAC EHRs therefore contain a
wealth of information that can be mined to support aged care services.
Advances in natural language processing (NLP) technologies, in particular
large language models (LLMs), provide an opportunity to uncover useful insights from
RAC EHRs. This PhD research is therefore dedicated to extending NLP technology to
the under-studied area of RAC, and to designing, implementing and evaluating LLM
applications for nutrition management among older people living in RACFs. It aims to
design and develop a machine learning framework capable of analysing both structured
and unstructured EHR data to gain comprehensive insights into the malnutrition issue.
Drawing on insights from the literature, the study begins by combining word embedding
techniques with cosine similarity and the UMLS ontology to extract nutrition-related
terms from nursing notes in RACFs. This uncovered the language style and terminology
used by practising nurses and aged care workers in managing nutrition for the older
people under their care. Thirteen extraction rules were then developed to identify
notes indicative of malnutrition, forming the basis for a training data
set of 2,278 relevant nursing notes, which is used in the LLM implementation.
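As a minimal illustration of the term-extraction idea described above, the sketch below ranks candidate terms by cosine similarity to a seed term. The three-dimensional vectors, the term list, the 0.8 threshold and the function names are hypothetical stand-ins chosen for illustration; the thesis's actual embeddings, thresholds and UMLS filtering are not described in this abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def similar_terms(seed, embeddings, threshold=0.8):
    """Return (term, score) pairs at or above the threshold, best first."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine_similarity(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real word vectors have hundreds of dimensions.
toy_embeddings = {
    "appetite": [0.9, 0.1, 0.0],
    "intake": [0.8, 0.2, 0.1],
    "weight": [0.7, 0.3, 0.0],
    "wheelchair": [0.0, 0.1, 0.9],
}

candidates = similar_terms("appetite", toy_embeddings)
```

In this toy setting, terms whose vectors point in a similar direction to the seed are kept as candidate nutrition-related terms, while unrelated terms fall below the threshold.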
To enhance the LLM's understanding of nursing notes, we randomly selected
500,000 notes for pre-training a domain-specific LLM based on the established RoBERTa
model. This was followed by fine-tuning the LLM specifically for malnutrition note
detection. Achieving an impressive F1-score of 0.96, our model significantly surpassed
previous models, ensuring more accurate classification of notes documenting
malnutrition.
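For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the counts in the example are made up for illustration and are not results from this study.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

Because the harmonic mean is pulled towards the smaller of the two values, a high F1-score requires both few missed malnutrition notes and few false alarms.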
Furthermore, we developed a framework integrating a generative LLM, Llama 2, with a
retrieval-augmented generation (RAG) system to extract comprehensive summary
information from malnutrition-related notes. This framework demonstrates high accuracy
(90%) in identifying malnutrition risk factors from 1,399 notes. It generates detailed
summaries of nutrition status from EHRs with 99% accuracy.
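The retrieval step of a RAG pipeline can be sketched as follows: rank stored notes against a query, then place the top matches into the prompt given to the generative model. This is only an illustrative sketch. It uses simple token overlap (Jaccard similarity) as a stand-in for whatever retriever the thesis used, the example notes are invented, and the call to Llama 2 itself is omitted.

```python
def tokenize(text):
    """Lowercase whitespace tokenisation into a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, notes, k=2):
    """Return the k notes most similar to the query, best first."""
    query_tokens = tokenize(query)
    ranked = sorted(
        notes,
        key=lambda note: jaccard(query_tokens, tokenize(note)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble a prompt that grounds the question in retrieved notes."""
    context = "\n".join(f"- {note}" for note in retrieved)
    return f"Context notes:\n{context}\n\nQuestion: {query}\nAnswer:"


# Invented example notes, not taken from any real record.
notes = [
    "resident refused lunch and has poor appetite",
    "resident walked in garden with family",
    "weight loss noted and poor oral intake at meals",
]
query = "poor appetite and weight loss"
top_notes = retrieve(query, notes, k=2)
prompt = build_prompt(query, top_notes)
```

A production system would typically replace the token-overlap score with dense embedding similarity, but the overall shape (retrieve, then generate from the retrieved context) is the same.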
Our study reveals a malnutrition prevalence rate of approximately 33% in the
studied RACFs. There are 15 main categories and 43 subcategories of malnutrition risk
factors. For the first time, this research identified the primary risk factors of malnutrition
in RACFs. The most common is poor appetite, which affects 17% of older people, followed
by insufficient oral intake and dementia progression.
To enhance malnutrition prediction, we fine-tuned the RAC domain-specific
model to address the 512-token sequence length limit of the RoBERTa model.
This is achieved by extending the sequence length to support 1,536 tokens.
Augmented with risk factors, our model achieved an F1-score of 0.687, demonstrating its
effectiveness in predicting malnutrition risk one month before the event onset.
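The abstract does not say how the sequence length was extended. One common approach for models with learned position embeddings, shown here purely as an assumption and not as the author's method, is to initialise a longer position table by tiling the original one and then continue training. The sketch uses a toy table of Python lists; the function name and the two-dimensional rows are hypothetical.

```python
def extend_position_table(table, new_length):
    """Build a longer position table by repeating the original rows in order."""
    if not table:
        raise ValueError("position table must not be empty")
    old_length = len(table)
    # Row i of the new table is a copy of row (i mod old_length) of the old one.
    return [list(table[i % old_length]) for i in range(new_length)]


# Toy 512-row table with 2-dimensional rows (real embeddings are much wider).
short_table = [[float(i), float(i) / 10.0] for i in range(512)]
long_table = extend_position_table(short_table, 1536)
```

Since 1,536 is exactly three times 512, tiling copies the original table three times, so each 512-token block starts from position embeddings the model has already learned.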
In conclusion, this research designs, develops and evaluates an innovative AI
framework that leverages advanced AI technologies, particularly NLP and domain-
specific LLMs, to tackle malnutrition among older people in residential aged care
facilities. By analysing text data in EHRs, the AI framework identifies risk factors,
summarises nutrition information, and predicts malnutrition one month before the event
onset. After thorough evaluation by domain experts, the AI framework can be
implemented as an automated assessment tool. Its implementation into aged care services
will alleviate the time burden associated with nutrition care for health and aged care
practitioners, supporting them in identifying risk factors of malnutrition for the older people
under their care and in managing malnutrition efficiently. The framework’s scalability
extends beyond residential aged care facilities. It can be further extended to other
healthcare settings to improve nutrition care effectiveness and quality of life for
consumers.
Keywords
Machine learning, large language models, Llama, RoBERTa, BERT, Retrieval-augmented generation (RAG), Health informatics, Malnutrition