Unsupervised abstraction for reducing the complexity of healthcare process models

Abstract

Healthcare processes are complex and may vary considerably even within the same cohort of patients. Process mining techniques play a significant role in automating the construction of healthcare process models from a system’s event log. An event log records the events that occur within a process. It is a basic element of any information system and has three main components: the process instance ID, the event, and the time at which the event occurred. Applying standard process mining techniques in healthcare produces ‘spaghetti-like’ models that are difficult to understand and therefore of little value. Previously published studies have highlighted the importance of event abstraction, which is considered a central tool for reducing complexity and improving efficiency. Although these studies have improved the understandability of process models, they have generally relied on the involvement of a domain expert, and untangling ‘spaghetti-like’ models with expert help can be expensive and time-consuming. Machine learning techniques such as the Hidden Markov Model (HMM) have long been used for modelling sequential data. State transition modelling has also been explored in process mining research, where it is advocated for sequence clustering: a model is trained over a group of sequences and then used to evaluate how likely it is that a given process instance was generated by that model. However, state transition models can also be used to detect hidden processes, which can subsequently be used for process abstraction. In this thesis, we aim to address healthcare process complexity using unsupervised abstraction. We adopt an unsupervised method for detecting hidden processes using HMM and the Viterbi algorithm. The method comprises eight stages: event log extraction, preprocessing, learning, decoding, optimisation, selection, visualisation, and model evaluation.
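
To make the decoding idea concrete, the following is a minimal sketch of how Viterbi decoding could map the events of a trace to hidden states, which are then collapsed into an abstracted trace. It is an illustration under stated assumptions and not the thesis implementation: the toy event log, the state names (`workup`, `treatment`), and all probabilities are hypothetical and hand-set, whereas the thesis learns its models from real event logs in a separate learning stage.

```python
import math
from collections import defaultdict
from itertools import groupby

# Hypothetical toy event log: (process instance id, event, timestamp).
EVENT_LOG = [
    ("p1", "admission", "2020-01-01T08:00"),
    ("p1", "surgery", "2020-01-02T10:00"),
    ("p1", "blood_test", "2020-01-01T09:00"),
    ("p1", "discharge", "2020-01-04T12:00"),
    ("p2", "admission", "2020-02-01T08:00"),
    ("p2", "surgery", "2020-02-01T15:00"),
    ("p2", "discharge", "2020-02-03T11:00"),
]

# Hypothetical, hand-set HMM parameters (assumed for illustration only).
STATES = ["workup", "treatment"]
START_P = {"workup": 0.9, "treatment": 0.1}
TRANS_P = {
    "workup": {"workup": 0.6, "treatment": 0.4},
    "treatment": {"workup": 0.1, "treatment": 0.9},
}
EMIT_P = {
    "workup": {"admission": 0.5, "blood_test": 0.4, "surgery": 0.05, "discharge": 0.05},
    "treatment": {"admission": 0.05, "blood_test": 0.05, "surgery": 0.5, "discharge": 0.4},
}


def traces_from_log(log):
    """Group events by process instance and order each trace by timestamp."""
    grouped = defaultdict(list)
    for case_id, event, timestamp in log:
        grouped[case_id].append((timestamp, event))
    return {case: [e for _, e in sorted(rows)] for case, rows in grouped.items()}


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for one trace (log-space)."""
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching state s at this step.
            prev = max(states, key=lambda r: best[-1][r] + _log(trans_p[r][s]))
            scores[s] = best[-1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


def abstract_trace(hidden_path):
    """Collapse consecutive repeats of a hidden state into one abstract step."""
    return [state for state, _ in groupby(hidden_path)]


TRACES = traces_from_log(EVENT_LOG)
HIDDEN_PATHS = {
    case: viterbi(trace, STATES, START_P, TRANS_P, EMIT_P)
    for case, trace in TRACES.items()
}
ABSTRACTED = {case: abstract_trace(path) for case, path in HIDDEN_PATHS.items()}
```

In this toy setting, each four-event or three-event trace is reduced to the two abstract steps `workup` and `treatment`, which is the kind of simplification that abstraction is meant to achieve on far larger logs.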
One of the main contributions of this research is the design of two types of process model optimisation: strict and soft. Models selected by the proposed optimisation address the limitations of standard metrics used for HMM model selection, such as the Bayesian Information Criterion (BIC). Two real healthcare data sources are used in this research: the Medical Information Mart for Intensive Care (MIMIC-III) from Boston, USA, and the Patients Pathway Manager (PPM) from Leeds, UK. Models are trained on the MIMIC-III medical event log, tested on the PPM dataset, and then evaluated by a domain expert. Three breast cancer case studies of varying complexity are extracted. Our method significantly reduced model complexity and provided a conceptually valid abstraction for several care patterns. The abstracted models also show promising improvements in precision and fitness. They can serve as an intermediate step for bringing structure to unstructured processes, which helps in finding cohorts of patients with similar healthcare processes. The processes of such a cohort, whose similarity could not be captured in the original complex models, can then be modelled using any process mining tool.
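
For readers unfamiliar with BIC as a baseline for choosing the number of hidden states, the sketch below shows the standard formula, BIC = k ln(n) - 2 ln(L), applied to a discrete-emission HMM. It is background only and does not reproduce the strict or soft optimisations proposed in the thesis. The function names are hypothetical, the parameter count assumes categorical emissions, and the choice of what counts as `n_observations` (here, the number of observed events) is an assumption that varies between studies.

```python
import math


def hmm_free_parameters(n_states, n_symbols):
    """Count free parameters of a discrete HMM (assumes categorical emissions)."""
    initial = n_states - 1                    # initial distribution sums to 1
    transitions = n_states * (n_states - 1)   # each transition row sums to 1
    emissions = n_states * (n_symbols - 1)    # each emission row sums to 1
    return initial + transitions + emissions


def bic(log_likelihood, n_params, n_observations):
    """Bayesian Information Criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


def select_by_bic(candidates, n_symbols, n_observations):
    """Pick the state count with the lowest BIC.

    `candidates` maps a number of hidden states to the log-likelihood
    that a fitted model with that many states achieved on the data.
    """
    scores = {
        n_states: bic(ll, hmm_free_parameters(n_states, n_symbols), n_observations)
        for n_states, ll in candidates.items()
    }
    return min(scores, key=scores.get), scores
```

A call such as `select_by_bic({2: -1200.0, 3: -1190.0}, n_symbols=4, n_observations=1000)` compares candidate models purely on likelihood and parameter count, which is the type of criterion the proposed optimisations are intended to improve upon.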

Copyright owned by the Saudi Digital Library (SDL) © 2025