Bayesian Methods for Inference in Biostatistical Longitudinal Studies and Modelling of Missing Data
Date
2024-07
Publisher
University of Glasgow
Abstract
Longitudinal studies repeatedly collect data from the same individuals over
time to study long-term factors. A commonly used model in longitudinal
studies is the linear mixed effects model, which accounts for the correlation between
observations within individuals. The model can be fitted using either the
Frequentist or the Bayesian approach. The Frequentist
approach is widely used, while the Bayesian approach has become more common
with computational advancements. The work in this thesis comprises a
comparison study between the Frequentist linear mixed effects model and the
Bayesian Hierarchical model, using simulated longitudinal data and data from
a heart failure study (BIOSTAT-CHF). Inferences from
both approaches were similar. However, the Bayesian approach offers the advantage
of providing a posterior probability distribution for each parameter rather than a single estimate.
This distribution gives the probability that a parameter lies within a given range, and the approach can incorporate
prior information from previous studies into the inference.
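The point about probability statements and prior information can be illustrated with a minimal sketch. It assumes a far simpler setting than the hierarchical model studied in the thesis: a single parameter (for example, a slope) with a normal prior and normally distributed data with a known standard deviation, so that the posterior is available in closed form. The function name `normal_posterior` and all numerical values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Conjugate update for a normal mean with known observation sd.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return NormalDist(post_mean, post_var ** 0.5)


# Assumed prior from "previous studies": centred at 0 with sd 1.
# Assumed data: 16 observations with sample mean 0.5 and known sd 2.
posterior = normal_posterior(prior_mean=0.0, prior_sd=1.0,
                             data_mean=0.5, data_sd=2.0, n=16)

# Direct probability statements about the parameter.
prob_positive = 1.0 - posterior.cdf(0.0)
prob_in_range = posterior.cdf(0.8) - posterior.cdf(0.2)

print(f"posterior mean = {posterior.mean:.3f}, sd = {posterior.stdev:.3f}")
print(f"P(parameter > 0) = {prob_positive:.3f}")
print(f"P(0.2 < parameter < 0.8) = {prob_in_range:.3f}")
```

With these assumed inputs the posterior mean (0.4) lies between the prior mean (0.0) and the sample mean (0.5), which shows how the prior information pulls the estimate.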
In longitudinal studies, missing data is a common problem that can bias
the parameter estimates of the statistical analysis. A method
that deals with non-ignorable missingness in the response using Correlated
Random Effects (CRE) based on latent variables and Gibbs sampling has
been proposed in the literature and has performed well in scenarios assuming
semi-parametric modelling. However, when the method was applied to linear mixed effects
modelling, the covariance matrix parameters had difficulty converging. To
address this issue, the work in this thesis considers a weakly informative prior
using the Inverse Wishart distribution.
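The concern that opens this paragraph, namely that non-ignorable missingness in the response can bias estimates, can be seen in a small simulation. The sketch below is not the CRE method and does not use the thesis data. It only compares the mean of the observed responses when values are missing completely at random with the mean when larger values are more likely to be missing. The missingness function `prob_missing`, the sample size, and the seed are assumptions made for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


# Hypothetical complete responses drawn from a standard normal.
n = 5000
y = [random.gauss(0.0, 1.0) for _ in range(n)]

# Ignorable case: every value is missing with the same probability 0.5.
observed_mcar = [v for v in y if random.random() > 0.5]


# Non-ignorable case: larger responses are more likely to be missing,
# so missingness depends on the unobserved value itself.
def prob_missing(v):
    return 1.0 / (1.0 + math.exp(-2.0 * v))


observed_mnar = [v for v in y if random.random() > prob_missing(v)]

full_mean = mean(y)
mcar_mean = mean(observed_mcar)
mnar_mean = mean(observed_mnar)

print(f"complete data mean:          {full_mean:.3f}")
print(f"observed mean, ignorable:    {mcar_mean:.3f}")
print(f"observed mean, non-ignorable: {mnar_mean:.3f}")
```

Under the ignorable mechanism the observed mean stays close to the complete data mean, whereas under the non-ignorable mechanism it is pulled downwards because large responses are preferentially removed. Methods for non-ignorable missingness therefore model the missingness process jointly with the response.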
Additionally, the CRE method cannot accommodate incomplete explanatory variables in the analysis model.
To address this problem, the work in this thesis proposes three methods
to deal with missingness in the response and explanatory variables by adapting
the CRE method.
Two proposed methods, the Two-Step and the GCRE-MAR methods, were
designed to address non-ignorable missingness in the model response and ignorable
missingness in the model explanatory variables. The GCRE-MNAR
method was designed for non-ignorable missingness in both the model response
and explanatory variables. In the Two-Step method, the CRE method
was adapted by incorporating an additional step using the MICE algorithm, a
common approach for handling MAR data and producing imputed datasets.
The CRE method is then applied to the imputed MICE datasets.
The GCRE-MAR and GCRE-MNAR methods are generalised versions of the
CRE method. The GCRE-MAR method incorporates a model for the incomplete
explanatory variable. The GCRE-MNAR method incorporates both a model for the
incomplete explanatory variable and a model for that variable's missingness
process, with correlated random effects linking the incomplete
explanatory variable model and the missingness process.
The proposed methods were compared with the CRE method and some baseline
models using simulated longitudinal data with different numbers of repeated
measures and different proportions of missing data. The proposed methods perform
similarly to the CRE method, even though the proposed methods handle
missing data in both the response and explanatory variables, whereas the
CRE method is applied with missing data only in the response (no missing values in
the explanatory variables). Furthermore, the proposed methods outperform
the available data method in out-of-sample predictive performance, and the
parameter estimates closely match the parameters that generated the data.
Additionally, the proposed methods were applied to the BIOSTAT-CHF data,
and the results were consistent regardless of the applied method. The correlated
random effects indicated that the NT-proBNP missingness was MAR,
and the eGFR missingness was MNAR. Finally, the sensitivity analysis showed that
misspecifying the missingness mechanism in the proposed methods had a small
impact on the overall results, whereas misspecifying the response missingness
model resulted in biased parameter estimates for some of the analysis model
coefficients.
Keywords
Longitudinal studies, missing data, linear mixed effects model, Bayesian