Classification of Biomedical Data Using Spatial Features
Abstract
Histopathologists typically collect biopsies, which yield image data. They examine the images to obtain various diagnostic summaries, e.g. the proportion of tumor. They do this by overlaying a regular grid of points, each of which is then classified. This classification allows them to estimate the proportion of tumor and other statistics. In this thesis, we focus on investigating heterogeneity. We do this by considering measures of spatial clustering in the classified points. We consider the use of cluster statistics in the diagnosis of cancer patients (stomach and rectal cancers). We further consider tests of anisotropy, that is, of the direction of heterogeneity or clustering. Binary Markov random field parameter estimation is also investigated as an alternative approach for detecting heterogeneity in the image, both overall and in a specific direction. Furthermore, we consider spatial prediction and the consistency of spot classifications for overlapping regions sampled at different resolutions.
In the first part of this thesis, we aim to identify an appropriate measure of spatial autocorrelation under a normal approximation of the test statistic. We investigate Moran's I statistic, which has power in the large-sample setting. More importantly, the I statistic is then modified to measure heterogeneity or clustering in different directions. In particular, in the cancer studies we investigate the association between the cluster direction and that of the lumen surface, an important pathological feature.
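As a rough illustration of the idea, the following Python sketch computes Moran's I on a grid of binary spot classifications and restricts the neighbour weights to a single axis to obtain a directional version. The function name `morans_i`, the offset lists, and the choice of unit weights on adjacent grid points are assumptions made for this example; the directional modification developed in the thesis may be defined differently.

```python
import numpy as np

# Neighbour offsets (row shift, column shift); hypothetical names for this sketch.
HORIZONTAL = [(0, 1), (0, -1)]   # left/right neighbours only
VERTICAL = [(1, 0), (-1, 0)]     # up/down neighbours only
ROOK = HORIZONTAL + VERTICAL     # all four adjacent neighbours


def morans_i(grid, offsets):
    """Moran's I for a 2-D array, with unit weights on the given offsets.

    I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z is the
    mean-centred grid and S0 is the sum of all weights.
    """
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        raise ValueError("grid is constant; Moran's I is undefined")
    rows, cols = z.shape
    cross = 0.0  # running sum of w_ij * z_i * z_j
    s0 = 0       # running sum of weights (number of ordered neighbour pairs)
    for dr, dc in offsets:
        # a[i, j] is paired with b[i, j] = z[i + dr, j + dc] where both exist
        a = z[max(0, -dr): rows - max(0, dr), max(0, -dc): cols - max(0, dc)]
        b = z[max(0, dr): rows - max(0, -dr), max(0, dc): cols - max(0, -dc)]
        cross += (a * b).sum()
        s0 += a.size
    return (z.size / s0) * cross / denom


# Toy 4x4 grid of classified spots with vertical stripes (columns 0,1,0,1).
stripes = np.tile([0, 1], (4, 2))
i_vertical = morans_i(stripes, VERTICAL)      # neighbours along a stripe agree
i_horizontal = morans_i(stripes, HORIZONTAL)  # neighbours across stripes differ
i_rook = morans_i(stripes, ROOK)              # the two directions cancel
```

On this toy grid the vertical statistic is 1, the horizontal statistic is -1, and the four-neighbour statistic is 0, which shows how an overall measure can hide clustering that is present in only one direction.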
Following this, we present a new simulation-based iterative method for estimating binary Markov random field parameters. The estimated parameters give similar information to the spatial measures, and the method leads to a statistical test which does not depend on normal approximations. The accuracy of the iterative method is checked by simulation, and it compares favourably with an existing parameter estimation method.
We address the sampling issue by investigating the spatial consistency of pairs of images sampled from the same area at different resolutions. Finally, we address several clinical questions. For instance, we investigate what explains differences in patient survival, and find that heterogeneity is related to expected survival times.