Automatic segmentation of the left ventricle in 3-D echocardiography images
Abstract
Objectives: This research aimed to apply pure end-to-end deep learning segmentation techniques, using three-dimensional (3D) U-net and V-net architectures, to segment the left ventricle (LV) in 3D echocardiography images.
Materials and Methods: V-net and U-net were implemented in PyTorch to segment 3D echocardiography LV images. Both models were trained on end-systolic (ES) and end-diastolic (ED) images using the free graphics processing unit (GPU) provided by Google Colab. First, ITK-SNAP was used to segment the LV myocardium and blood pool in the ES frame of 72 subjects, in addition to the LV blood pool in the ED frame of 5 subjects. To refine the segmented images and fill the LV volume, a biventricular cardiac atlas developed by researchers at Imperial College London was used. Second,
these images were converted to NumPy arrays for data processing and preparation. Three pre-processing
tasks were performed: (1) intensity normalization, (2) image cropping, and (3) image downsampling.
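Under stated assumptions (a single-channel NumPy volume, and placeholder crop size and downsampling factor, since the study's exact values are not given here), the three pre-processing steps could be sketched as:

```python
import numpy as np

def preprocess(volume, crop=(128, 128, 64), factor=2):
    # (1) Intensity normalization: zero mean, unit variance
    v = (volume - volume.mean()) / (volume.std() + 1e-8)
    # (2) Center-crop each axis to the target region of interest
    starts = [(s - c) // 2 for s, c in zip(v.shape, crop)]
    v = v[tuple(slice(st, st + c) for st, c in zip(starts, crop))]
    # (3) Downsample by integer striding along every axis
    return v[::factor, ::factor, ::factor]
```

The striding in step (3) is only one possible downsampling scheme; interpolation-based resampling would be an equally valid reading of the pipeline.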
Data augmentation was performed to enlarge the dataset: intensity transformations using histogram equalization and affine spatial transformations were applied. Then, V-net and U-net were trained using five-fold cross-validation to assess model performance during training. The datasets were split into 48 subjects for training and 16 subjects for validation, and the cross-entropy loss and Dice similarity coefficient were used to track model performance. Several trials were performed to optimize each model separately and obtain the best segmentation results. These trials involved changing the learning rate, the optimizer, the number of epochs, and the number of filters; enabling data augmentation; and finally combining ED and ES images while adding dropout. The best version of each model was used to evaluate unseen test images, which included five pathological cases (cardiac hypotrophy) in both ED and ES frames.
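The Dice similarity coefficient used to track segmentation quality can be sketched as a minimal NumPy function (the study's training loop used PyTorch; this is an illustrative re-implementation, not the authors' code):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    # Dice = 2 * |A ∩ B| / (|A| + |B|) on binary masks
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)
```

A value of 1 indicates perfect overlap between the predicted and ground-truth masks, and 0 indicates no overlap.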
Results: Varying the learning rate and optimizer showed that different architectures perform differently with the same parameters. U-net performed better with the SGD optimizer and a learning rate of 0.01, whereas V-net performed worse than U-net with that same optimizer and learning rate. V-net performed substantially better when the optimizer was changed to Adam with a learning rate of 0.001, as suggested by the V-net authors. Contrary to expectations, applying data augmentation and adding ES images with dropout did not improve performance. The best-chosen models, “fold 5” for U-net and “fold 3” for V-net, performed well when evaluated on five unseen datasets, reflecting their performance during training. The results of the two models were very close: U-net produced 0.6552, 0.5402, and 0.7701 for the average, myocardium, and blood pool Dice similarity coefficients, respectively, while V-net produced approximately 0.6522, 0.5328, and 0.7716 for the same metrics.
Conclusion: Our findings showed that there is still a long way to go to reach optimal segmentation accuracy. Moreover, because of time constraints and limited GPU availability, our trials were not sufficient to obtain the best accuracy, and future work should carefully consider the potential effects of model weight initialization methods, the accuracy of the manual ES segmentations (particularly the myocardium), the data augmentation and interpolation methods, and run-to-run variability.