Optimization of a Convolutional Neural Network for Determination of Particle Size and Shape Distribution from Inline Imaging in Crystallization Processes for the Pharmaceutical Manufacturing Industry
Abstract
The pharmaceutical manufacturing industry relies heavily on the batch mode of production, and equally on quality control and testing of the final product, in which the particle size distribution is of particular importance yet can only be measured after production is complete. This approach has many disadvantages, such as high expense, slow production, and large amounts of waste whenever a batch fails to meet quality control requirements; hence, a continuous manufacturing approach for pharmaceuticals has been sought.
Offline measurement of the particle size distribution requires withdrawing a sample from the reactor; however, the dilution the sample requires and the temperature changes it undergoes when removed ultimately alter its properties, giving readings unrepresentative of the real distribution. Inline measurement methods are therefore preferred, and several exist, such as chord length distribution via a laser probe, or inline microscopy. However, they all have drawbacks, especially in the analysis of non-spherical particles or highly concentrated suspensions.
Inline particle imaging, when paired with accurate image analysis, can in principle produce highly accurate results. Yet classical image analysis approaches fall short of the required accuracy and are inflexible and difficult to tune, so machine learning models have been pursued instead. A machine learning model that predicts particle size distributions from a polystyrene and lactose dataset has previously been developed, and it produced fairly accurate results even when trained on only a handful of manually annotated images.
This project aims to optimize a novel model named stacks2psd, which takes particle images as input and generates a particle size distribution as output, so that its predictions closely match the ground truth.
This is achieved by using the polystyrene spheres subset dataset (PolyS_subset), systematically varying a set of hyperparameters, and comparing the results to determine which values produce the best model. ResNet is a popular and powerful architecture widely used in computer vision tasks; it mitigates the vanishing gradient problem by adding the output of an earlier layer to a later layer. The hyperparameters chosen for variation were the number of ResNet layers (n), the number of images in a stack (stack), the size of a batch of images (batch_size), and the maximum learning rate (max_lr).
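The residual (skip) connection by which ResNet mitigates vanishing gradients can be sketched as follows. This is an illustrative NumPy example, not the stacks2psd implementation; the layer function conv_block is a hypothetical stand-in for a real convolutional layer.

```python
import numpy as np

def conv_block(x, weight):
    """Hypothetical stand-in for a convolutional layer followed by ReLU."""
    return np.maximum(0.0, x @ weight)

def residual_block(x, weight):
    """ResNet-style block: the input x is added to the layer output,
    so gradients can also flow through the identity path unimpeded."""
    return conv_block(x, weight) + x

x = np.ones((1, 4))
w = np.zeros((4, 4))      # a zeroed-out layer contributes nothing...
y = residual_block(x, w)  # ...so the block reduces to the identity, y == x
```

Because the identity path bypasses the learned transformation, even a layer whose weights contribute nothing still passes its input (and its gradient) through unchanged.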
The default values of these hyperparameters were n=18, stack=5, batch_size=10, and max_lr=1E-02. In addition, the effect of manually changing the random seed on model performance was investigated by repeating each experiment three more times with different seed values. Models were compared using the following metrics: cosine similarity, the Pearson correlation coefficient, the Spearman correlation coefficient, the RMSE, and the R2 score. The RMSE should be close to zero; all the other metrics should be close to 1.
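The five comparison metrics can be computed as sketched below; the metric names follow the text, but the exact formulas used inside stacks2psd may differ.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def psd_metrics(y_true, y_pred):
    """Compare a predicted PSD against the ground truth with the five
    metrics used in the study (illustrative implementations)."""
    cos = np.dot(y_true, y_pred) / (np.linalg.norm(y_true) * np.linalg.norm(y_pred))
    pear = pearsonr(y_true, y_pred)[0]
    spear = spearmanr(y_true, y_pred)[0]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"cosine": cos, "pearson": pear, "spearman": spear,
            "rmse": rmse, "r2": r2}

truth = np.array([0.1, 0.3, 0.4, 0.2])
m = psd_metrics(truth, truth)  # a perfect prediction scores 0 RMSE, 1 elsewhere
```

For a perfect prediction the RMSE is exactly zero and the four similarity metrics equal one, which is why the optimization seeks the minimum of the former and the maximum of the latter.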
Using these criteria, the hyperparameter value corresponding to the minimum RMSE and the maximum of the other four metrics was obtained, and the mode function was used to determine which value occurred most frequently among these per-metric bests. The results show that the optimal set of hyperparameters was n=34, stack=5, batch_size=5 or 10, and max_lr=1E-02. It was also concluded that varying the seed had little to no overall effect, and that the resulting outliers stem from the need for longer training before the values converge.
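The mode-based selection step can be sketched as follows; the per-metric best values shown are hypothetical placeholders, not results from the study.

```python
from statistics import mode

# Hypothetical example: for each metric, the value of n that scored best
# (minimum RMSE, maximum for the other four).
best_n_per_metric = {
    "cosine": 34, "pearson": 34, "spearman": 18, "rmse": 34, "r2": 34,
}

# The most frequent "best" value across all metrics is taken as optimal.
optimal_n = mode(best_n_per_metric.values())  # -> 34
```

Taking the mode across metrics guards against any single metric favoring an anomalous value, as the spearman column does in this hypothetical example.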
The optimal set of hyperparameters was then applied to the entire polystyrene spheres dataset (PolyS) and to the polystyrene ellipsoids and spheres mixture dataset (PolyE). The results were highly accurate in both cases despite training for only a hundred epochs: the RMSE was 1.2E-04 for PolyS and 6.2E-05 for PolyE, the Spearman correlation coefficient was 0.25 for PolyS and 0.77 for PolyE, and the remaining metrics (cosine similarity, the Pearson correlation coefficient, and the R2 score) all fell between 0.94 and 1.00.
Moreover, the effect of particle concentration on model performance was investigated by computing the metrics per class for the polystyrene spheres (PolyS) dataset. Comparison by RMSE indicated that the larger the particle size, the smaller the error, and the higher the concentration, the smaller the error. Although this might seem surprising, since more noise is expected at higher concentrations, a possible explanation is that images taken at higher concentrations yield more information about the particles, making it easier to link the images to the particle size distribution; conversely, at lower concentrations less information is present, which impedes establishing that link.
In addition, the grokking effect was investigated: the phenomenon in which a neural network, after overfitting for a long time, begins to generalize again. An experiment was run with the optimal hyperparameters on the polystyrene spheres subset (PolyS_subset) for one thousand and then for ten thousand epochs; however, the validation and training losses did not converge and remained separated, indicating that the model was still overfitting and no longer learning.
Finally, image augmentations were applied to the smaller PolyS_subset dataset to determine whether the accuracy achieved on the full PolyS and PolyE PSDs could be matched; however, the results show that using a larger dataset produces a more accurate model.
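Augmentations of the kind typically used to enlarge a small image dataset can be sketched as follows; this is an illustrative NumPy example with random flips and 90-degree rotations, and the augmentations actually used in the project may differ.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate an image to create a new training sample
    without altering the particles it contains (illustrative only)."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # mirror horizontally
    if rng.random() < 0.5:
        image = np.flipud(image)   # mirror vertically
    image = np.rot90(image, k=rng.integers(0, 4))  # rotate 0/90/180/270 deg
    return image

rng = np.random.default_rng(0)
img = np.arange(16, dtype=float).reshape(4, 4)
aug = augment(img, rng)  # same pixels, rearranged
```

Because flips and right-angle rotations only rearrange pixels, the augmented images depict the same particles and so carry the same particle size distribution label as the original.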
Description
Keywords
Pharmaceutical manufacturing industry, Batch mode of production, Quality control, Particle size distribution, Continuous manufacturing, Offline measurements, Inline measurement methods, Chord length distribution, Laser probe, Inline microscopy, Image analysis, Machine learning models, Polystyrene and lactose dataset, Optimization, Hyperparameters, ResNet, Vanishing gradient problem, Seed value, Performance metrics, Cosine similarity, Pearson correlation coefficient, Spearman correlation coefficient, RMSE (Root Mean Square Error), R2 Score, Mode function, Polystyrene spheres subset dataset, Number of images in a stack, Batch size, Learning rate, Seed variation, Model training, Outliers, Polystyrene spheres dataset, Polystyrene ellipsoids and spheres mixture dataset, Concentration of particles, Grokking effect, Image augmentations