Sustainable Development Goals Attainment Prediction: A Hierarchical Framework using Time Series Modelling
Abstract
This thesis presents work focused on finding an effective mechanism for Sustainable Development Goal (SDG) attainment prediction for geographical entities. Each SDG has several targets, and each target has several indicators to predict. The motivation is the desire to utilise the published SDG data in a bottom-up, hierarchical classification based on time series modelling, to predict which geographical entities will meet their SDGs on time, if ever. The main research question that this thesis seeks to address is "How can the tools and techniques of machine learning be harnessed to effectively and efficiently conduct attainment prediction in the context of the UN Sustainable Development Goals?". Three frameworks are proposed and evaluated across 38 geographical entities, spanning four geographical regions. The data for these 38 geographical entities, collected and compiled from the data sets of all 38 entities, comprise 200,742 individual observations, each representing a single time series data point. To facilitate data management and analysis, the data set was transformed so that each column corresponds to a unique time series and the index denotes a specific point in time (Year). This transformation converted the 200,742 individual observations into 36,421 unique time series. The first framework, SDG-AP, assumed that the time series were unrelated and independent, so that a univariate time series forecasting approach could be adopted. A number of univariate forecast models were considered. The second framework, SDG-CAP, tested the hypothesis that intra-entity relationships exist between SDG indicators within the context of a single geographical entity, and hence that a multivariate time series forecasting approach could be adopted which would produce better SDG attainment predictions than those produced using SDG-AP. A number of approaches for identifying such relationships were considered.
Finally, the third framework, SDG-TTF, tested the hypothesis that both intra- and inter-entity relationships between the different time series could be found, and that utilising these relationships for multivariate time series forecasting would produce more effective SDG attainment predictions than either of the previous two frameworks. SDG-AP was evaluated using Root Mean Square Error (RMSE) and Critical Difference Diagrams, whereas the SDG-CAP and SDG-TTF frameworks were evaluated using RMSE, Borda Count and Critical Difference Diagrams.
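As an illustration only, the data transformation and two of the evaluation measures named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the record layout (entity, indicator, year, value), the sample values, the function names `to_wide`, `rmse` and `borda_count`, and the particular Borda variant (the best of m models on a series earns m-1 points, the worst earns 0) are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict

# Hypothetical long-format observations: (entity, indicator, year, value).
# The values are invented for illustration and are not taken from the thesis data.
observations = [
    ("Entity A", "1.1.1", 2000, 10.0),
    ("Entity A", "1.1.1", 2001, 9.0),
    ("Entity A", "1.1.1", 2002, 8.0),
    ("Entity B", "1.1.1", 2000, 20.0),
    ("Entity B", "1.1.1", 2001, 18.0),
    ("Entity B", "1.1.1", 2002, 16.0),
]


def to_wide(records):
    """Pivot long records into {year: {(entity, indicator): value}}.

    Each (entity, indicator) pair plays the role of one column, that is,
    one unique time series; the year plays the role of the index.
    """
    wide = defaultdict(dict)
    for entity, indicator, year, value in records:
        wide[year][(entity, indicator)] = value
    return dict(sorted(wide.items()))


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def borda_count(errors_by_series):
    """Aggregate per-series model rankings into Borda scores.

    errors_by_series maps a series name to {model name: RMSE}.
    Lower RMSE ranks higher. Ties are broken by insertion order,
    which is a simplification assumed for this sketch.
    """
    scores = defaultdict(int)
    for errors in errors_by_series.values():
        ranked = sorted(errors, key=errors.get)  # best model first
        for position, model in enumerate(ranked):
            scores[model] += len(ranked) - 1 - position
    return dict(scores)
```

In this sketch, `to_wide(observations)` yields three yearly rows and two series columns, and `borda_count` would be called with the per-series RMSE values of each candidate forecast model to obtain an overall ranking.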
Keywords
Multivariate time series forecasting, Sustainable Development Goal attainment predictions, Machine learning for SDG predictions, Bottom-up hierarchical classification