Abstract:
Poverty is a widespread and critical problem in developing countries. The World Bank predicted that COVID-19 would worsen poverty by 2021 in nations with already high poverty rates, so mapping poverty in developing nations is vital to help humanitarian groups and governments carry out poverty alleviation initiatives and allocate available resources for long-term development. In this study, we investigated two machine learning strategies for estimating poverty, training models on features derived from multiple publicly available, freely accessible open-source datasets, including nighttime lights (NTL) satellite imagery and OpenStreetMap (OSM), as well as on features from transfer learning-based convolutional neural networks (CNN) applied to daytime satellite imagery. The first strategy trains a single model: linear regression or ridge regression serves as the baseline, and random forest regression, gradient boosting regression, or XGBoost regression is used to improve on it. The second strategy integrates all of these individual models into a single final model using a new stacking method in order to improve on the performance of the separate models. We present this multiple open-source approach as a simple, cost-effective alternative to deep learning-based methods for poverty assessment. To evaluate our regression models, we used the coefficient of determination (R²) and mean squared error (MSE). The wealth index from the Demographic and Health Surveys (DHS) serves as the ground truth for the poverty index. We found that a single model could account for around 74% of the variation in asset-based wealth in Myanmar using multiple open-source input features, whereas the novel stacking algorithm explains roughly 81% of the variation.
Although obtaining satellite images and training the CNN deep learning model required many days, CNN features outperformed open-source features, reaching an R² of 84%. These findings suggest that the model based on open-source nightlight intensity and geospatial mapping data offers a viable alternative that achieves comparable results while being easy to use and computationally affordable. Overall, the best R² performance, 89%, was achieved by integrating all of these features. We then adapted our model, trained on Myanmar, to other developing countries, Bangladesh and Cambodia, and checked its performance, which reached 71% and 72%, respectively, demonstrating not only strong predictive power but also a reasonable capacity for generalization. Finally, we generated poverty maps for these three nations at the provincial administrative level. Our results demonstrate that a novel stacking machine learning model using open-source data is still as effective as deep learning approaches in estimating poverty, and that our methodology is generalizable to other nations as well.
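
As a rough illustration of the stacking strategy described above, the following minimal sketch combines the named base regressors with scikit-learn's generic StackingRegressor and scores the result with R² and MSE. It is not the study's own stacking method, which the abstract does not specify. The data are synthetic stand-ins for the NTL and OSM features and the DHS wealth index, the ridge meta-learner and all hyperparameters are assumptions, and XGBoost is omitted to avoid an extra dependency.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (assumption): X stands in for open-source
# features such as NTL and OSM statistics, y for the DHS wealth index.
X, y = make_regression(
    n_samples=400, n_features=10, n_informative=6, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models named in the abstract (XGBoost left out in this sketch).
base_models = [
    ("linear", LinearRegression()),
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# Generic stacked generalization: the final estimator is fit on
# cross-validated predictions of the base models. The ridge meta-learner
# is an assumption, not the study's choice.
stack = StackingRegressor(
    estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5
)
stack.fit(X_train, y_train)

# Evaluate on held-out data with the two metrics used in the study.
y_pred = stack.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MSE = {mse:.3f}")
```

With real data, X would hold the per-cluster feature vectors and y the corresponding wealth index values; the same fitted model could then be scored on another country's data to check generalization.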
Thammasat University. Thammasat University Library