Enhancing the accuracy of rainfall area classification in central Vietnam using machine learning methods
Abstract
This study applies machine learning techniques, including Light Gradient Boosting Machine (LGBM), XGBoost (XGB), and Random Forest (RF), to multi-source data comprising Himawari-8 satellite observations, ground-based rain gauge measurements, and auxiliary data such as ERA-5 reanalysis and the ASTER Digital Elevation Model (DEM), with the aim of improving the accuracy of rainfall area classification over Central Vietnam. Existing rainfall products for the region, including IMERG Final Run, IMERG Early, GSMaP_MVK_Gauge, PERSIANN_CCS, and FY-4A, serve as reference baselines against which the proposed classification approach is evaluated. All of the proposed rainfall classification products perform well, and the LGBM-based product achieves the highest scores on the key evaluation metrics: Probability of Detection (POD), Critical Success Index (CSI), Equitable Threat Score (ETS), and Heidke Skill Score (HSS). Compared with GSMaP_MVK_Gauge, the best-performing reference product examined, the LGBM-based product improves these metrics by 38.89%, 20.0%, 16.67%, and 13.04%, respectively. These findings highlight the potential of machine learning models, particularly LGBM, to improve the classification performance of meteorological models that rely on small but complex, high-dimensional datasets.
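For readers unfamiliar with the four evaluation metrics named above, the following minimal sketch shows how POD, CSI, ETS, and HSS are conventionally computed from a 2x2 rain/no-rain contingency table. It uses the standard forecast-verification definitions and is not taken from the study's own code; the names `ContingencyTable` and `skill_scores` and the example counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Counts from comparing a rain/no-rain product with reference observations.

    Hypothetical container for illustration; not part of the study.
    """
    hits: int               # product: rain,    reference: rain
    misses: int             # product: no rain, reference: rain
    false_alarms: int       # product: rain,    reference: no rain
    correct_negatives: int  # product: no rain, reference: no rain


def skill_scores(t: ContingencyTable) -> dict[str, float]:
    """Return POD, CSI, ETS, and HSS using their standard definitions.

    No guard against zero denominators is included; a table with no
    observed rain events, for example, would raise ZeroDivisionError.
    """
    h, m, f, c = t.hits, t.misses, t.false_alarms, t.correct_negatives
    n = h + m + f + c

    # Probability of Detection: fraction of observed rain events detected.
    pod = h / (h + m)

    # Critical Success Index: hits relative to all events that were
    # either observed or classified as rain.
    csi = h / (h + m + f)

    # Equitable Threat Score: CSI adjusted for hits expected by chance.
    hits_random = (h + m) * (h + f) / n
    ets = (h - hits_random) / (h + m + f - hits_random)

    # Heidke Skill Score: accuracy relative to random chance.
    hss = 2 * (h * c - f * m) / ((h + m) * (m + c) + (h + f) * (f + c))

    return {"POD": pod, "CSI": csi, "ETS": ets, "HSS": hss}


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    example = ContingencyTable(hits=50, misses=10, false_alarms=20,
                               correct_negatives=120)
    for name, value in skill_scores(example).items():
        print(f"{name}: {value:.3f}")
```

All four scores equal 1 for a perfect classification. ETS and HSS additionally discount agreement expected by chance, which is why they are often reported alongside POD and CSI when rain events are rare.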