ENSEMBLE LEARNING APPROACHES FOR CLASSIFICATION WITH HIGH-DIMENSIONAL DATA

Cao Truong Tran

Institute of Information and Communication Technology, Le Quy Don Technical University

Abstract

Classification with high-dimensional data is a significant challenge in machine learning: the abundance of features makes it difficult to identify meaningful patterns, which leads to overfitting and reduced classification performance. The computational cost of processing such data is also often prohibitive, requiring specialized hardware or optimized algorithms. Ensemble learning combines multiple models and aggregates their predictions, which can reduce overfitting, increase robustness, and improve accuracy on a wide range of real-world classification problems. It is well suited to high-dimensional data because training the models with different learning algorithms or on different subsets of features increases their diversity, which mitigates the effects of the curse of dimensionality and enhances generalization. This paper proposes two hybrid ensemble learning approaches that integrate the random subspace method with bagging and with boosting to enhance classification performance on high-dimensional data. Experimental results demonstrate that these methods significantly improve classification accuracy on high-dimensional data.
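The abstract does not spell out how the random subspace method is combined with bagging, so the following is only a minimal sketch of one possible reading, not the paper's actual method: each ensemble member is trained on a bootstrap sample of the rows (bagging) restricted to a random subset of the features (random subspace), and predictions are combined by majority vote. The nearest-centroid base learner, the synthetic data generator, and all function and parameter names (`fit_subspace_bagging`, `n_sub_features`, `make_data`, and so on) are assumptions made for illustration. The boosting variant is not shown.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Illustrative base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sq_dist)


def fit_subspace_bagging(X, y, n_models=31, n_sub_features=10, seed=0):
    """Train n_models learners, each on a bootstrap sample of the rows
    restricted to a random subset of the feature indices."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        feats = rng.sample(range(d), n_sub_features)   # random subspace
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap (with replacement)
        Xb = [[X[i][j] for j in feats] for i in idx]
        yb = [y[i] for i in idx]
        ensemble.append((feats, nearest_centroid_fit(Xb, yb)))
    return ensemble


def predict(ensemble, row):
    """Majority vote over all members, each seeing only its own features."""
    votes = Counter(
        nearest_centroid_predict(model, [row[j] for j in feats])
        for feats, model in ensemble
    )
    return votes.most_common(1)[0][0]


def make_data(n, n_features=40, n_informative=10, shift=2.0, seed=0):
    """Hypothetical synthetic data: only the first n_informative features
    differ between the two classes; the rest are pure Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if label == 1:
            for j in range(n_informative):
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


X_train, y_train = make_data(200, seed=1)
X_test, y_test = make_data(200, seed=2)
ensemble = fit_subspace_bagging(X_train, y_train)
accuracy = sum(predict(ensemble, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In this sketch, `n_sub_features` controls the trade-off between member diversity and individual member strength: smaller subspaces give more varied but weaker learners. An odd `n_models` avoids tied votes in the two-class case.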

