Evaluation of Feature Selection Methods for Gene Expression Data Classification

  • Phan Thị Thu Hồng
  • Nguyễn Thị Thủy

Abstract

     Selecting the relevant genes that affect a disease is a challenging task in gene expression studies. Most gene selection studies have focused on assessing the association between an individual gene and the disease. In fact, diseases are thought to have a complex etiology involving complicated interactions among many genes and the disease. The Random Forest (RF) method has recently been used successfully to identify genetic factors that affect some complex diseases. Although it performs well on some data sets of moderate size, RF still has difficulty selecting informative genes and building accurate prediction models. In this paper, we investigate several methods for learning advanced random forests that allow one to select a subset of informative genes (those most relevant to the disease). These methods can therefore reduce dimensionality and perform well in prediction on high-dimensional data sets. We analyze the performance of these methods to find the most robust one for each objective of interest (the accuracy of the prediction model or the smallest possible set of relevant genes), based on experimental results on 8 publicly available gene expression data sets from the repository of biomedical data sets (Kent Ridge) and bioinformatics data sets (Bioinformatics).

Published
2017-07-24
Section
ENGINEERING AND TECHNOLOGY