CẢI TIẾN THUẬT TOÁN TỐI ƯU GIẢI BÀI TOÁN SUY DIỄN HẬU NGHIỆM VỚI MÔ HÌNH CHỦ ĐỀ

Dương Thị Nhung; Bùi Thị Thanh Xuân

Abstract

Posterior inference for individual documents plays an important role in topic models. In practice, however, it is usually posed as a nonconvex optimization problem on large datasets and is therefore often NP-hard. Many methods have been proposed to approximate posterior inference, such as variational Bayes (VB), collapsed variational Bayes (CVB), and collapsed Gibbs sampling (CGS), but they do not guarantee solution quality or convergence rate. Building on the ideas of the Online Frank-Wolfe (OFW) algorithm and the Online Maximum a Posteriori Estimation (OPE) algorithm, we propose two efficient algorithms, IOPE1 and IOPE2, for solving the posterior inference problem in topic models. Using stochastic bounds, stochastic approximation, and probability distributions such as the uniform and Bernoulli distributions, we apply these improvements to develop a new, effective method for learning LDA from large text collections. Experimental results show that our approaches are often more effective than OPE.

IMPROVEMENT OPTIMIZATION ALGORITHMS APPLIED FOR SOLVING THE POSTERIOR INFERENCE PROBLEM IN TOPIC MODELS

BỘ KHOA HỌC VÀ CÔNG NGHỆ - MINISTRY OF SCIENCE AND TECHNOLOGY OF VIETNAM

CỤC THÔNG TIN KHOA HỌC VÀ CÔNG NGHỆ QUỐC GIA - NATIONAL AGENCY FOR SCIENCE AND TECHNOLOGY INFORMATION