Enhanced training of self-attention learning models for analysis and segmentation of speech

  • Hà Minh Tân
  • Nguyễn Kim Quốc
Keywords: Pre-trained framework, transformer, self-attention, fine-tuning, temporal masking, voice separation

Abstract

     This study introduces a novel approach to enhancing the training of a self-attention model for single-channel speech separation. First, we freeze all layers of the pre-trained self-attention model. We then retrain it in three stages, using a scheduling mechanism that adapts the learning rate and progressively unfreezes layers according to a pre-defined schedule. This iterative procedure refines the model's capabilities, leveraging prior knowledge to improve performance while reducing training time and cost. The technique not only outperforms conventional training methods but can also be used to improve existing pre-trained models. Experimental results show that models trained with this method surpass established techniques for single-channel speech separation on standard datasets.
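
     To make the staged procedure concrete, the following is a minimal sketch (not the authors' implementation) of freezing a pre-trained self-attention model and then retraining it in three stages, each unfreezing more layers and lowering the learning rate on a fixed schedule. The tiny TransformerEncoder, the dummy data, and the stage settings are illustrative assumptions, not details from the paper.

     import torch
     import torch.nn as nn

     # Stand-in for a pre-trained self-attention separator; in practice the
     # weights would be loaded from a checkpoint.
     encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
     encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
     head = nn.Linear(64, 64)            # maps encoder features to a separation output
     model = nn.Sequential(encoder, head)

     # Step 1: freeze every layer of the pre-trained model.
     for p in model.parameters():
         p.requires_grad = False

     # Step 2: three retraining stages; each unfreezes a deeper slice of the
     # encoder and lowers the learning rate per the pre-defined schedule.
     stages = [
         {"unfreeze": [head],                    "lr": 1e-3, "epochs": 3},
         {"unfreeze": list(encoder.layers[-2:]), "lr": 1e-4, "epochs": 3},
         {"unfreeze": list(encoder.layers),      "lr": 1e-5, "epochs": 3},
     ]

     loss_fn = nn.L1Loss()  # placeholder for the actual separation loss

     for stage in stages:
         # Unlock the layers scheduled for this stage.
         for module in stage["unfreeze"]:
             for p in module.parameters():
                 p.requires_grad = True
         optim = torch.optim.Adam(
             (p for p in model.parameters() if p.requires_grad), lr=stage["lr"]
         )
         sched = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=stage["epochs"])
         for _ in range(stage["epochs"]):
             mixture = torch.randn(8, 100, 64)  # dummy batch: (batch, frames, features)
             target = torch.randn(8, 100, 64)   # dummy clean-source features
             optim.zero_grad()
             loss = loss_fn(model(mixture), target)
             loss.backward()
             optim.step()
             sched.step()

     In this sketch, each stage rebuilds the optimizer over only the currently trainable parameters, so frozen layers contribute no gradient updates and the learning rate drops as deeper layers are unlocked.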

Published
2024-04-12
Section
SCIENCE AND TECHNOLOGY