Enhanced training of self-attention learning models for analysis and segmentation of speech

  • Hà Minh Tân
  • Nguyễn Kim Quốc
Keywords: Pre-trained framework, transformer, self-attention, fine-tuning, temporal masking, voice separation

Abstract

     This study introduces a novel approach to enhancing the training of a self-attention model for single-channel speech separation. First, we freeze all layers of the pre-trained self-attention model. We then retrain it in three stages, using a scheduling mechanism that adapts the learning rate and progressively unfreezes layers according to a pre-defined schedule. This iterative procedure refines the model's capabilities, leveraging prior knowledge to improve performance while reducing training time and cost. The technique not only outperforms conventional training methods but can also be used to improve existing pre-trained models. Experimental results show that models trained with this method surpass established techniques for single-channel speech separation on standard datasets.
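
     To make the staged procedure concrete, the following is a minimal sketch (not the authors' implementation) of freezing a pre-trained self-attention model and then retraining it in three stages, each unfreezing more layers and lowering the learning rate on a fixed schedule. The tiny TransformerEncoder, the dummy data, and the stage settings are illustrative assumptions, not details from the paper.

     import torch
     import torch.nn as nn

     # Stand-in for a pre-trained self-attention separator; in practice the
     # weights would be loaded from a checkpoint.
     encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
     encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
     head = nn.Linear(64, 64)            # maps encoder features to a separation output
     model = nn.Sequential(encoder, head)

     # Step 1: freeze every layer of the pre-trained model.
     for p in model.parameters():
         p.requires_grad = False

     # Step 2: three retraining stages; each unfreezes a deeper slice of the
     # encoder and lowers the learning rate per the pre-defined schedule.
     stages = [
         {"unfreeze": [head],                    "lr": 1e-3, "epochs": 3},
         {"unfreeze": list(encoder.layers[-2:]), "lr": 1e-4, "epochs": 3},
         {"unfreeze": list(encoder.layers),      "lr": 1e-5, "epochs": 3},
     ]

     loss_fn = nn.L1Loss()  # placeholder for the actual separation loss

     for stage in stages:
         # Unlock the layers scheduled for this stage.
         for module in stage["unfreeze"]:
             for p in module.parameters():
                 p.requires_grad = True
         optim = torch.optim.Adam(
             (p for p in model.parameters() if p.requires_grad), lr=stage["lr"]
         )
         sched = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=stage["epochs"])
         for _ in range(stage["epochs"]):
             mixture = torch.randn(8, 100, 64)  # dummy batch: (batch, frames, features)
             target = torch.randn(8, 100, 64)   # dummy clean-source features
             optim.zero_grad()
             loss = loss_fn(model(mixture), target)
             loss.backward()
             optim.step()
             sched.step()

     In this sketch, each stage rebuilds the optimizer over only the currently trainable parameters, so frozen layers contribute no gradient updates and the learning rate drops as deeper layers are unlocked.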

Published
2024-04-12
Section
SCIENCE AND TECHNOLOGY