Deep Reinforcement Learning with Hidden Markov Model for Speech Recognition

  • Samson Isaac
  • Khalid Haruna
  • Muhammad Aminu Ahmad
  • Rabi Mustapha
Keywords: Speech Recognition; Hidden Markov Model; Natural Language Processing; Deep Learning; Reinforcement Learning

Abstract

      Many applications now use speech recognition, especially in computer science and electronics. Speech Recognition (SR) is the conversion of spoken words into text. It is also known as Speech-To-Text (STT), Automatic Speech Recognition (ASR), or simply Word Recognition (WR). The Hidden Markov Model (HMM) is a type of Markov model, meaning that the future state of the model depends only on the current state and not on the entire history of the system; the goal of an HMM is to infer a sequence of hidden states from a sequence of observations. The Long Short-Term Memory (LSTM) network is a type of Recurrent Neural Network (RNN) that can learn long-term dependencies between time steps of sequence data. The LSTM network is trained to predict the values of subsequent time steps in a sequence-to-sequence regression. Deep Neural Network (DNN) models are better classifiers than Gaussian Mixture Models (GMMs); they can generalize much better with a smaller number of parameters over complex distributions. They model the distributions of different classes jointly, which is called “distributed” learning or, more properly, “tied” learning. This work aims to develop a speech recognition model that predicts isolated spoken words for selected fruits in the Hausa, Igbo and Yoruba languages, using the predictive power of Mel-Frequency Cepstral Coefficients (MFCC), LSTM and HMM algorithms. The findings of the study are expected to support the development of better automatic speech application systems and to benefit the academic and research community in the field of Natural Language Processing.
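To make the HMM idea in the abstract concrete, the following is a minimal sketch of Viterbi decoding, a standard way to recover the most likely hidden-state sequence from observations. It is not the authors' implementation: the two states ("silence", "speech"), the two observation symbols ("low", "high" frame energy) and all probabilities are hypothetical values chosen only for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the given observations."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Markov property: only the previous state matters, not the full history
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model (all numbers are illustrative assumptions)
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

decoded = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(decoded)
```

In a real recognizer the discrete symbols would presumably be replaced by continuous feature vectors such as MFCC frames, with emission scores coming from a GMM or a neural network rather than a lookup table.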

Published
2023-09-20
Section
Articles