THE PROBLEM OF VANISHING GRADIENTS AND COUNTERPROPAGATION METHODS IN DEEP LEARNING MODEL TRAINING

  • Phạm Ngọc Giàu
  • Tống Lê Thanh Hải
Keywords: Neural networks, MLP, vanishing gradients.

Abstract

In supervised deep learning, gradients carry the information used to update the weights
during training; if the gradients become too small or vanish entirely, the weights barely
change and the model learns nothing from the data. This article presents solutions to the
vanishing gradient problem in Multilayer Perceptron (MLP) neural networks when training
models that are very deep (with many hidden layers). Six different methods, covering the
model architecture, training tactics, and related settings, are applied on the
FashionMNIST dataset to help mitigate vanishing gradients. In addition, we introduce and
build the MyNormalization() function, a custom function similar to PyTorch's BatchNorm,
whose purpose is to control variance and reduce the volatility of features across layers.
The ultimate goal is to optimize the deep MLP model so that it can learn efficiently from
the data without being affected by the vanishing gradient problem.
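
The article does not reproduce the MyNormalization() code in this abstract, so the following is only a minimal sketch of what a batch-style normalization layer of that kind might look like in PyTorch; the class interface, the eps value, and the learnable scale/shift are assumptions modeled on BatchNorm1d, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MyNormalization(nn.Module):
    """Sketch of a batch-style normalization layer: standardizes each feature
    over the batch, then applies a learnable scale and shift (as BatchNorm1d does)."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift

    def forward(self, x):
        # x: (batch_size, num_features)
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)        # zero mean, unit variance per feature
        return self.gamma * x_hat + self.beta
```

Placed between a linear layer and its activation (for example nn.Linear(784, 256), then MyNormalization(256), then nn.ReLU()), a layer like this keeps the per-feature variance of the activations close to one, which is one way to limit how much gradients shrink as they propagate back through many hidden layers.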

Published
2024-02-19
Section
RESEARCH AND DEVELOPMENT