THE PROBLEM OF VANISHING GRADIENTS AND COUNTERPROPAGATION METHODS IN DEEP LEARNING MODEL TRAINING

  • Phạm Ngọc Giàu
  • Tống Lê Thanh Hải
Keywords: Neural networks, MLP, vanishing gradients.

Abstract

In supervised deep learning, gradients carry the information used to update the weights
during training; if the gradients become too small or vanish entirely, the weights barely
change and the model learns nothing from the data. This article presents solutions to the
vanishing gradient problem in Multilayer Perceptron (MLP) neural networks when training
models that are very deep (with many hidden layers). Six different methods, covering the
model architecture, training tactics, and related settings, are applied on the
FashionMNIST dataset to help mitigate vanishing gradients. In addition, we introduce and
build the MyNormalization() function, a custom function similar to PyTorch's BatchNorm,
whose purpose is to control variance and reduce the volatility of features across layers.
The ultimate goal is to optimize the deep MLP model so that it can learn efficiently from
the data without being affected by the vanishing gradient problem.
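
The article does not reproduce the MyNormalization() code in this abstract, so the following is only a minimal sketch of what a batch-style normalization layer of that kind might look like in PyTorch; the class interface, the eps value, and the learnable scale/shift are assumptions modeled on BatchNorm1d, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MyNormalization(nn.Module):
    """Sketch of a batch-style normalization layer: standardizes each feature
    over the batch, then applies a learnable scale and shift (as BatchNorm1d does)."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift

    def forward(self, x):
        # x: (batch_size, num_features)
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)        # zero mean, unit variance per feature
        return self.gamma * x_hat + self.beta
```

Placed between a linear layer and its activation (for example nn.Linear(784, 256), then MyNormalization(256), then nn.ReLU()), a layer like this keeps the per-feature variance of the activations close to one, which is one way to limit how much gradients shrink as they propagate back through many hidden layers.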

Published
2024-02-19
Section
RESEARCH AND DEVELOPMENT