The gradient explosion problem (the counterpart of gradient vanishing) means that the gradient can become very large during training, causing excessively large weight updates, so that the network cannot converge. In an RNN, if the sequence is very long, backpropagation must carry the gradient through the multiplications of many time steps; this repeated multiplication can make the gradient grow very large and the weight updates too
large, so that the network cannot converge.

3. Optimized algorithms

LSTM (Long Short-Term Memory) is a special kind of RNN. It alleviates the gradient vanishing and explosion problems by introducing a gating mechanism, which is a way of controlling the flow of information. In an LSTM, each unit has a memory cell and three types of gates. The forget gate determines which information should be forgotten or discarded. The input gate determines which new information should be stored in the cell state. The output gate determines which information in the cell state should be read and output.
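To make the exploding/vanishing dynamic concrete, here is a toy NumPy sketch (the dimension, step count, and scale values are illustrative assumptions, not from the text): backpropagation through time multiplies the gradient by the recurrent weight matrix once per step, so the gradient's norm grows or shrinks roughly exponentially with the size of the weights.

```python
import numpy as np

def grad_norm_after_bptt(steps, scale, dim=4, seed=0):
    """Push a gradient backward through `steps` linear recurrent steps.

    Each backward step multiplies the gradient by the transpose of the
    recurrent weight matrix W. Here W = scale * I, so the gradient norm
    changes by a factor of `scale` per step: > 1 explodes, < 1 vanishes.
    """
    rng = np.random.default_rng(seed)
    W = scale * np.eye(dim)        # recurrent weights with a known spectrum
    grad = rng.normal(size=dim)    # gradient arriving at the last time step
    for _ in range(steps):
        grad = W.T @ grad          # one step of backprop through time
    return np.linalg.norm(grad)

exploding = grad_norm_after_bptt(50, 1.2)  # grows like 1.2**50 (~9,100x)
vanishing = grad_norm_after_bptt(50, 0.8)  # shrinks like 0.8**50 (near zero)
```

With weights only 20% above or below 1, fifty steps are enough to blow the gradient up by three orders of magnitude or shrink it to almost nothing, which is why plain RNNs struggle with long sequences.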
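A minimal NumPy sketch of one LSTM step under the standard gate equations may help; the weight/bias names (Wf, Wi, Wo, Wc, bf, ...) and the concatenated [x, h_prev] formulation are assumptions for illustration, not taken from the original text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step (sketch). p holds the gate weights and biases."""
    z = np.concatenate([x, h_prev])           # current input + previous hidden state
    f = sigmoid(p["Wf"] @ z + p["bf"])        # forget gate: what to discard
    i = sigmoid(p["Wi"] @ z + p["bi"])        # input gate: what new info to store
    o = sigmoid(p["Wo"] @ z + p["bo"])        # output gate: what to read out
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])  # candidate cell-state update
    c = f * c_prev + i * c_tilde              # new cell state (long-term memory)
    h = o * np.tanh(c)                        # new hidden state (output)
    return h, c

# Tiny usage example with random weights (sizes are arbitrary).
rng = np.random.default_rng(1)
n_in, n_h = 3, 5
p = {w: 0.1 * rng.normal(size=(n_h, n_in + n_h)) for w in ("Wf", "Wi", "Wo", "Wc")}
p.update({b: np.zeros(n_h) for b in ("bf", "bi", "bo", "bc")})
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
```

Note how the cell state c is updated only by pointwise addition and multiplication with gate values in (0, 1), which is what lets gradients flow over many steps without necessarily exploding or vanishing.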

Each gate consists of a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between 0 and 1 that determine how much information should be passed through: 0 means "let nothing through" and 1 means "let all information through". Through this gating mechanism, LSTM solves the gradient vanishing and explosion problems of the traditional RNN, so it can learn long-distance dependencies when processing long sequences. The figure below illustrates the principle of LSTM; the detailed derivation is not covered here, and interested students can look it up themselves.

The Gated Recurrent Unit (GRU) is another advanced RNN, with a simpler structure than LSTM: it has only two types of gates. The update gate determines how much of the old hidden state to keep and how much new information to take in. The reset gate determines how much of the old hidden state should be ignored when generating the new hidden state. GRU's gating mechanism allows it to learn long-distance dependencies when processing long sequences. At the same time, because its structure is simpler than LSTM's, it has fewer parameters and is usually faster to train.
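The two GRU gates described above can be sketched in NumPy as follows; the weight names (Wz, Wr, Wh) are assumptions for illustration, and note that different references flip the convention for which direction the update gate blends.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step (sketch). p holds the gate weights and biases."""
    zr_in = np.concatenate([x, h_prev])
    z = sigmoid(p["Wz"] @ zr_in + p["bz"])  # update gate: old state vs. new candidate
    r = sigmoid(p["Wr"] @ zr_in + p["br"])  # reset gate: how much old state to ignore
    # Candidate hidden state, computed from the input and the *reset* old state.
    h_tilde = np.tanh(p["Wh"] @ np.concatenate([x, r * h_prev]) + p["bh"])
    return (1.0 - z) * h_prev + z * h_tilde  # blend old state and candidate

# Tiny usage example with random weights (sizes are arbitrary).
rng = np.random.default_rng(2)
n_in, n_h = 3, 5
p = {w: 0.1 * rng.normal(size=(n_h, n_in + n_h)) for w in ("Wz", "Wr", "Wh")}
p.update({b: np.zeros(n_h) for b in ("bz", "br", "bh")})
h = gru_step(rng.normal(size=n_in), np.zeros(n_h), p)
```

Compared with the LSTM step, there is no separate cell state and one fewer gate, which is where GRU's smaller parameter count comes from.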