Discuz! Board

Title: weight update very slow. The opposite

Author: RABIULISLAMSP88    Time: 2024-3-6 16:39
Gradient explosion is the opposite of gradient disappearance (the vanishing gradient problem): the gradient can become very large during training, which makes the weight updates too large, so that the network cannot converge. In an RNN, if the sequence is very long, the gradient in backpropagation has to pass through the multiplication operations of many time steps, and this repeated multiplication can make it grow very large.

3. Optimized algorithms

Long Short-Term Memory (LSTM) is a special kind of RNN. It addresses the gradient disappearance and explosion problems by introducing a gating mechanism, which is a way of controlling the flow of information. In an LSTM, each unit has a memory cell and three gates. The forget gate determines which information should be forgotten or discarded. The input gate determines which new information should be stored in the cell state. The output gate determines which information in the cell state should be read and output.
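As a rough illustration of the three gates described above, here is a minimal sketch of one LSTM step on scalar values, using only the Python standard library. The function name lstm_step, the parameter dictionary and all weight values are hypothetical and chosen only for the example; the gates are assumed to use the sigmoid activation of the standard LSTM formulation.

```python
import math


def sigmoid(x):
    """Squash x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step. p holds hypothetical weights and biases."""
    # Forget gate: how much of the old cell state to keep.
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    # Input gate: how much of the new candidate to store.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    # Output gate: how much of the cell state to expose.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    # Candidate value that could be written into the cell state.
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = f * c_prev + i * g      # updated cell state
    h = o * math.tanh(c)        # updated hidden state (the output)
    return h, c, (f, i, o)


# Illustrative parameters only (assumption): every weight 0.5, every bias 0.
params = {name: 0.5 for name in ("wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug")}
params.update({"bf": 0.0, "bi": 0.0, "bo": 0.0, "bg": 0.0})

h, c = 0.0, 0.0
for x in [0.5, -1.0, 2.0]:
    h, c, gates = lstm_step(x, h, c, params)
```

With a large forget-gate bias and a very negative input-gate bias, the cell state is carried over almost unchanged from one step to the next, which is the behaviour that lets information survive across long sequences.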





Each gate consists of a sigmoid neural network layer and a pointwise multiplication. The sigmoid layer outputs numbers between 0 and 1 that determine how much information should be passed: 0 means "let nothing pass" and 1 means "let all information pass". Through its gating mechanism, LSTM addresses the gradient disappearance and explosion problems of the traditional RNN, so that it can learn long-distance dependencies when dealing with long sequences. The figure below shows the principle of LSTM. The details of the diagram are not covered here; interested students can look them up themselves.

The Gated Recurrent Unit (GRU) is another advanced RNN, and its structure is simpler than that of LSTM. It has only two gates. The update gate determines how much of the old hidden state is kept and how much new information is written. The reset gate determines how much of the old hidden state should be ignored when generating a new hidden state. The GRU's gating mechanism allows it to learn long-distance dependencies when processing long sequences. At the same time, because its structure is
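To make the two GRU gates concrete, here is a minimal sketch of one GRU step on scalar values, using only the Python standard library. The function name gru_step and all parameter values are hypothetical. The blending rule used here, h = (1 - z) * h_prev + z * h_cand, is one common convention; some references and libraries swap the roles of z and 1 - z.

```python
import math


def sigmoid(x):
    """Squash x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One scalar GRU step. p holds hypothetical weights and biases."""
    # Update gate: how much of the hidden state is replaced by new content.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old hidden state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate hidden state, built from the input and the reset-scaled old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate (convention assumed for this sketch).
    h = (1.0 - z) * h_prev + z * h_cand
    return h, (z, r)


# Illustrative parameters only (assumption): every weight 0.5, every bias 0.
params = {name: 0.5 for name in ("wz", "uz", "wr", "ur", "wh", "uh")}
params.update({"bz": 0.0, "br": 0.0, "bh": 0.0})

h = 0.0
for x in [0.5, -1.0, 2.0]:
    h, gates = gru_step(x, h, params)
```

Under this convention, an update gate close to 0 copies the old hidden state forward almost unchanged, while a reset gate close to 0 makes the candidate depend almost only on the current input.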






Welcome to Discuz! Board (http://lineage.4dhost.org/) Powered by Discuz! X3.3