[[File:Colored neural network.svg|thumb|300px|An abstract diagram of a feedforward artificial neural network; each circle represents a [[artificial neuron|neuron]], and each neuron's activation level is influenced only by the activation levels of the neurons in the layer before it<ref>"[https://www.frontiersin.org/research-topics/4817/artificial-neural-networks-as-models-of-neural-information-processing Artificial Neural Networks as Models of Neural Information Processing | Frontiers Research Topic]". Retrieved 2018-02-20.</ref>.]]
A '''feedforward neural network''' ('''前饋神經網絡''', Jyutping: ''cin4 gwai3 san4 ging1 mong5 lok6'') is the simplest and earliest type of [[artificial neural network]] (ANN)<ref>Zell, Andreas (1994). ''Simulation Neuronaler Netze'' [Simulation of Neural Networks] (in German) (1st ed.). Addison-Wesley. p. 73.</ref>. A feedforward neural network has an '''input layer''' (<code>input</code>) and an '''output layer''' (<code>output</code>), and possibly a '''hidden layer''' (<code>hidden</code>)<ref group="註">In practice, a feedforward network without hidden layers can usually handle only simple [[linear relationship]]s, so useful feedforward networks mostly have hidden layers.</ref>. Every [[artificial neuron|neuron]] in the network follows an equation of this form<ref name="sch">Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". ''Neural Networks''. 61: 85–117.</ref><ref>Ivakhnenko, A. G. (1973). ''Cybernetic Predicting Devices''. CCM Information Corporation.</ref>:
 
:<math>t = W_1 A_1 + W_2 A_2 + \cdots</math>;(the [[activation function]])
In this equation, <math>t</math> is the activation level of the neuron in question, <math>A_n</math> is the activation level of the <math>n</math>-th neuron in the previous layer, and <math>W_n</math> is the weight of that <math>n</math>-th neuron (how strongly it influences <math>t</math>). <math>A_n</math> includes no cells outside the previous layer, so [[signal]]s in the network '''travel in one direction only'''; this makes feedforward networks quite unlike [[biological neural network]]s, and it is also the main difference between feedforward networks and [[recurrent neural network]]s<ref name="diff">[https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7 The differences between Artificial and Biological Neural Networks]. ''Towards Data Science''.</ref>.
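As a minimal sketch (not from the original article; all values are illustrative), the equation above amounts to a single weighted sum in Python:

<syntaxhighlight lang="python">
# Minimal sketch of the equation above: a neuron's activation t is
# the weighted sum of the previous layer's activation levels.
# All values here are illustrative.
activations = [0.5, 0.1, 0.9]   # A_1, A_2, A_3: previous layer
weights = [0.4, -0.2, 0.7]      # W_1, W_2, W_3: the weights

t = sum(w * a for w, a in zip(weights, activations))
print(t)  # 0.4*0.5 + (-0.2)*0.1 + 0.7*0.9 = 0.81
</syntaxhighlight>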
 
Even so, feedforward neural networks have proven well able to handle [[data]] that is '''non-sequential''' (a string of [[text]] is sequential, in the sense that earlier [[information]] affects later information) and '''not time-dependent''' (the information carried by time-dependent data varies with time, <math>\text{info} = f(\text{time})</math>)<ref name="brilliantorg">[https://brilliant.org/wiki/feedforward-neural-networks/#:~:text=Feedfoward%20neural%20networks%20are%20primarily,)%20(x%2Cy). Feedforward neural network]. ''Brilliant.org''.</ref>. For example, researchers in [[game AI]] have successfully trained a [[multilayer perceptron]] (see below) to play [[Pac-Man]]<ref>Lucas, S. M. (2005, April). Evolving a Neural Network Location Evaluator to Play Ms. Pac-Man. In ''IEEE 2005 Symposium on Computational Intelligence and Games''.</ref>. This is why feedforward neural networks are still in use in the 21st century<ref name="hosseini">Hosseini, H. G., Luo, D., & Reynolds, K. J. (2006). The comparison of different feed forward neural network architectures for ECG signal diagnosis. ''Medical engineering & physics'', 28(4), 372-378.</ref><ref name="shukla2009">Shukla, A., Tiwari, R., Kaur, P., & Janghel, R. R. (2009, March). Diagnosis of thyroid disorders using artificial neural networks. In ''2009 IEEE International Advance Computing Conference'' (pp. 1016-1020). IEEE.</ref>.
 
== Single-layer perceptron ==
:<math>\text{output} = g(\overrightarrow{w} \cdot \overrightarrow{x} + b)</math>;<math>[1]</math>
 
where <math>\overrightarrow{x}</math> is the [[vector]] of inputs; <math>\overrightarrow{w}</math> is the vector of weights; and <math>b</math> is the '''bias''', the neuron's own tendency to activate. For example, if a given artificial neuron has a large positive <math>b</math>, it tends to activate strongly no matter what the input is. Under [[supervised learning]], the learning algorithm's job is to adjust the <math>w</math> values according to the data it reads, so that the network becomes better able to give accurate outputs in future<ref name="auer2008"/>.
;Example code
For example, the following [[source code]], written in the [[Python (programming language)|Python programming language]], defines a simple perceptron network<ref group="註">This perceptron has no mechanism for changing its weights, so it cannot learn.</ref><ref name="firstneural">[https://towardsdatascience.com/first-neural-network-for-beginners-explained-with-code-4cfd37e06eaf First neural network for beginners explained (with code)]. ''Towards Data Science''.</ref>:
 
;Step 2: update the weight values
#Each weight value gets a '''gradient''' from the equation;
#The amount by which each weight changes equals its gradient multiplied by <math>\eta</math>: if <math>\eta</math> is 0 the network never changes at all, while if <math>\eta</math> is large the network changes very quickly, so <math>\eta</math> governs how fast the network learns;
#These values are "back-propagated" into the network, setting each weight to its new value (actually updating the <math>w_{ij}</math> values), as in the sketch below;
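A minimal sketch of this update step, assuming the gradients have already been computed (all names and values here are illustrative):

<syntaxhighlight lang="python">
# Sketch of the update step: each weight moves by its gradient
# scaled by the learning rate eta. Values are illustrative.
eta = 0.1  # learning rate: 0 means no change, larger means faster learning

weights = [0.4, -0.2, 0.7]       # current w_ij values
gradients = [0.05, -0.10, 0.02]  # gradient of the error for each weight

# Move each weight against its gradient so the error shrinks.
weights = [w - eta * g for w, g in zip(weights, gradients)]
print(weights)  # roughly [0.395, -0.19, 0.698]
</syntaxhighlight>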
 
=== Limitations ===
The limitation of the [[single-layer perceptron]] (a perceptron with no hidden layer) is that it is a linear classifier. A single-layer perceptron can only learn to make linear classifications. For example, to classify a set of cases by two variables <math>x</math> and <math>y</math>, a classifier draws a line that is a function of <math>x</math> and <math>y</math> (e.g. <math>y = 2x + 5</math>); if that line correctly separates the two classes of cases, the line makes a successful [[linear classifier]]. Research shows that a single-layer perceptron can only handle [[linear]] relationships: if the real relationship is not linear, a single-layer perceptron cannot cope. Imagine the following two pictures (and see the sketch after them):
{{clear}}
[[File:Kernel Machine.svg|510px|center]]
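As an illustrative aside (not from the original article), the classic [[XOR]] problem makes this limit concrete: no straight line separates the two XOR classes, so the perceptron learning rule below can never get all four cases right:

<syntaxhighlight lang="python">
# Illustrative sketch: the perceptron learning rule fails on XOR,
# because no straight line separates the two XOR classes.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR cases

w, b, eta = [0.0, 0.0], 0.0, 0.1
for epoch in range(1000):
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        w[0] += eta * (target - out) * x1
        w[1] += eta * (target - out) * x2
        b += eta * (target - out)

correct = sum(
    ((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == target)
    for (x1, x2), target in data
)
print(correct)  # at most 3 out of 4, no matter how long it trains
</syntaxhighlight>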
== Multilayer perceptron ==
{{main|Multilayer perceptron}}
A [[multilayer perceptron]] (MLP) is an artificial neural network containing multiple perceptrons: a multilayer perceptron has '''hidden layers''', layers of neurons that neither take input directly from the outside world nor give output directly to it. Unlike a single-layer perceptron, a multilayer perceptron can handle non-linear relationships, which makes it valuable in many neural-network applications<ref>Pal, S. K., & Mitra, S. (1992). ''Multilayer perceptron, fuzzy sets, classification''.</ref>. A three-layer perceptron (one with a single hidden layer) can be pictured as follows (a code sketch follows the figure)<ref name="sch"/>:
{{clear}}
[[File:Artificial neural network.svg|360px|center]]
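As a minimal sketch of how such a network computes (all weights and names here are illustrative, not from the article), a three-layer perceptron's forward pass is two weighted sums with a non-linear [[activation function]] in between:

<syntaxhighlight lang="python">
# Sketch of a 3-layer perceptron's forward pass, with illustrative
# weights. The non-linear activation between layers is what lets an
# MLP model non-linear relationships.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 0.0]                        # input layer (2 neurons)
w_hidden = [[0.5, -0.6], [0.9, 0.8]]  # weights into the 2 hidden neurons
w_output = [1.2, -1.0]                # weights into the 1 output neuron

hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
output = sigmoid(sum(w * h for w, h in zip(w_output, hidden)))
print(output)  # the network's single output, between 0 and 1
</syntaxhighlight>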
 
By definition, a multilayer perceptron has the following features<ref name="sch"/>:
*Every neuron in layer <math>i</math> is connected to every neuron in layer <math>i-1</math>, meaning that every neuron in layer <math>i-1</math> can influence the activation of the neurons in layer <math>i</math>: adjacent layers are '''fully connected''' (see the sketch after this list), though a weight value can be 0;
*Neurons in layer <math>i</math> are not influenced by neurons in layer <math>j</math>, where <math>j</math> is any integer greater than <math>i</math>;
*Neurons in the same layer have no connections between them.
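One way to picture full connectivity (an illustrative sketch, not from the article): the activations of layer <math>i</math> come from a weighted sum over every neuron in layer <math>i-1</math>, i.e. a matrix-vector product, where a weight of 0 means "connected, but with no influence":

<syntaxhighlight lang="python">
# Sketch: a fully connected layer as a matrix-vector product.
# Row j holds the weights from every layer-(i-1) neuron into neuron j
# of layer i; a 0 entry means that connection has no influence.
prev = [0.2, 0.7, 0.5]    # activations of layer i-1
W = [
    [0.1, 0.0, 0.3],      # weights into neuron 1 of layer i
    [0.4, -0.5, 0.0],     # weights into neuron 2 of layer i
]

layer_i = [sum(w * a for w, a in zip(row, prev)) for row in W]
print(layer_i)  # one weighted sum per layer-i neuron
</syntaxhighlight>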
=== Backpropagation ===
{{main|Backpropagation}}
[[Backpropagation]] is a generalisation of the [[delta rule]] (see above): once the error function has been obtained, it can be used to work out how the <math>w</math> values should be adjusted<ref>Nielsen, Michael A. (2015). "Chapter 6". ''Neural Networks and Deep Learning''.</ref><ref>Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ''ARS Journal''. 30 (10): 947–954.</ref>. [[Stochastic gradient descent]], for example, uses the following formula to work out how each weight value should change (sketched in code below)<ref>Mei, Song (2018). "A mean field view of the landscape of two-layer neural networks". ''Proceedings of the National Academy of Sciences''. 115 (33): E7665–E7671.</ref>:
 
:<math>w_{ij}(t + 1) = w_{ij}(t) - \eta\frac{\partial E(X)}{\partial w_{ij}} + \xi(t)</math>;<math>[5]</math> where
*<math>E(X)</math> is the error, reflecting how far the network's output for the given case is from the correct output;
*<math>\frac{\partial E(X)}{\partial w_{ij}}</math> is the [[partial derivative]] of <math>E(X)</math> with respect to <math>w_{ij}</math>;
*<math>\xi(t)</math> is a [[stochastic]] value<ref>Dreyfus, Stuart (1962). "The numerical solution of variational problems". ''Journal of Mathematical Analysis and Applications''. 5 (1): 30–45.</ref><ref>Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). "Learning representations by back-propagating errors". ''Nature''. 323 (6088): 533–536.</ref>.
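A minimal sketch of this update rule for a single weight (the toy error function and noise scale here are assumptions for illustration, not from the article):

<syntaxhighlight lang="python">
# Sketch of the stochastic-gradient-descent update above for one
# weight w_ij, using a made-up error E(w) = (w - 2)^2 so that the
# gradient is easy to write down.
import random

eta = 0.1   # learning rate
w_ij = 0.5  # some weight in the network

def grad_E(w):
    # dE/dw for the toy error E(w) = (w - 2)^2
    return 2.0 * (w - 2.0)

for t in range(100):
    xi = random.gauss(0.0, 0.01)           # small stochastic term xi(t)
    w_ij = w_ij - eta * grad_E(w_ij) + xi  # the update rule above

print(w_ij)  # ends up near 2.0, where the toy error is smallest
</syntaxhighlight>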
 
If a neural network written as a computer program follows this formula (or one like it), then after working through each case it computes how every weight value inside it should change, and passes this "how each weight should change" information back through the network (hence the name "backpropagation"). Whenever a weight value changes, the size of the change bears a definite relationship to the error value, and the more that weight contributed to computing the output, the bigger its change<ref>Dreyfus, Stuart (1973). "The computational solution of optimal control problems with time lag". ''IEEE Transactions on Automatic Control''. 18 (4): 383–385.</ref>; the network keeps changing, case after case, until the error value gets ever closer to zero<ref>Dreyfus, Stuart E. (1990-09-01). "Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure". ''Journal of Guidance, Control, and Dynamics''. 13 (5): 926–928.</ref>. Besides stochastic gradient descent, there are many other ways to do backpropagation; for details, see topics related to [[optimisation]]<ref>Huang, Guang-Bin; Zhu, Qin-Yu; Siew, Chee-Kheong (2006). "Extreme learning machine: theory and applications". ''Neurocomputing''. 70 (1): 489–501.</ref><ref>Widrow, Bernard; et al. (2013). "The no-prop algorithm: A new learning algorithm for multilayer neural networks". ''Neural Networks''. 37: 182–188.</ref>.