DOI: 10.5555/2969442.2969505
Article

Training very deep networks

Published: 07 December 2015

Abstract

Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this difficulty. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
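The gating mechanism sketched in the abstract computes, per layer, y = H(x) * T(x) + x * (1 - T(x)), where H is an ordinary non-linear transform and T is a sigmoid "transform gate" that decides how much of the input is carried through unchanged (see the highway networks report, reference [31]). The short PyTorch sketch below only illustrates that idea under assumed choices (a fully connected layer of width 64, ReLU for H, a gate bias of -2.0); it is not the authors' exact configuration.

import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One fully connected highway layer: y = H(x)*T(x) + x*(1 - T(x))."""

    def __init__(self, dim, gate_bias=-2.0):
        super().__init__()
        self.plain = nn.Linear(dim, dim)  # H(x, W_H): ordinary transform
        self.gate = nn.Linear(dim, dim)   # T(x, W_T): transform gate
        # A negative gate bias makes each layer start close to the identity
        # ("carry" behaviour), which is what eases training at large depth.
        nn.init.constant_(self.gate.bias, gate_bias)

    def forward(self, x):
        h = torch.relu(self.plain(x))     # candidate transformation
        t = torch.sigmoid(self.gate(x))   # gate values in (0, 1)
        return h * t + x * (1.0 - t)      # carry gate coupled as 1 - T(x)

# Stacking many such layers keeps a gated identity path through the whole
# stack, so gradients can reach early layers even at hundreds of layers.
net = nn.Sequential(*[HighwayLayer(64) for _ in range(50)])
y = net(torch.randn(8, 64))              # output shape: (8, 64)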

References

[1]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
[2]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv:1409.4842 [cs], September 2014.
[3]
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs], September 2014.
[4]
Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, and Jürgen Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In IJCAI, 2011.
[5]
Dan Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[6]
Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, and Frank Seide. Feature learning in deep neural networks - studies on speech recognition tasks. arXiv preprint arXiv:1301.3605, 2013.
[7]
Sepp Hochreiter and Jürgen Schmidhuber. Bridging long time lags by weight guessing and "long short-term memory". Spatiotemporal models in biological and artificial systems, 37:65-72, 1996.
[8]
Johan Håstad. Computational limitations of small-depth circuits. MIT press, 1987.
[9]
Johan Håstad and Mikael Goldmann. On the power of small-depth threshold circuits. Computational Complexity, 1(2):113-129, 1991.
[10]
Monica Bianchini and Franco Scarselli. On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Transactions on Neural Networks, 2014.
[11]
Guido F Montufar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems. 2014.
[12]
James Martens and Venkatesh Medabalimi. On the expressive efficiency of sum product networks. arXiv:1411.7717 [cs, stat], November 2014.
[13]
James Martens and Ilya Sutskever. Training deep and recurrent networks with hessian-free optimization. Neural Networks: Tricks of the Trade, pages 1-58, 2012.
[14]
Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. pages 1139-1147, 2013.
[15]
Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933-2941. 2014.
[16]
Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249-256, 2010.
[17]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv:1502.01852 [cs], February 2015.
[18]
David Sussillo and L. F. Abbott. Random walk initialization for training very deep feedforward networks. arXiv:1412.6558 [cs, stat], December 2014.
[19]
Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120 [cond-mat, q-bio, stat], December 2013.
[20]
Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. arXiv:1302.4389 [cs, stat], February 2013.
[21]
Rupesh K. Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, and Jürgen Schmidhuber. Compete to compute. In Advances in Neural Information Processing Systems, pages 2310-2318, 2013.
[22]
Tapani Raiko, Harri Valpola, and Yann LeCun. Deep learning made easier by linear transformations in perceptrons. In International Conference on Artificial Intelligence and Statistics, pages 924-932, 2012.
[23]
Alex Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.
[24]
Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. pages 562-570, 2015.
[25]
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. arXiv:1412.6550 [cs], December 2014.
[26]
Jürgen Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, March 1992.
[27]
Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527-1554, 2006.
[28]
Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Master's thesis, Technische Universität München, München, 1991.
[29]
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, November 1997.
[30]
Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. In ICANN, volume 2, pages 850-855, 1999.
[31]
Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. arXiv:1505.00387 [cs], May 2015.
[32]
Nal Kalchbrenner, Ivo Danihelka, and Alex Graves. Grid long short-term memory. arXiv:1507.01526 [cs], July 2015.
[33]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093 [cs], 2014.
[34]
Benjamin Graham. Spatially-sparse convolutional neural networks. arXiv:1409.6070, September 2014.
[35]
Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv:1312.4400, 2014.
[36]
Marijn F Stollenga, Jonathan Masci, Faustino Gomez, and Jürgen Schmidhuber. Deep networks with internal selective attention through feedback connections. In NIPS. 2014.
[37]
Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv:1412.6806 [cs], December 2014.
[38]
Rupesh Kumar Srivastava, Jonathan Masci, Faustino Gomez, and Jürgen Schmidhuber. Understanding locally competitive networks. In International Conference on Learning Representations, 2015.


Published In

NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2015
3626 pages

Publisher

MIT Press

Cambridge, MA, United States

Publication History

Published: 07 December 2015

Qualifiers

  • Article

Cited By

  • (2023) Language modelling for speaker diarization in telephonic interviews. Computer Speech and Language, 78:C. DOI: 10.1016/j.csl.2022.101441. Online publication date: 1-Mar-2023.
  • (2022) Rendered Image Superresolution Reconstruction with Multichannel Feature Network. Scientific Programming, 2022. DOI: 10.1155/2022/9393589. Online publication date: 1-Jan-2022.
  • (2022) Deep Multi-Scale Residual Connected Neural Network Model for Intelligent Athlete Balance Control Ability Evaluation. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/9012709. Online publication date: 1-Jan-2022.
  • (2022) Using DSCB: A Depthwise Separable Convolution Block Rebuild MTCNN for Face Detection. Proceedings of the 2022 5th International Conference on Image and Graphics Processing, pages 1-8. DOI: 10.1145/3512388.3512389. Online publication date: 7-Jan-2022.
  • (2022) Neural Network Pruning by Recurrent Weights for Finance Market. ACM Transactions on Internet Technology, 22(3):1-23. DOI: 10.1145/3433547. Online publication date: 22-Jan-2022.
  • (2022) Ensemble deep learning. Engineering Applications of Artificial Intelligence, 115:C. DOI: 10.1016/j.engappai.2022.105151. Online publication date: 1-Oct-2022.
  • (2021) Alignment attention by matching key and query distributions. Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 13444-13457. DOI: 10.5555/3540261.3541291. Online publication date: 6-Dec-2021.
  • (2021) Memorizing All for Implicit Discourse Relation Recognition. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(3):1-20. DOI: 10.1145/3485016. Online publication date: 13-Dec-2021.
  • (2021) Parallel Connected LSTM for Matrix Sequence Prediction with Elusive Correlations. ACM Transactions on Intelligent Systems and Technology, 12(4):1-16. DOI: 10.1145/3469437. Online publication date: 12-Aug-2021.
  • (2021) Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging. ACM Transactions on Asian and Low-Resource Language Information Processing, 20(4):1-25. DOI: 10.1145/3445975. Online publication date: 26-May-2021.
