High performance convolutional neural networks for document processing
K. Chellapilla, S. Puri, P. Simard
Tenth International Workshop on Frontiers in Handwriting Recognition, 2006 (inria.hal.science)
Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphics processing units). Unrolled convolution converts the processing in each convolutional layer (both forward-propagation and back-propagation) into a matrix-matrix product. The matrix-matrix product representation of CNNs makes their implementation as easy as MLPs. BLAS is used to efficiently compute matrix products on the CPU. We also present a pixel-shader-based GPU implementation of CNNs. Results on character recognition problems indicate that unrolled convolution with BLAS produces a dramatic 2.4X–3.0X speedup. The GPU implementation is even faster and produces a 3.1X–4.1X speedup.
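The "unrolled convolution" idea in the abstract can be sketched as follows: each kernel-sized patch of the input is copied into one row of a matrix, so the whole convolutional layer reduces to a single matrix-matrix product that a tuned BLAS routine can compute. This is a minimal illustrative sketch (function names, layout, and the single-channel case are our own simplifications, not taken from the paper):

```python
import numpy as np

def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one row of a matrix.

    Illustrative sketch of unrolled convolution; a real implementation
    would also handle channels, stride, and padding.
    """
    H, W = image.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = image[i:i + kh, j:j + kw].ravel()
    return cols

# A layer with several kernels now becomes one matrix-matrix product,
# which NumPy dispatches to BLAS (dgemm) on the CPU.
image = np.arange(16, dtype=float).reshape(4, 4)
kernels = np.stack([np.ones(9), np.arange(9, dtype=float)])  # two 3x3 kernels, flattened
patches = im2col(image, 3, 3)       # shape (4, 9): four 3x3 patches as rows
feature_maps = patches @ kernels.T  # shape (4, 2): one output column per kernel
```

The same unrolled representation applies to back-propagation, since the gradients of a matrix product are themselves matrix products; that is what makes the CNN layer as straightforward to implement as an MLP layer.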