Optimizing GPU Convnets
A Baranwal - 2023 - research-collection.ethz.ch
… neural networks. In the case of networks like CosmoFlow with multiple consecutive convolution
layers, the runtime for convolution layers … NVIDIA’s CUTLASS [29], for direct convolution …
Heterogeneous system-on-chip for AI computing
JF Johansen - 2022 - duo.uio.no
… unit can help reduce the power load and memory usage on the GPU and … guide NVIDIA
explains that "DLA is designed to do full hardware acceleration on convolutional neural networks"…
Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks
A SIKLUND, MAX SEDERSTEN - 2024 - odr.chalmers.se
… CPU approach can outperform the GPU hardware in certain … network structures such as
Convolutional Neural Networks (CNN)… of a convolution-based pose prediction model for use with …
oneAPI open-source math library interface
M Krainiuk, M Goli, VR Pascuzzi - 2021 International Workshop …, 2021 - ieeexplore.ieee.org
… abstraction layer by … NVIDIA hardware, we compared GEMM function implementation in
the oneMKL open-source interfaces library with the native libraries provided by Intel and NVIDIA…
Pure tensor program rewriting via access patterns (representation pearl)
GH Smith, A Liu, S Lyubomirsky, S Davidson… - Proceedings of the 5th …, 2021 - dl.acm.org
… Imagenet classification with deep convolutional neural networks. In Advances in Neural
Information Processing Systems. [20] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert …
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
… model using Tensor Core GPU. We set the configurations of V100 GPU according to the …
We use 2D convolution layers from ResNet-18 and compare the ground-truth performance …
Soft masking for cost-constrained channel pruning
… We use a latency cost constraint, defined by a layer-wise … speed on a NVIDIA TITAN V GPU
with cuDNN V7.6.5 [4]. … pruning the early convolution layers and leaves the later layers better …
Smm-conv: Scalar matrix multiplication with zero packing for accelerated convolution
A Ofir, G Ben-Artzi - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
… We compare SMM-Conv to im2col and MEC with different convolutional layer parameters.
The … Yololite: a real-time object detection algorithm optimized for non-gpu computers. In 2018 …
Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference
A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org
… the GPU and TPU which is important for unpruned, dense models that continue to be used.
Any unstructured sparse operation … only in convolutional neural networks (CNNs). Recurrent …
POAS: a framework for exploiting accelerator level parallelism in heterogeneous environments
… , convolution, the heart of convolutional neural networks (CNNs… differences between GEMM
and convolution, since most of … During this evaluation, we refer to an XPU as a GPU that uses …