Optimizing GPU Convnets

A Baranwal - 2023 - research-collection.ethz.ch
neural networks. In the case of networks like CosmoFlow with multiple consecutive convolution
layers, the runtime for convolution layers … NVIDIA’s CUTLASS [29], for direct convolution

Heterogeneous system-on-chip for AI computing

JF Johansen - 2022 - duo.uio.no
… unit can help reduce the power load and memory usage on the GPU and … guide NVIDIA
explains that "DLA is designed to do full hardware acceleration on convolutional neural networks"…

Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks

A SIKLUND, MAX SEDERSTEN - 2024 - odr.chalmers.se
… CPU approach can outperform the GPU hardware in certain … network structures such as
Convolutional Neural Networks (CNN)… of a convolution-based pose prediction model for use with …

oneAPI open-source math library interface

M Krainiuk, M Goli, VR Pascuzzi - 2021 International Workshop …, 2021 - ieeexplore.ieee.org
… abstraction layer by … NVIDIA hardware, we compared GEMM function implementation in
the oneMKL open-source interfaces library with the native libraries provided by Intel and NVIDIA

Pure tensor program rewriting via access patterns (representation pearl)

GH Smith, A Liu, S Lyubomirsky, S Davidson… - Proceedings of the 5th …, 2021 - dl.acm.org
… Imagenet classification with deep convolutional neural networks. In Advances in Neural
Information Processing Systems. [20] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert …

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction

S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu… - Proceedings of the 49th …, 2022 - dl.acm.org
… model using Tensor Core GPU. We set the configurations of V100 GPU according to the …
We use 2D convolution layers from ResNet-18 and compare the ground-truth performance …

Soft masking for cost-constrained channel pruning

R Humble, M Shen, JA Latorre, E Darve… - European Conference on …, 2022 - Springer
… We use a latency cost constraint, defined by a layer-wise … speed on a NVIDIA TITAN V GPU
with cuDNN V7.6.5 [4]. … pruning the early convolution layers and leaves the later layers better …

Smm-conv: Scalar matrix multiplication with zero packing for accelerated convolution

A Ofir, G Ben-Artzi - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
… We compare SMM-Conv to im2col and MEC with different convolutional layer parameters.
The … Yololite: a real-time object detection algorithm optimized for non-gpu computers. In 2018 …

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference

A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org
… the GPU and TPU which is important for unpruned, dense models that continue to be used.
Any unstructured sparse operation … only in convolutional neural networks (CNNs). Recurrent …

POAS: a framework for exploiting accelerator level parallelism in heterogeneous environments

PA Martínez, G Bernabé, JM García - The Journal of Supercomputing, 2024 - Springer
… , convolution, the heart of convolutional neural networks (CNNs… differences between GEMM
and convolution, since most of … During this evaluation, we refer to an XPU as a GPU that uses …