Google Scholar

Optimizing GPU Convnets

A Baranwal - 2023 - research-collection.ethz.ch

… neural networks. In the case of networks like CosmoFlow with multiple consecutive convolution
layers, the runtime for convolution layers … NVIDIA’s CUTLASS [29], for direct convolution …

[PDF] uio.no

[PDF] Heterogeneous system-on-chip for AI computing

JF Johansen - 2022 - duo.uio.no

… unit can help reduce the power load and memory usage on the GPU and … guide NVIDIA
explains that "DLA is designed to do full hardware acceleration on convolutional neural networks"…

[PDF] chalmers.se

Exploring Optimized CPU-Inference for Latency-Critical Machine Learning Tasks

A Siklund, M Sedersten - 2024 - odr.chalmers.se

… CPU approach can outperform the GPU hardware in certain … network structures such as
Convolutional Neural Networks (CNN)… of a convolution-based pose prediction model for use with …

oneAPI open-source math library interface

M Krainiuk, M Goli, VR Pascuzzi - 2021 International Workshop …, 2021 - ieeexplore.ieee.org

… abstraction layer by … NVIDIA hardware, we compared GEMM function implementation in
the oneMKL open-source interfaces library with the native libraries provided by Intel and NVIDIA…

Cited by 12

[PDF] arxiv.org

Pure tensor program rewriting via access patterns (representation pearl)

GH Smith, A Liu, S Lyubomirsky, S Davidson… - Proceedings of the 5th …, 2021 - dl.acm.org

… Imagenet classification with deep convolutional neural networks. In Advances in Neural
Information Processing Systems. [20] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert …

Cited by 37

[PDF] acm.org

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction

S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu… - Proceedings of the 49th …, 2022 - dl.acm.org

… model using Tensor Core GPU. We set the configurations of V100 GPU according to the …
We use 2D convolution layers from ResNet-18 and compare the ground-truth performance …

Cited by 50

[PDF] arxiv.org

Soft masking for cost-constrained channel pruning

R Humble, M Shen, JA Latorre, E Darve… - European Conference on …, 2022 - Springer

… We use a latency cost constraint, defined by a layer-wise … speed on a NVIDIA TITAN V GPU
with cuDNN V7.6.5 [4]. … pruning the early convolution layers and leaves the later layers better …

Cited by 17

[PDF] thecvf.com

Smm-conv: Scalar matrix multiplication with zero packing for accelerated convolution

A Ofir, G Ben-Artzi - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

… We compare SMM-Conv to im2col and MEC with different convolutional layer parameters.
The … Yololite: a real-time object detection algorithm optimized for non-gpu computers. In 2018 …

Cited by 6

[PDF] acm.org

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference

A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org

… the GPU and TPU which is important for unpruned, dense models that continue to be used.
Any unstructured sparse operation … only in convolutional neural networks (CNNs). Recurrent …

Cited by 5

[PDF] springer.com

POAS: a framework for exploiting accelerator level parallelism in heterogeneous environments

PA Martínez, G Bernabé, JM García - The Journal of Supercomputing, 2024 - Springer

… , convolution, the heart of convolutional neural networks (CNNs… differences between GEMM
and convolution, since most of … During this evaluation, we refer to an XPU as a GPU that uses …
