PolyDL: Polyhedral optimizations for creation of high-performance DL primitives

S Tavarageri, A Heinecke, S Avancha, B Kaul… - … on Architecture and …, 2021 - dl.acm.org
Deep Neural Networks (DNNs… oneDNN library and with AutoTVM. The experiments show
that we are able to match the performance of expert coded DL primitives in the oneDNN library

Optimizing inference performance of transformers on CPUs

D Dice, A Kogan - arXiv preprint arXiv:2102.06621, 2021 - arxiv.org
… The introduction of the Transformer architecture for deep neural networks (…

Beyond Audio Quality: Understanding and Improving Voice Communication With Low-Resource Deep Learning

Q Fu - 2023 - search.proquest.com
… complexity of Convolutional Neural Network (CNN) models [35]. Time-to-Train (TTT) is a
widely adopted metric for measuring the training performance of deep learning models, which is …

A graph neural network-based performance model for deep learning applications

S Singh, J Hegarty, H Leather, B Steiner - Proceedings of the 6th ACM …, 2022 - dl.acm.org
… utilizing the inherent graph structure of deep-learning networks. Specifically, we employ …
neural networks to estimate the performance of deep-learning pipelines in the Halide framework

NeoFlow: A flexible framework for enabling efficient compilation for high performance DNN training

S Zheng, R Chen, Y Jin, A Wei, B Wu… - … on Parallel and …, 2021 - ieeexplore.ieee.org
… Abstract—Deep neural networks (DNNs) are increasingly … on hand-optimized libraries
to provide efficient implementations … One DNN graph can contain tens or hundreds of operators. …

ParaX: Boosting deep learning for big data analytics on many-core CPUs

L Yin, Y Zhang, Z Zhang, Y Peng, P Zhao - Proceedings of the VLDB …, 2021 - dl.acm.org
… For x86-based CPU architectures, the math kernel library for Deep Neural Networks (MKL-DNN
aka oneDNN [28]) has developed a series of optimizations for specific operations (like …

ChunkAttention: Efficient self-attention with prefix-aware KV cache and two-phase partition

L Ye, Z Tao, Y Huang, Y Li - arXiv preprint arXiv:2402.15220, 2024 - arxiv.org
Self-attention is an essential component of large language models (LLMs) but a significant
source of inference latency for long sequences. In multi-tenant LLM serving scenarios, the …

Accelerating bandwidth-bound deep learning inference with main-memory accelerators

BY Cho, J Jung, M Erez - … Computing, Networking, Storage and Analysis, 2021 - dl.acm.org
… While we use the highly optimized Intel oneDNN library on the CPU, the performance we …
Floatpim: In-memory acceleration of deep neural network training with high precision. In …

HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks

Z Zhang, B He, Z Zhang - … of the 51st International Conference on …, 2022 - dl.acm.org
Deep neural networks (DNNs) with high performance … of rapidly evolving neural networks
and hardware platforms, … -provided libraries like oneDNN [3] and cuDNN [9] for neural models. …

Fast convolution meets low precision: Exploring efficient quantized Winograd convolution on modern CPUs

X Wang, G Li, Z Jia, X Feng, Y Wang - … Transactions on Architecture and …, 2024 - dl.acm.org
… by leveraging representative convolutional layers of prevailing neural networks on Intel …
In addition to comparing LoWino with the implementations of the Intel oneDNN library, we also …