PolyDL: Polyhedral optimizations for creation of high-performance DL primitives

S Tavarageri, A Heinecke, S Avancha, B Kaul… - … on Architecture and …, 2021 - dl.acm.org
Deep Neural Networks (DNNs… oneDNN library and with AutoTVM. The experiments show
that we are able to match the performance of expert coded DL primitives in the oneDNN library

Optimizing inference performance of transformers on CPUs

D Dice, A Kogan - arXiv preprint arXiv:2102.06621, 2021 - arxiv.org
… The introduction of the Transformer architecture for deep neural networks (…

Beyond Audio Quality: Understanding and Improving Voice Communication With Low-Resource Deep Learning

Q Fu - 2023 - search.proquest.com
… complexity of Convolutional Neural Network (CNN) models [35]. Time-to-Train (TTT) is a
widely adopted metric for measuring the training performance of deep learning models, which is …

A graph neural network-based performance model for deep learning applications

S Singh, J Hegarty, H Leather, B Steiner - Proceedings of the 6th ACM …, 2022 - dl.acm.org
… utilizing the inherent graph structure of deep-learning networks. Specifically, we employ …
neural networks to estimate the performance of deep-learning pipelines in the Halide framework

NeoFlow: A flexible framework for enabling efficient compilation for high performance DNN training

S Zheng, R Chen, Y Jin, A Wei, B Wu… - … on Parallel and …, 2021 - ieeexplore.ieee.org
… Abstract—Deep neural networks (DNNs) are increasingly … on hand-optimized libraries
to provide efficient implementations … One DNN graph can contain tens or hundreds of operators. …

ParaX: Boosting deep learning for big data analytics on many-core CPUs

L Yin, Y Zhang, Z Zhang, Y Peng, P Zhao - Proceedings of the VLDB …, 2021 - dl.acm.org
… For x86-based CPU architectures, the math kernel library for Deep Neural Networks (MKL-DNN
aka oneDNN [28]) has developed a series of optimizations for specific operations (like …

ChunkAttention: Efficient self-attention with prefix-aware KV cache and two-phase partition

L Ye, Z Tao, Y Huang, Y Li - arXiv preprint arXiv:2402.15220, 2024 - arxiv.org
Self-attention is an essential component of large language models (LLMs) but a significant
source of inference latency for long sequences. In multi-tenant LLM serving scenarios, the …

Accelerating bandwidth-bound deep learning inference with main-memory accelerators

BY Cho, J Jung, M Erez - … Computing, Networking, Storage and Analysis, 2021 - dl.acm.org
… While we use the highly optimized Intel oneDNN library on the CPU, the performance we …
Floatpim: In-memory acceleration of deep neural network training with high precision. In …

HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks

Z Zhang, B He, Z Zhang - … of the 51st International Conference on …, 2022 - dl.acm.org
Deep neural networks (DNNs) with high performance … of rapidly evolving neural networks
and hardware platforms, … -provided libraries like oneDNN [3] and cuDNN [9] for neural models. …

Fast convolution meets low precision: Exploring efficient quantized Winograd convolution on modern CPUs

X Wang, G Li, Z Jia, X Feng, Y Wang - … Transactions on Architecture and …, 2024 - dl.acm.org
… by leveraging representative convolutional layers of prevailing neural networks on Intel …
In addition to comparing LoWino with the implementations of the Intel oneDNN library, we also …