PolyDL: Polyhedral optimizations for creation of high-performance DL primitives
… Deep Neural Networks (DNNs… oneDNN library and with AutoTVM. The experiments show
that we are able to match the performance of expert coded DL primitives in the oneDNN library …
Optimizing inference performance of transformers on CPUs
… The introduction of the Transformer architecture for deep neural networks (…
Beyond Audio Quality: Understanding and Improving Voice Communication With Low-Resource Deep Learning
Q Fu - 2023 - search.proquest.com
… complexity of Convolutional Neural Network (CNN) models [35]. Time-to-Train (TTT) is a
widely adopted metric for measuring the training performance of deep learning models, which is …
A graph neural network-based performance model for deep learning applications
… utilizing the inherent graph structure of deep-learning networks. Specifically, we employ …
neural networks to estimate the performance of deep-learning pipelines in the Halide framework…
NeoFlow: A flexible framework for enabling efficient compilation for high-performance DNN training
… Abstract—Deep neural networks (DNNs) are increasingly … on hand-optimized libraries
to provide efficient implementations … One DNN graph can contain tens or hundreds of operators. …
ParaX: Boosting deep learning for big data analytics on many-core CPUs
… For x86-based CPU architectures, the math kernel library for Deep Neural Networks (MKL-DNN
aka oneDNN [28]) has developed a series of optimizations for specific operations (like …
ChunkAttention: Efficient self-attention with prefix-aware KV cache and two-phase partition
L Ye, Z Tao, Y Huang, Y Li - arXiv preprint arXiv:2402.15220, 2024 - arxiv.org
Self-attention is an essential component of large language models (LLMs) but a significant
source of inference latency for long sequences. In multi-tenant LLM serving scenarios, the …
Accelerating bandwidth-bound deep learning inference with main-memory accelerators
… While we use the highly optimized Intel oneDNN library on the CPU, the performance we …
FloatPIM: In-memory acceleration of deep neural network training with high precision. In …
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
… Deep neural networks (DNNs) with high performance … of rapidly evolving neural networks
and hardware platforms, … -provided libraries like oneDNN [3] and cuDNN [9] for neural models. …
Fast convolution meets low precision: Exploring efficient quantized Winograd convolution on modern CPUs
… by leveraging representative convolutional layers of prevailing neural networks on Intel …
In addition to comparing LoWino with the implementations of the Intel oneDNN library, we also …