Keyword: KNL : Search

research-article

Accelerating All-Edge Common Neighbor Counting on Three Processors

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing, Article No.: 42, Pages 1–10, https://doi.org/10.1145/3337821.3337917

We propose to accelerate an important but time-consuming operation in online graph analytics, which is the counting of common neighbors for each pair of adjacent vertices (u,v), or edge (u,v), on three modern processors of different architectures. We ...

research-article

Porting the COSMO Weather Model to Manycore CPUs

PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, Article No.: 13, Pages 1–11, https://doi.org/10.1145/3324989.3325723

Weather and climate simulations are a major application driver in high-performance computing (HPC). With the end of Dennard scaling and Moore's law, the HPC industry increasingly employs specialized computation accelerators to increase computational ...

Article

Evaluation of the Suitability of Intel Xeon Phi Clusters for the Simulation of Ultrasound Wave Propagation Using Pseudospectral Methods

Computational Science – ICCS 2019, Pages 577–590, https://doi.org/10.1007/978-3-030-22744-9_45

The ability to perform large-scale ultrasound simulations using Fourier pseudospectral methods has generated significant interest in medical ultrasonics, including for treatment planning in therapeutic ultrasound and image reconstruction in ...

research-article

Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors

International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 2, Pages 242–267, https://doi.org/10.1177/1094342017732395

We present the acceleration of an IMplicit–EXplicit (IMEX) nonhydrostatic atmospheric model on manycore processors such as graphic processing units (GPUs) and Intel’s Many Integrated Core (MIC) architecture. IMEX time integration methods sidestep the ...

research-article

Performance evaluation of main-memory hash joins on KNL

International Journal of Computational Science and Engineering (IJCSE), Volume 20, Issue 4, Pages 425–438, https://doi.org/10.1504/ijcse.2019.104443

New hardware features have propelled designs and analysis in main-memory hash joins. In previous studies, memory access has always been the primary bottleneck for hash join algorithms. However, there are relatively few studies devoted to bottlenecks ...

extended-abstract

Achieving transparency mapping parallel applications: a memory hierarchy affair

MEMSYS '18: Proceedings of the International Symposium on Memory Systems, Pages 185–189, https://doi.org/10.1145/3240302.3240316

Computer systems are becoming increasingly complex: they provide the expected compute capability at the cost of deeper memory hierarchies (high-bandwidth and high-capacity memories), heterogeneous compute elements (latency-optimized and throughput-...

research-article

Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures

EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, Article No.: 4, Pages 1–10, https://doi.org/10.1145/3236367.3236371

Intel Knights Landing (KNL) and IBM POWER architectures are becoming widely deployed on modern supercomputing systems due to their powerful components. The MPI Remote Memory Access (RMA) model, which provides one-sided communication semantics, has been seen as ...

Article

NUMAPROF, A NUMA Memory Profiler

Euro-Par 2018: Parallel Processing Workshops, Pages 159–170, https://doi.org/10.1007/978-3-030-10549-5_13

The number of cores in HPC systems and servers has increased considerably over the last few years. To also increase the available memory bandwidth and capacity, most systems have become NUMA (Non-Uniform Memory Access), meaning each processor has its own ...

Article

Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors

Euro-Par 2018: Parallel Processing, Pages 265–278, https://doi.org/10.1007/978-3-319-96983-1_19

Convolutional Neural Networks (CNNs) represent a class of Deep Neural Networks that is growing in importance due to their state-of-the-art performance in pattern recognition tasks in various domains, including image recognition, speech recognition,...

research-article

Joins in a heterogeneous memory hierarchy: exploiting high-bandwidth memory

DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware, Article No.: 8, Pages 1–10, https://doi.org/10.1145/3211922.3211929

High-Bandwidth Memory (HBM) offers an additional hardware-side opportunity for performance gains. The large amount of available bandwidth compared to regular DRAM allows the execution of high numbers of threads in parallel masking ...

article

A vectorized k-means algorithm for compressed datasets: design and experimental analysis

The Journal of Supercomputing (JSCO), Volume 74, Issue 6, Pages 2705–2728, https://doi.org/10.1007/s11227-018-2310-0

Clustering algorithms (e.g., Gaussian mixture models, k-means) tackle the problem of grouping a set of elements in such a way that elements from the same group (or cluster) have more similar properties to each other than to those elements in other ...

research-article

Implementing Genetic Algorithm Accelerated By Intel Xeon Phi

SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology, Pages 249–254, https://doi.org/10.1145/3155133.3155176

In this paper, a genetic algorithm (GA) accelerated by the Intel Xeon Phi coprocessor, based on the Intel Many Integrated Core (MIC) architecture, is proposed and called the GAPhi framework. The GAPhi framework solves the power-aware task scheduling (PATS) problems in ...

research-article

Beyond 16GB: Out-of-Core Stencil Computations

MCHPC'17: Proceedings of the Workshop on Memory Centric Programming for HPC, Pages 20–29, https://doi.org/10.1145/3145617.3145619

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such ...

Article

Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor

High Performance Computing, Pages 139–157, https://doi.org/10.1007/978-3-319-58667-0_8

In this work we present AWP-ODC-OS, an end-to-end optimization of AWP-ODC targeting homogeneous, manycore supercomputers. AWP-ODC is an established community software package simulating seismic wave propagation using a staggered finite difference ...

short-paper

Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture

CF'17: Proceedings of the Computing Frontiers Conference, Pages 247–250, https://doi.org/10.1145/3075564.3075593

Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive ...

research-article

Towards automatic HBM allocation using LLVM: a case study with knights landing

LLVM-HPC '16: Proceedings of the Third Workshop on LLVM Compiler Infrastructure in HPC, Pages 12–20

In this paper, we introduce a new LLVM analysis, called Bandwidth-Critical Data Analysis (BCDA), to decide when it is beneficial to allocate data in High-Bandwidth Memory (HBM) and then transform allocation calls into specific HBM allocation calls, for ...

research-article

Simulating stencil-based application on future Xeon Phi processor

PMBS '15: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, Article No.: 7, Pages 1–10, https://doi.org/10.1145/2832087.2832096

An important application for hydrocarbon exploration is simulated on a performance model of a novel Intel architecture. The accuracy of the simulation models is demonstrated by correlating against an existing processor first and then against high-...
