- research-article, August 2019
Accelerating All-Edge Common Neighbor Counting on Three Processors
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing, Article No.: 42, Pages 1–10, https://doi.org/10.1145/3337821.3337917
We propose to accelerate an important but time-consuming operation in online graph analytics, which is the counting of common neighbors for each pair of adjacent vertices (u,v), or edge (u,v), on three modern processors of different architectures. We ...
- research-article, June 2019
Porting the COSMO Weather Model to Manycore CPUs
Felix Thaler, Stefan Moosbrugger, Carlos Osuna, Mauro Bianco, Hannes Vogt, Anton Afanasyev, Lukas Mosimann, Oliver Fuhrer, Thomas C. Schulthess, Torsten Hoefler
PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, Article No.: 13, Pages 1–11, https://doi.org/10.1145/3324989.3325723
Weather and climate simulations are a major application driver in high-performance computing (HPC). With the end of Dennard scaling and Moore's law, the HPC industry increasingly employs specialized computation accelerators to increase computational ...
- Article, June 2019
Evaluation of the Suitability of Intel Xeon Phi Clusters for the Simulation of Ultrasound Wave Propagation Using Pseudospectral Methods
Abstract: The ability to perform large-scale ultrasound simulations using Fourier pseudospectral methods has generated significant interest in medical ultrasonics, including for treatment planning in therapeutic ultrasound and image reconstruction in ...
- research-article, March 2019
Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors
Daniel S Abdi, Francis X Giraldo, Emil M Constantinescu, Lester E Carr, Lucas C Wilcox, Timothy C Warburton
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 2, Pages 242–267, https://doi.org/10.1177/1094342017732395
We present the acceleration of an IMplicit–EXplicit (IMEX) nonhydrostatic atmospheric model on manycore processors such as graphics processing units (GPUs) and Intel’s Many Integrated Core (MIC) architecture. IMEX time integration methods sidestep the ...
- research-article, January 2019
Performance evaluation of main-memory hash joins on KNL
International Journal of Computational Science and Engineering (IJCSE), Volume 20, Issue 4, Pages 425–438, https://doi.org/10.1504/ijcse.2019.104443
New hardware features have propelled designs and analysis in main-memory hash joins. In previous studies, memory access has always been the primary bottleneck for hash join algorithms. However, there are relatively few studies devoted to bottlenecks ...
- extended-abstract, October 2018
Achieving transparency mapping parallel applications: a memory hierarchy affair
MEMSYS '18: Proceedings of the International Symposium on Memory Systems, Pages 185–189, https://doi.org/10.1145/3240302.3240316
Computer systems are becoming increasingly complex: they provide the expected compute capability at the cost of deeper memory hierarchies (high-bandwidth and high-capacity memories), heterogeneous compute elements (latency-optimized and throughput-...
- research-article, September 2018
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures
EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting, Article No.: 4, Pages 1–10, https://doi.org/10.1145/3236367.3236371
Intel Knights Landing (KNL) and IBM POWER architectures are becoming widely deployed on modern supercomputing systems due to their powerful components. The MPI Remote Memory Access (RMA) model, which provides one-sided communication semantics, has been seen as ...
- Article, December 2018
NUMAPROF, A NUMA Memory Profiler
Euro-Par 2018: Parallel Processing Workshops, Pages 159–170, https://doi.org/10.1007/978-3-030-10549-5_13
Abstract: The number of cores in HPC systems and servers has increased considerably over the last few years. In order to also increase the available memory bandwidth and capacity, most systems became NUMA (Non-Uniform Memory Access), meaning each processor has its own ...
- Article, August 2018
Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors
Abstract: Convolutional Neural Networks (CNNs) represent a class of Deep Neural Networks that is growing in importance due to their state-of-the-art performance in pattern recognition tasks in various domains, including image recognition, speech recognition,...
- research-article, June 2018
Joins in a heterogeneous memory hierarchy: exploiting high-bandwidth memory
DAMON '18: Proceedings of the 14th International Workshop on Data Management on New Hardware, Article No.: 8, Pages 1–10, https://doi.org/10.1145/3211922.3211929
High-Bandwidth Memory (HBM) offers an additional hardware-side opportunity for performance gains. The large amount of available bandwidth compared to regular DRAM allows the execution of high numbers of threads in parallel masking ...
- article, June 2018
A vectorized k-means algorithm for compressed datasets: design and experimental analysis
The Journal of Supercomputing (JSCO), Volume 74, Issue 6, Pages 2705–2728, https://doi.org/10.1007/s11227-018-2310-0
Clustering algorithms (e.g., Gaussian mixture models, k-means) tackle the problem of grouping a set of elements in such a way that elements from the same group (or cluster) have more similar properties to each other than to those elements in other ...
- research-article, December 2017
Implementing Genetic Algorithm Accelerated By Intel Xeon Phi
SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology, Pages 249–254, https://doi.org/10.1145/3155133.3155176
In this paper, a genetic algorithm (GA) accelerated by the Intel Xeon Phi coprocessor, based on the Intel Many Integrated Core (MIC) Architecture, is proposed and called the GAPhi framework. The GAPhi framework solves the power-aware task scheduling (PATS) problems in ...
- research-article, November 2017
Beyond 16GB: Out-of-Core Stencil Computations
MCHPC'17: Proceedings of the Workshop on Memory Centric Programming for HPC, Pages 20–29, https://doi.org/10.1145/3145617.3145619
Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such ...
- Article, June 2017
Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor
Abstract: In this work we present AWP-ODC-OS, an end-to-end optimization of AWP-ODC targeting homogeneous, manycore supercomputers. AWP-ODC is an established community software package simulating seismic wave propagation using a staggered finite difference ...
- short-paper, May 2017
Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture
CF'17: Proceedings of the Computing Frontiers Conference, Pages 247–250, https://doi.org/10.1145/3075564.3075593
Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive ...
- research-article, November 2016
Towards automatic HBM allocation using LLVM: a case study with knights landing
In this paper, we introduce a new LLVM analysis, called Bandwidth-Critical Data Analysis (BCDA), to decide when it is beneficial to allocate data in High-Bandwidth Memory (HBM) and then transform allocation calls into specific HBM allocation calls, for ...
- research-article, November 2015
Simulating stencil-based application on future Xeon Phi processor
PMBS '15: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, Article No.: 7, Pages 1–10, https://doi.org/10.1145/2832087.2832096
An important application for hydrocarbon exploration is simulated on a performance model of a novel Intel architecture. The accuracy of the simulation models is demonstrated by correlating against an existing processor first and then against high-...