- research-article, August 2024
BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs
ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing, August 2024, Pages 576–586, https://doi.org/10.1145/3673038.3673046
The end-to-end performance of deep learning model inference is often limited by excess data movement on GPUs. To reduce data movement, existing deep learning frameworks apply graph-level optimizations such as operator fusion to exploit data reuse across ...
- research-article, June 2024
FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, May 2024, Pages 511–524, https://doi.org/10.1145/3650200.3656593
This paper introduces FASTEN, a cutting-edge library developed to address the computational challenges inherent in Heterogeneous Graph Neural Networks (HGNNs). The key focus of FASTEN is the optimization of segmented matrix multiplication, a critical ...
- research-article, October 2023
Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling
SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, Pages 216–232, https://doi.org/10.1145/3620678.3624657
Technological advancements in the past decades have substantially increased the capacity and performance of Solid State Drives (SSDs). Provisioning such high-capacity SSDs among tenants can reap multiple benefits, such as elevated performance, efficient ...
- research-article, October 2023
Parrotfish: Parametric Regression for Optimizing Serverless Functions
SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, Pages 177–192, https://doi.org/10.1145/3620678.3624654
Serverless computing is a new paradigm that aims to remove the burdens of cloud management from developers. Yet rightsizing serverless functions remains a pain point for developers. Choosing the right memory configuration is necessary to ensure cost and/...
- Article, August 2023
Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling
- Jean-Baptiste Besnard,
- Ahmad Tarraf,
- Clément Barthélemy,
- Alberto Cascajo,
- Emmanuel Jeannot,
- Sameer Shende,
- Felix Wolf
Abstract: High-performance computing is not only a race towards the fastest supercomputers but also the science of using such massive machines productively to acquire valuable results – outlining the importance of performance modelling and optimization. ...
- research-article, January 2023
Study of Data and Model parallelism in Distributed Deep learning for Diabetic retinopathy classification
Procedia Computer Science (PROCS), Volume 218, Issue C, 2023, Pages 2253–2263, https://doi.org/10.1016/j.procs.2023.01.201
Abstract: Distributed deep learning (DDL) is an area of research in Artificial intelligence (AI), where the training time of deep learning (DL) models can be drastically reduced by using multiple accelerators. Most of the previous research works use the data ...
- Article, June 2022
From Graphs to the Science Computer of a Space Telescope: The Power of Petri Nets in Systems Engineering
- Rafal Graczyk,
- Waldemar Bujwan,
- Marcin Darmetko,
- Marcin Dziezyc,
- Damien Galano,
- Konrad Grochowski,
- Michal Kurowski,
- Grzegorz Juchnikowski,
- Marek Morawski,
- Michal Mosdorf,
- Piotr Orleanski,
- Cedric Thizy,
- Marcus Völp
Application and Theory of Petri Nets and Concurrency, Jun 2022, Pages 153–174, https://doi.org/10.1007/978-3-031-06653-5_9
Abstract: Space system engineering has to follow a rigorous design process to manage performance/risk trade-offs at each development stage and possibly across several functional and organizational domains. The process is further complicated by the co-...
- research-article, June 2022
A graph neural network-based performance model for deep learning applications
MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, June 2022, Pages 11–20, https://doi.org/10.1145/3520312.3534863
The unprecedented proliferation of machine learning based software brings an ever-increasing need to optimize the implementation of such applications. State-of-the-art compilers for neural networks, such as Halide and TVM, incorporate a machine learning-...
- research-article, February 2022
PPT-Multicore: performance prediction of OpenMP applications using reuse profiles and analytical modeling
- Atanu Barai,
- Yehia Arafa,
- Abdel-Hameed Badawy,
- Gopinath Chennupati,
- Nandakishore Santhi,
- Stephan Eidenbenz
The Journal of Supercomputing (JSCO), Volume 78, Issue 2, Feb 2022, Pages 2354–2385, https://doi.org/10.1007/s11227-021-03949-4
Abstract: We present PPT-Multicore, an analytical model embedded in the Performance Prediction Toolkit (PPT) to predict parallel applications’ performance running on a multicore processor. PPT-Multicore builds upon our previous work towards a multicore ...
- short-paper, December 2021
Is Function-as-a-Service a Good Fit for Latency-Critical Services?
- Haoran Qiu,
- Saurabh Jha,
- Subho S. Banerjee,
- Archit Patke,
- Chen Wang,
- Franke Hubertus,
- Zbigniew T. Kalbarczyk,
- Ravishankar K. Iyer
WoSC '21: Proceedings of the Seventh International Workshop on Serverless Computing (WoSC7) 2021, December 2021, Pages 1–8, https://doi.org/10.1145/3493651.3493666
Function-as-a-Service (FaaS) is becoming an increasingly popular cloud-deployment paradigm for serverless computing that frees application developers from managing the infrastructure. At the same time, it allows cloud providers to assert control in ...
- research-article, October 2020
Modeling User-Centered Page Load Time for Smartphones
MobileHCI '20: 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services, October 2020, Article No.: 1, Pages 1–12, https://doi.org/10.1145/3379503.3403565
Page Load Time (PLT) is critical in measuring web page load performance. However, the existing PLT metrics are designed to measure the Web page load performance on desktops/laptops and do not consider user interactions on mobile browsers. As a result, ...
- article, October 2020
Modeling Students' Performances in Activity-Based E-Learning From a Learning Analytics Perspective: Implications and Relevance for Learning Design
International Journal of Distance Education Technologies (IJDET-IGI), Volume 18, Issue 4, Oct 2020, Pages 71–93, https://doi.org/10.4018/IJDET.2020100105
This paper reports the findings of a research using marks of students in learning activities of an online module to build a predictive model of performance for the final assessment of the module. The objectives were (1) to compare the performances of ...
- research-article, March 2021
Performance Modeling and Evaluation of a Production Disaggregated Memory System
MEMSYS '20: Proceedings of the International Symposium on Memory Systems, September 2020, Pages 223–232, https://doi.org/10.1145/3422575.3422795
High performance computers rely on large memories to cache data and improve performance. However, managing the ever-increasing number of levels in the memory hierarchy becomes increasingly difficult. The Disaggregated Memory System (DMS) architecture ...
- research-article, August 2020
Fast Modeling of Network Contention in Batch Point-to-point Communications by Packet-level Simulation with Dynamic Time-stepping
ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing, August 2020, Article No.: 8, Pages 1–10, https://doi.org/10.1145/3409390.3409398
Network contention has long been one of the root causes of performance loss in large-scale parallel applications. With the increasing importance of performance modeling to both large-scale application optimization and application-system co-design, the ...
- research-article, February 2020
Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+
International Journal of Parallel Programming (IJPP), Volume 48, Issue 1, Feb 2020, Pages 80–97, https://doi.org/10.1007/s10766-019-00646-x
Abstract: Understanding the scalability of parallel programs is crucial for software optimization and hardware architecture design. As HPC hardware is moving towards many-core design, it becomes increasingly difficult for a parallel program to make ...
- research-article, September 2018
Execution Time Prediction for Apache Spark
ICCBD '18: Proceedings of the 2018 International Conference on Computing and Big Data, September 2018, Pages 47–51, https://doi.org/10.1145/3277104.3277109
Apache Spark is a framework that is being increasingly used in distributed data processing. However, the performance of a Spark application can vary considerably depending on many factors, including the input data, implementation of the program, Spark ...
- abstract, December 2017
Performance Evaluation for Service Function Chains through Automated Model Building
VALUETOOLS 2017: Proceedings of the 11th EAI International Conference on Performance Evaluation Methodologies and Tools, December 2017, Pages 257–258, https://doi.org/10.1145/3150928.3150967
The advance of Software-defined Networking and Network Function Virtualization leads to highly flexible networks. The dynamic instantiation and modification of these networks depends on embedding algorithms. Well-known performance modeling tools could ...
- research-article, March 2017
Towards a Stochastic Model for Integrated detection and filtering of DoS attacks in Cloud environments
BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications, March 2017, Article No.: 28, Pages 1–6, https://doi.org/10.1145/3090354.3090383
Cloud Data Center (CDC) security remains a major challenge for business organizations and takes an important concern with research works. The attacker purpose is to guarantee the service unavailability and maximize the financial loss costs. As a result, ...
- research-article, January 2017
Evaluation and Performance Modeling of a Burst Buffer Solution
ACM SIGOPS Operating Systems Review (SIGOPS), Volume 50, Issue 2, December 2016, Pages 12–26, https://doi.org/10.1145/3041710.3041714
Hierarchical storage architectures are required to meet both, capacity and bandwidth requirements for future high-end storage architectures. In this paper we present the results of an evaluation of an emerging technology, DataDirect Networks' (DDN) ...
- research-article, February 2016
Model Driven Software Performance Engineering: Current Challenges and Way Ahead
ACM SIGMETRICS Performance Evaluation Review (SIGMETRICS), Volume 43, Issue 4, March 2016, Pages 53–62, https://doi.org/10.1145/2897356.2897363
Performance model solvers and simulation engines have been around for more than two decades. Yet, performance modeling has not received wide acceptance in the software industry, unlike pervasion of modeling and simulation tools in other industries. This ...