Abstract
SYCL[1] is a royalty-free, cross-platform C++ abstraction and programming model for heterogeneous computing. SYCL provides the necessary programming interfaces, such as device, queue, and kernel, memory interfaces including buffer and accessor, and features such as unified shared memory (USM). Intel oneAPI[2], a programming model for heterogeneous computing, provides a SYCL compiler and runtime to support kernel-based SYCL programming, as well as a set of optimized libraries to support API-based programming.
SYCLomatic[3] is a project that assists developers in migrating existing code written in other programming languages to the SYCL C++ heterogeneous programming model. SYCLomatic supports source-to-source migration from existing CUDA application source code to SYCL source code by leveraging SYCL interfaces and the optimized libraries provided by Intel oneAPI. One of the major challenges for SYCLomatic is that, because of API differences, expressing the same semantics as a single line of CUDA code sometimes requires additional data structures or multiple lines of code in SYCL. To assist the migration and keep the migrated code performant and maintainable, SYCLomatic implements a compatibility library consisting of extensions to SYCL interfaces and a set of compatible APIs for popular libraries. The compatibility library does not depend on SYCLomatic and can be used as a standalone library for SYCL programming.
In this talk, we share why we created the compatibility library and how it is designed.
Addressing Semantic Differences:
The first part of the compatibility library addresses semantic differences from CUDA code by introducing new classes that add functionality to SYCL interfaces such as device, queue, malloc, and image accessor.
(1) Utility features to access queues on different devices and threads:
Keeping and passing around pointers to sycl::device objects between host functions is tedious. The compatibility library introduces a singleton device manager class that tracks the usage of each device across CPU threads.
The device manager class makes the following features easy to provide:
(a) Get the “current” device in a thread:
The class keeps a map from each thread to the device most recently used in that thread, so a host function can retrieve the intended device without having it passed in as an argument.
(b) Get the default queue for a device:
When offloading a task to a device, SYCL requires the developer to create a new queue on that device if a pointer to a previously created queue is not available. The class keeps a globally accessible default queue for each device and provides a convenient interface to retrieve the default queue of any device.
(c) Device-level operations (create queue, synchronize, reset):
The class records every queue it creates and maps each queue to its device. Device-level synchronization can therefore be implemented simply by waiting on every queue recorded for that device.
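The device manager described in (a) to (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's actual class: queue_standin is a hypothetical placeholder for sycl::queue (its wait() only clears a counter), the two-device count is invented, and the names device_manager, select_device, default_queue, create_queue, and synchronize are chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for sycl::queue so the sketch builds without SYCL.
// A real implementation would call sycl::queue::wait() instead.
struct queue_standin {
  int device_id;
  int pending_tasks = 0;
  void wait() { pending_tasks = 0; }  // pretend all submitted work finished
};

struct device_entry {
  std::vector<std::shared_ptr<queue_standin>> queues;  // every queue created on this device
  std::shared_ptr<queue_standin> default_queue;
};

// Singleton tracking devices, their queues, and each thread's current device.
class device_manager {
 public:
  static device_manager& instance() {
    static device_manager mgr(/*device_count=*/2);  // hypothetical: two devices
    return mgr;
  }

  // (a) Current device of the calling thread (device 0 until one is selected).
  int current_device() {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = thread_to_device_.find(std::this_thread::get_id());
    return it == thread_to_device_.end() ? 0 : it->second;
  }
  void select_device(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(id >= 0 && id < static_cast<int>(devices_.size()));
    thread_to_device_[std::this_thread::get_id()] = id;
  }

  // (b) Default queue of a device, reachable from any host function.
  std::shared_ptr<queue_standin> default_queue(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    return devices_.at(id).default_queue;
  }

  // (c) Every new queue is recorded against its device...
  std::shared_ptr<queue_standin> create_queue(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto q = std::make_shared<queue_standin>(queue_standin{id});
    devices_.at(id).queues.push_back(q);
    return q;
  }
  // ...so synchronizing a device means waiting on all of its queues.
  void synchronize(int id) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& q : devices_.at(id).queues) q->wait();
  }

 private:
  explicit device_manager(int device_count) : devices_(device_count) {
    for (int id = 0; id < device_count; ++id) {
      devices_[id].default_queue = std::make_shared<queue_standin>(queue_standin{id});
      devices_[id].queues.push_back(devices_[id].default_queue);
    }
  }
  std::mutex mutex_;
  std::vector<device_entry> devices_;
  std::map<std::thread::id, int> thread_to_device_;
};
```

Keeping the thread-to-device map and the queue registry inside one singleton is what lets a host function call device_manager::instance() instead of receiving a device or queue as a parameter.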
(2) Pointer-like memory operations for non-USM mode:
Because managing memory through pointers is more convenient in some cases, the library emulates pointer operations on top of sycl::buffer, providing pointer-like memory operations such as malloc, free, and pointer arithmetic for devices that do not support USM.
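One way to emulate pointers over buffers is sketched below, assuming a virtual-address scheme: each allocation receives a fake address from a reserved range, and any pointer derived from it by arithmetic is translated back to its backing buffer plus a byte offset. This is an illustrative assumption, not a description of the library's internals; buffer_standin is a hypothetical placeholder for sycl::buffer, and pointer_emulator, translate, and the base address 0x1000 are invented names and values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for sycl::buffer<std::byte, 1>: a host byte array.
using buffer_standin = std::vector<std::byte>;

// Hands out fake "device pointers" from a virtual address range and records
// which buffer backs each range. The fake pointers must never be dereferenced
// on the host; they are only translated back to (buffer, offset).
class pointer_emulator {
 public:
  void* malloc(std::size_t size) {
    assert(size > 0);
    std::uintptr_t start = next_address_;
    next_address_ += size;  // allocations never overlap
    allocations_.emplace(start, buffer_standin(size));
    return reinterpret_cast<void*>(start);
  }

  void free(void* ptr) {
    std::size_t erased = allocations_.erase(reinterpret_cast<std::uintptr_t>(ptr));
    assert(erased == 1 && "free() must receive a pointer returned by malloc()");
    (void)erased;
  }

  // Map any pointer inside an allocation (e.g. ptr + 16) to its buffer and offset.
  std::pair<buffer_standin*, std::size_t> translate(const void* ptr) {
    auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = allocations_.upper_bound(addr);  // first allocation starting after addr
    assert(it != allocations_.begin() && "pointer below every allocation");
    --it;                                      // allocation starting at or before addr
    std::size_t offset = addr - it->first;
    assert(offset < it->second.size() && "pointer past the end of its allocation");
    return {&it->second, offset};
  }

 private:
  std::uintptr_t next_address_ = 0x1000;  // hypothetical base of the virtual range
  std::map<std::uintptr_t, buffer_standin> allocations_;
};
```

An ordered map keyed by start address keeps the lookup for an arbitrary interior pointer logarithmic, which is what makes pointer arithmetic on the returned values cheap to support.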
(3) Flexible interface to fetch image data:
The compatibility library introduces a class that simplifies fetching image data, e.g., extracting one or two channels through the image accessor.
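The channel-extraction idea can be sketched as a small wrapper, assuming that the underlying image read always returns four channels (as a SYCL image read returns a four-component vector) while the calling code expects only one or two. image_standin and image_fetcher are hypothetical names standing in for an image accessor and the library's class; neither is the library's real API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image accessor: every read returns four
// channels, even when the image logically has only one or two.
struct image_standin {
  std::size_t width, height;
  std::vector<std::array<float, 4>> texels;  // row-major
  std::array<float, 4> read(std::size_t x, std::size_t y) const {
    assert(x < width && y < height);
    return texels[y * width + x];
  }
};

// Hypothetical wrapper returning only the first N channels of each texel.
template <std::size_t N>
class image_fetcher {
  static_assert(N >= 1 && N <= 4, "an image has at most four channels");

 public:
  explicit image_fetcher(const image_standin& img) : img_(img) {}
  std::array<float, N> fetch(std::size_t x, std::size_t y) const {
    std::array<float, 4> texel = img_.read(x, y);
    std::array<float, N> out{};
    for (std::size_t c = 0; c < N; ++c) out[c] = texel[c];
    return out;
  }

 private:
  const image_standin& img_;
};
```

Making the channel count a template parameter lets the return type match what the caller expects, so the truncation happens in one place rather than at every fetch site.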
Compatible APIs:
The second part of the compatibility library provides syntactic sugar for frequently used API calls.
(1) Free functions for atomic operations:
With sycl::atomic_ref, performing an atomic operation requires the following two steps:
(a) Construction of sycl::atomic_ref
(b) Executing the atomic operation on the sycl::atomic_ref
The compatibility library introduces a set of templated atomic free functions that combine both steps into a single call, simplifying developers' code.
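The two steps and the free function that hides them can be sketched as follows. Because a SYCL compiler may not be available, atomic_ref_standin is a hypothetical class that mimics the shape of sycl::atomic_ref (constructed from a reference, then operated on through member calls), implemented with GCC/Clang __atomic builtins. It supports integer types only and uses relaxed ordering. The names compat_sketch::atomic_fetch_add and compat_sketch::atomic_exchange are illustrative and not claimed to match the library's spelling.

```cpp
#include <cassert>

namespace compat_sketch {

// Hypothetical stand-in shaped like sycl::atomic_ref. In SYCL, step (a) would be:
//   sycl::atomic_ref<int, sycl::memory_order::relaxed, sycl::memory_scope::device,
//                    sycl::access::address_space::global_space> ref(*addr);
// This version uses GCC/Clang __atomic builtins (integer types only).
template <typename T>
class atomic_ref_standin {
 public:
  explicit atomic_ref_standin(T& obj) : ptr_(&obj) {}
  T fetch_add(T operand) const {
    return __atomic_fetch_add(ptr_, operand, __ATOMIC_RELAXED);
  }
  T exchange(T desired) const {
    return __atomic_exchange_n(ptr_, desired, __ATOMIC_RELAXED);
  }

 private:
  T* ptr_;
};

// One call hides both steps: construct the reference, then operate on it.
template <typename T>
T atomic_fetch_add(T* addr, T operand) {
  assert(addr != nullptr);
  atomic_ref_standin<T> ref(*addr);  // step (a): construction
  return ref.fetch_add(operand);     // step (b): the atomic operation
}

template <typename T>
T atomic_exchange(T* addr, T desired) {
  assert(addr != nullptr);
  return atomic_ref_standin<T>(*addr).exchange(desired);
}

}  // namespace compat_sketch
```

At a call site, the four-argument atomic_ref template declaration collapses into a single call such as compat_sketch::atomic_fetch_add(&counter, 1), which is the kind of simplification the templated free functions aim for.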
(2) Utility classes to simplify device memory allocation:
Because sycl::malloc cannot allocate a multi-dimensional array directly, and creating a device-accessible static or global variable takes multiple steps, the library provides a device memory class that performs the allocation, keeps the dimension information, and offers the following features:
(a) Simple interface to allocate a multi-dimensional array and pass it to the device
(b) Simple interface to create a static or global variable that can be accessed on the device
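A device memory class that remembers its extents can be sketched as follows, assuming row-major layout. std::vector stands in for the device allocation (e.g. memory from sycl::malloc_device), and device_memory_sketch, at, get_ptr, and the global g_table are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: owns one flat allocation and remembers its extents,
// so a multi-dimensional array is allocated and indexed in one step.
// std::vector stands in for the device allocation.
template <typename T, std::size_t Dim>
class device_memory_sketch {
 public:
  explicit device_memory_sketch(const std::array<std::size_t, Dim>& extents)
      : extents_(extents), data_(element_count(extents)) {}

  const std::array<std::size_t, Dim>& extents() const { return extents_; }
  std::size_t size() const { return data_.size(); }
  T* get_ptr() { return data_.data(); }  // the pointer a kernel would receive

  // Row-major indexing computed from the stored extents.
  T& at(const std::array<std::size_t, Dim>& idx) {
    std::size_t linear = 0;
    for (std::size_t d = 0; d < Dim; ++d) {
      assert(idx[d] < extents_[d]);
      linear = linear * extents_[d] + idx[d];
    }
    return data_[linear];
  }

 private:
  static std::size_t element_count(const std::array<std::size_t, Dim>& e) {
    std::size_t n = 1;
    for (std::size_t x : e) n *= x;
    return n;
  }
  std::array<std::size_t, Dim> extents_;
  std::vector<T> data_;
};

// Hypothetical "global variable" declared once at namespace scope (C++17 inline variable).
inline device_memory_sketch<float, 2> g_table({4, 8});
```

Storing the extents next to the allocation is what lets callers index a 3D array as at({i, j, k}) instead of computing the flattened offset by hand at every use.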
(3) 2D and 3D memory operations (USM and non-USM):
The compatibility library provides free functions for 2D and 3D memory operations, such as allocation, memory copy, and memory set, to save developers effort.
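What a 2D copy helper has to handle can be sketched with pitched (padded-row) memory, following the parameter convention of cudaMemcpy2D (destination, destination pitch, source, source pitch, width in bytes, height in rows). Host memory and std::memcpy stand in for device memory and a SYCL queue copy, and memcpy_2d_sketch is a hypothetical name; a 3D variant would repeat the same logic once per slice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies a width x height region (width in bytes) between two pitched
// allocations, using the cudaMemcpy2D argument order.
inline void memcpy_2d_sketch(void* dst, std::size_t dst_pitch,
                             const void* src, std::size_t src_pitch,
                             std::size_t width, std::size_t height) {
  assert(width <= dst_pitch && width <= src_pitch);
  auto* d = static_cast<unsigned char*>(dst);
  auto* s = static_cast<const unsigned char*>(src);
  if (width == dst_pitch && width == src_pitch) {
    std::memcpy(d, s, width * height);  // no padding: one contiguous copy
    return;
  }
  for (std::size_t row = 0; row < height; ++row) {
    // Each row starts at a pitch-aligned offset; padding bytes are left untouched.
    std::memcpy(d + row * dst_pitch, s + row * src_pitch, width);
  }
}
```

The contiguous fast path and the per-row loop are exactly the details a developer would otherwise repeat at every call site, which is the effort such free functions save.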
(4) Compatible APIs for popular CUDA libraries:
Libraries such as BLAS (Basic Linear Algebra Subprograms), CCL (Collective Communication Library), DNN (Deep Neural Network Library), STL algorithms, and FFT (Fast Fourier Transform) are widely used in heterogeneous applications. Although the Intel oneAPI package provides these libraries with SYCL interfaces, libraries from different implementations that offer similar core functionality differ in API design. The compatibility library contains APIs that bridge these usage differences and let developers write SYCL applications with the interfaces they are more familiar with.
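A compatibility wrapper can bridge this kind of API difference, as the sketch below illustrates for one concrete case. cuBLAS's cublasSaxpy takes a handle and receives alpha through a pointer, whereas the hypothetical backend_axpy here takes a queue first and alpha by value, in the style of oneMKL's USM interfaces. queue_standin, blas_handle_standin, backend_axpy, and compat_saxpy are all invented for illustration, the integer status codes are assumptions, and only positive increments are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a SYCL queue and a queue-first, alpha-by-value backend.
struct queue_standin {};

inline void backend_axpy(queue_standin&, std::int64_t n, float alpha,
                         const float* x, std::int64_t incx,
                         float* y, std::int64_t incy) {
  assert(incx > 0 && incy > 0);  // sketch handles positive strides only
  for (std::int64_t i = 0; i < n; ++i) y[i * incy] += alpha * x[i * incx];
}

// Hypothetical CUDA-library-style handle that simply wraps a queue.
struct blas_handle_standin {
  queue_standin* queue;
};

// Keeps the cuBLAS-style argument order and alpha-by-pointer convention,
// then forwards to the backend. Returns 0 on success, 1 on invalid input.
inline int compat_saxpy(blas_handle_standin* handle, int n, const float* alpha,
                        const float* x, int incx, float* y, int incy) {
  if (handle == nullptr || handle->queue == nullptr || alpha == nullptr) return 1;
  backend_axpy(*handle->queue, n, *alpha, x, incx, y, incy);
  return 0;
}
```

Because the wrapper only reorders and converts arguments before forwarding, code written against the familiar signature can run on the SYCL-based library without rewriting each call.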
Because SYCL is a relatively young language specification, many existing heterogeneous computing applications, libraries, and frameworks do not yet offer a SYCL interface. Since the compatibility library addresses some of the syntactic and semantic differences between SYCL and other heterogeneous computing languages, developers should be able to create SYCL-based libraries and frameworks with less effort.
Work remains to improve the functionality and usability of the compatibility library, such as enabling it to coexist with SYCL-implemented components in device selection, queue activation, and task synchronization, and addressing interface differences with more APIs from popular CUDA libraries.
Notices & Disclaimers:
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. SYCL is a registered trademark of the Khronos Group, Inc.