Loop transformations leveraging hardware prefetching

S Sioutas, S Stuijk, H Corporaal, T Basten, L Somers
Proceedings of the 2018 International Symposium on Code Generation and …, 2018 - dl.acm.org
Memory-bound applications depend heavily on system memory bandwidth to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the right combination of optimizations is not trivial, because most of them alter the application's memory access pattern and, as a result, interfere with the efficiency of the hardware prefetching mechanisms present in modern architectures. We propose an optimization algorithm that analytically classifies an algorithmic description of a loop nest to decide whether its optimization should emphasize temporal or spatial locality, while also taking hardware prefetching into account. We implement our technique as a tool for use with the Halide compiler and test it on a variety of benchmarks, finding an average performance improvement of over 40% compared to previous analytical models targeting the Halide language and compiler.
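To make the two kinds of locality concrete, here is a minimal C sketch of three equivalent orderings of a dense matrix multiply C += A * B on row-major arrays. It is not taken from the paper and is not the authors' Halide-based tool. In the i-j-k order, each step of k jumps a full row through B (2 KiB for N = 256 doubles). Interchanging to i-k-j turns the inner loop into unit-stride streams over B and C, the pattern hardware prefetchers generally follow best. Tiling then reuses TILE x TILE blocks while they are still cached, which improves temporal locality but cuts those long streams into short ones. This is the kind of locality/prefetching trade-off the abstract describes. N, TILE, and all function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define N 256     /* illustrative matrix dimension */
#define TILE 32   /* hypothetical tile size; real choice depends on cache sizes */

/* Naive i-j-k order: B is read column-wise (stride N doubles),
   giving poor spatial locality for a row-major array. */
static void matmul_ijk(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = C[i * N + j];
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* Loop interchange to i-k-j: the innermost loop walks B and C with
   unit stride, i.e. long sequential streams (spatial locality). */
static void matmul_ikj(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double a = A[i * N + k];
            for (int j = 0; j < N; j++)
                C[i * N + j] += a * B[k * N + j];
        }
}

/* Tiled i-k-j: each TILE x TILE block of B is reused across TILE rows
   while cached (temporal locality), at the cost of shorter streams. */
static void matmul_tiled(const double *A, const double *B, double *C) {
    assert(N % TILE == 0); /* this sketch only handles full tiles */
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double a = A[i * N + k];
                        for (int j = jj; j < jj + TILE; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

/* Runs all three variants on small integer-valued inputs (so the double
   arithmetic is exact) and returns 1 if the results agree, 0 if they
   differ, or -1 if allocation fails. */
int check_variants(void) {
    size_t n2 = (size_t)N * N;
    double *A = malloc(n2 * sizeof *A);
    double *B = malloc(n2 * sizeof *B);
    double *C1 = calloc(n2, sizeof *C1);
    double *C2 = calloc(n2, sizeof *C2);
    double *C3 = calloc(n2, sizeof *C3);
    int result = -1;
    if (A && B && C1 && C2 && C3) {
        for (size_t x = 0; x < n2; x++) {
            A[x] = (double)(x % 7);
            B[x] = (double)(x % 5);
        }
        matmul_ijk(A, B, C1);
        matmul_ikj(A, B, C2);
        matmul_tiled(A, B, C3);
        result = 1;
        for (size_t x = 0; x < n2; x++)
            if (C1[x] != C2[x] || C1[x] != C3[x]) {
                result = 0;
                break;
            }
    }
    free(A); free(B); free(C1); free(C2); free(C3);
    return result;
}
```

Calling check_variants() from a main function should return 1: all three orderings add the same products for each C[i][j] in the same k order, and the integer-valued inputs keep every sum exact. Which variant runs fastest depends on the cache hierarchy and prefetchers of the target machine. That dependence is why a model that accounts for prefetching can choose differently from one that only counts reuse.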
ACM Digital Library