Use of GPUs #474
Jumping in with a quick note: the Achilles heel of GPU compute (excluding new platforms like the Apple M1 with really fast integrated graphics) is the time it takes to transfer data to/from the GPU. Thus, it may be hard to beat the CPU when dealing with single, relatively small patterns (1k x 1k is probably near the smallest one might consider). In PyEBSDIndex this is mitigated by inherently performing many calculations on a large batch of patterns. However, it may well be worth it - some simple tests would help.

Yes, take a look at my kernels. Be aware that I interleave the batches of patterns, so each pattern looks like it has N channels and is W x H in 2D size (versus being W x H with N slices in a volume). In my case this significantly reduced my global memory fetches within the GPU.

Also look at the gputools package, https://github.com/maweigert/gputools - they might have a lot of what you want. They inspired a lot of my initial efforts.

Final note: the GPU compute landscape is a mess. Yes, OpenCL is currently the most cross-platform framework, but Apple has said that OpenCL is officially deprecated (though not yet removed from the latest OS). CUDA is NVIDIA only, and Apple and NVIDIA still hate each other. Windows and OpenCL can be done, but not as easily as elsewhere ... I think a lot of cross-platform commercial software will end up rewriting for multiple frameworks/platforms, and a lot of the machine learning community says NVIDIA/CUDA or nothing. You might want to take a look at Vulkan/MoltenVK. It is definitely geared more towards rendering than compute, but it might serve your needs well. I have not fully comprehended the interplay between OpenCL and Vulkan, but I think there is something there, and that is why I hold out hope that OpenCL can be a long-term solution.
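To make the interleaving concrete, here is a minimal PyOpenCL sketch (not taken from PyEBSDIndex; the shapes, names and the toy mean-pattern kernel are my own assumptions). The pattern axis is moved last so that the N values belonging to one detector pixel are contiguous in memory, and one work-item per pixel then reads them sequentially:

```python
import numpy as np
import pyopencl as cl

# Toy data: a batch of n patterns, each h x w pixels
n, h, w = 16, 60, 60
patterns = np.random.rand(n, h, w).astype(np.float32)

# Interleave: put the pattern axis last so the n values belonging to one
# detector pixel sit next to each other in memory, (h, w, n) not (n, h, w)
interleaved = np.ascontiguousarray(np.moveaxis(patterns, 0, -1))

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

pats_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=interleaved)
mean_g = cl.Buffer(ctx, mf.WRITE_ONLY, size=h * w * 4)  # 4 bytes per float32

src = """
__kernel void mean_pattern(__global const float *pats,
                           __global float *mean_out,
                           const int n_pats)
{
    // One work-item per detector pixel; its n_pats values are contiguous
    // because of the interleaved (pixel-major) layout
    int pix = get_global_id(0);
    float s = 0.0f;
    for (int i = 0; i < n_pats; i++)
        s += pats[pix * n_pats + i];
    mean_out[pix] = s / (float)n_pats;
}
"""
prg = cl.Program(ctx, src).build()
prg.mean_pattern(queue, (h * w,), None, pats_g, mean_g, np.int32(n))

mean_pattern = np.empty((h, w), dtype=np.float32)
cl.enqueue_copy(queue, mean_pattern, mean_g)
```

With the volume layout (N, H, W) the same loop would stride by H*W elements between reads; the interleaved layout keeps each work-item's reads contiguous, which is presumably where the reduction in global memory fetches comes from.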
Thank you for this valuable input, @drowenhorst-nrl.
I think this could easily be adopted in kikuchipy, since we use Dask to spread the workload across all available CPUs. Dask does this by operating on chunks of the full pattern array. A chunk is typically 100 MB in size, and the signal (detector) axes are never chunked. Thus it would seem to make sense to send chunks on to the GPU.
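As a rough sketch of what sending chunks to a GPU-backed function could look like (process_chunk_on_gpu is a placeholder name, not an existing kikuchipy function):

```python
import numpy as np
import dask.array as da

# Hypothetical 4D pattern array: (nav rows, nav cols, det rows, det cols)
patterns = np.random.rand(100, 100, 60, 60).astype(np.float32)

# Chunk only the navigation axes (Dask picks chunk sizes close to the
# configured target, ~100 MB); -1 keeps each signal (detector) axis whole
lazy = da.from_array(patterns, chunks=("auto", "auto", -1, -1))

def process_chunk_on_gpu(chunk):
    # Placeholder: move the chunk to the GPU, run a kernel, copy back
    return chunk

result = lazy.map_blocks(process_chunk_on_gpu, dtype=np.float32)
out = result.compute()
```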
You're describing what you're doing in the following, right? If I understand correctly, you're "allocating" a (16 or more patterns, n detector rows, n detector columns) 32-bit floating point array on the GPU, which is then passed to the CL kernel. Our Dask chunks are usually always 4D, so in our case ...
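Assuming the navigation axes of such a 4D chunk would simply be flattened into one batch axis before transfer (an assumption on my part, not something stated above), a minimal sketch:

```python
import numpy as np

# A single hypothetical 4D chunk: (nav rows, nav cols, det rows, det cols)
chunk = np.random.rand(8, 8, 60, 60)

# Flatten the two navigation axes into one batch axis and cast to float32,
# giving the (n patterns, n detector rows, n detector columns) layout
batch = chunk.reshape(-1, *chunk.shape[-2:]).astype(np.float32)
print(batch.shape)  # (64, 60, 60)
```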
Looks like a good reference, and perhaps something we could depend on for some functionality.
We should try to take advantage of GPUs by writing some GPU kernels. I don't have an NVIDIA GPU available, so my choice would be to use PyOpenCL instead of CuPy.
@drowenhorst-nrl has written some kernels in PyEBSDIndex that we could take inspiration from, e.g. the static background subtraction used in one of the Radon transform functions. Such a kernel could be an alternative to our background subtraction.
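Not the PyEBSDIndex kernel itself, but a minimal PyOpenCL sketch of static background subtraction over a batch of patterns (shapes and names are illustrative only):

```python
import numpy as np
import pyopencl as cl

# Toy data: a batch of patterns and a static background with the same
# detector shape
n, h, w = 16, 60, 60
patterns = np.random.rand(n, h, w).astype(np.float32)
static_bg = patterns.mean(axis=0).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

pats_g = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=patterns)
bg_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=static_bg)

src = """
__kernel void remove_static_bg(__global float *pats,
                               __global const float *bg,
                               const int n_pixels)
{
    // One work-item per pattern element; the background repeats every
    // n_pixels elements
    int i = get_global_id(0);
    pats[i] -= bg[i % n_pixels];
}
"""
prg = cl.Program(ctx, src).build()
prg.remove_static_bg(queue, (n * h * w,), None, pats_g, bg_g, np.int32(h * w))

corrected = np.empty_like(patterns)
cl.enqueue_copy(queue, corrected, pats_g)
```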
In general, I think more per-pattern operations in the kikuchipy.pattern module could be replaced by PyOpenCL kernels, like image rescaling. We have CPU acceleration from Numba here, but it would be good to test GPU acceleration as well.
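As an illustrative sketch of GPU-side intensity rescaling with pyopencl.elementwise.ElementwiseKernel (the min/max are computed on the host for simplicity; this is not how kikuchipy's Numba-accelerated rescaling is implemented):

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cla
from pyopencl.elementwise import ElementwiseKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Toy single pattern; min/max computed on the host for brevity
pattern = np.random.rand(60, 60).astype(np.float32)
vmin = np.float32(pattern.min())
vmax = np.float32(pattern.max())

pat_g = cla.to_device(queue, pattern.ravel())

# Elementwise rescaling of intensities to [0, 1]
rescale = ElementwiseKernel(
    ctx,
    "float *x, float vmin, float vmax",
    "x[i] = (x[i] - vmin) / (vmax - vmin)",
    "rescale",
)
rescale(pat_g, vmin, vmax)

rescaled = pat_g.get().reshape(pattern.shape)
```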
Other resources: