---
title: "GPU programming"
engine: knitr
---
## Installing CUDA.jl
On a cluster such as Sockeye, where it is not possible to access GPU nodes
with internet access, precompilation is more involved than in the non-CUDA
setup covered in [the earlier page on precompilation](05_pkg_cache.qmd).
We provide a workaround, `pkg_gpu.nf`, which offers the same functionality
as `pkg.nf` but is slower, since all precompilation has to occur on the login
node.
First, add the package as before:
```{julia}
#| output: false
ENV["JULIA_PKG_PRECOMPILE_AUTO"] = 0 # hold off precompilation, since we are on the login node
using Pkg
Pkg.activate("experiment_repo/julia_env")
Pkg.add("CUDA")
```
Next, use the GPU precompilation script:
```{bash}
cd experiment_repo
./nextflow run nf-nest/pkg_gpu.nf
```
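
Once precompilation completes, you can sanity-check the installation from a job running on a GPU node (a minimal sketch; it will fail on the login node, which has no GPU):

```{julia}
#| eval: false
using CUDA

CUDA.functional()   # true when a GPU is visible and the runtime works
CUDA.versioninfo()  # prints driver and runtime details

# Tiny end-to-end test: move data to the device and reduce it
sum(CuArray(Float32[1, 2, 3]))
```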
## Running Nextflow processes requiring a GPU
Here is an example of a workflow using GPUs:
```{groovy}
#| eval: false
#| file: experiment_repo/nf-nest/examples/gpu.nf
```
We run it using the same command as usual:
```{bash}
cd experiment_repo
./nextflow run nf-nest/examples/gpu.nf -profile cluster
```
## GPU kernel development
One way to leverage GPUs is
[array programming](https://cuda.juliagpu.org/stable/usage/array/), as
demonstrated in the example above. When a problem cannot be cast in array
form, an alternative is to write a custom GPU kernel.
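
As a reminder of what array-style GPU code looks like, here is a minimal CUDA.jl sketch (the array sizes are illustrative; it assumes a CUDA-capable GPU):

```{julia}
#| eval: false
using CUDA

x = CUDA.rand(Float32, 10_000)
y = CUDA.rand(Float32, 10_000)

# Broadcasting fuses into a single GPU kernel; no explicit kernel code needed
z = 2f0 .* x .+ y

# Reductions are also offloaded to the device
s = sum(z)
```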
Designing custom GPU kernels is especially attractive in Julia, in large
part thanks to
[KernelAbstractions.jl](https://github.com/JuliaGPU/KernelAbstractions.jl),
which allows the same code to target both CPU and GPU backends.
Since error messages are easier to interpret during CPU development,
it is useful to be able to test both CPU and GPU targets.
Compared to CPU development in Julia, the main constraint when writing GPU
kernels is that the kernel body must not perform heap allocations.
Seasoned Julia developers often already avoid allocating in inner loops
due to garbage collection costs.
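
To illustrate, here is a minimal KernelAbstractions.jl sketch (the kernel name and arguments are our own): the same kernel source runs on the CPU backend for debugging and on a CUDA backend for production, and its body performs no heap allocations:

```{julia}
#| eval: false
using KernelAbstractions

# One kernel definition targets both CPU and GPU backends.
# Note: no heap allocations inside the kernel body.
@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

# CPU run: easier to debug, clearer error messages
x = rand(Float32, 1024)
y = zeros(Float32, 1024)
saxpy!(CPU())(y, 2f0, x; ndrange = length(y))
KernelAbstractions.synchronize(CPU())

# GPU run: identical kernel; only the backend and array types change
# using CUDA
# xd, yd = CuArray(x), CUDA.zeros(Float32, 1024)
# saxpy!(CUDABackend())(yd, 2f0, xd; ndrange = length(yd))
# KernelAbstractions.synchronize(CUDABackend())
```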
For a concrete example of KernelAbstractions.jl in action,
see [these kernels](https://github.com/alexandrebouchard/sais-gpu/blob/main/kernels.jl)
used to implement [Sequential Annealed Importance Sampling](https://arxiv.org/pdf/2408.12057).