OpenCL-Repo

Various code examples in OpenCL

Teachers:

Troels Henriksen athas@sigkill.dk
Cosmin Oancea cosmin.oancea@diku.dk

Tentative schedule

DAY 1 (slides)

I. Lecture 9:00 - 11:00:

Motivating hardware trends: power wall, memory/bandwidth wall
GPU vs CPU architecture. GPUs (AMD) are centered around three key ideas: a) transistors are used for massive parallelism rather than for programming convenience (caches + control units); b) SIMD: amortize the cost/complexity of managing an instruction stream across many ALUs; c) hardware multithreading for hiding memory latency. The consequence is hardware that is capricious to program, e.g., divergence + coalescing.

II. Guided programming exercises (11:00 - 12:00; 13:00 - 17:00):

Introduction to the OpenCL programming model (buffers + CPU-GPU transfers + command queues + enqueueing kernels + events, etc.)

Fill-in-the-blank exercises:

simple hello world
example demonstrating thread divergence (load balancing) issues
2-d stencil exercising texture memory + 2-d kernel
demonstrating profiling + debugging + printing
maybe naive matrix-matrix multiplication
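Why divergence is a load-balancing problem can be sketched with a toy cost model in plain C. This is an illustration only, not one of the course exercises: it assumes that the lanes of one wavefront run in lockstep, so a wavefront takes as many steps as its slowest lane. The function name `lockstep_steps` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Toy cost model (an assumption, not a measurement): threads are grouped
   into wavefronts of `width` lanes that execute in lockstep, so each
   wavefront costs as many steps as its slowest lane. */
size_t lockstep_steps(const size_t *iters, size_t n, size_t width)
{
    size_t total = 0;
    for (size_t base = 0; base < n; base += width) {
        size_t slowest = 0;
        for (size_t lane = base; lane < n && lane < base + width; lane++)
            if (iters[lane] > slowest)
                slowest = iters[lane];
        total += slowest;  /* faster lanes idle until the slowest finishes */
    }
    return total;
}
```

With per-thread iteration counts {100, 1, 100, 1} and a width of 2 the model charges 200 steps; regrouping the same work as {100, 100, 1, 1} charges 101.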

DAY 2 (slides)

I. Lecture 9:00 - 12:00:

Simple dependency analysis on arrays; when is a loop parallel and when is it safe to interchange or distribute a loop?
Optimizing temporal locality by tiling demonstrated on matrix-matrix multiplication (MMM):
- Block tiling as loop strip-mining + loop interchange
- starting from the naive MMM we derive a block-tiled version and
a block+register tiled version (in C pseudocode)
- GPU: local memory + barrier synchronization
Optimizing spatial locality:
- what are coalesced accesses to global memory
- transforming uncoalesced into coalesced accesses by transposition
- how to implement a transposition kernel in which all read/write accesses are coalesced
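Block tiling as strip-mining + interchange can be sketched in sequential C. This is a minimal sketch and not the code derived in the lecture: `TILE`, `mmm_naive` and `mmm_tiled` are illustrative names, matrices are assumed to be square and row-major, and the GPU-specific parts (local memory, barriers, register tiling) are left out.

```c
#include <stddef.h>

#define TILE 4  /* illustrative tile size, an assumption */

/* Naive version: C = A * B for n x n row-major matrices. */
void mmm_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += A[i*n + k] * B[k*n + j];
            C[i*n + j] = acc;
        }
}

/* Block-tiled version: each of the i, j, k loops is strip-mined by TILE,
   then the three tile loops are interchanged outwards. */
void mmm_tiled(size_t n, const float *A, const float *B, float *C)
{
    for (size_t x = 0; x < n*n; x++)
        C[x] = 0.0f;
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t kk = 0; kk < n; kk += TILE)
                /* work on one TILE x TILE block of C using one block of A and B */
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t j = jj; j < jj + TILE && j < n; j++) {
                        float acc = C[i*n + j];
                        for (size_t k = kk; k < kk + TILE && k < n; k++)
                            acc += A[i*n + k] * B[k*n + j];
                        C[i*n + j] = acc;
                    }
}
```

Both functions add the products for a given element of C in the same order of k, so they should produce the same result; the tiled version only changes which elements of A and B are touched close together in time.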

II. Guided programming exercises (13:00 - 17:00): Walking through the provided code, which demonstrates the topics covered in the lecture.

Fill-in-the-blanks exercises:
- the register + block tiled version of matrix-matrix multiplication
- optimize a (contrived) program to have only coalesced accesses to global memory by means of transposition
- then optimize it further by fusing the transposition into the program
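The idea behind a transposition with only coalesced accesses can be sketched in sequential C. This is a sketch under assumptions, not the provided exercise code: the small `tile` array stands in for OpenCL local memory, and `TILE_DIM` and `transpose_tiled` are hypothetical names. In both phases the innermost loop walks consecutive addresses of the global array it touches.

```c
#include <stddef.h>

#define TILE_DIM 4  /* illustrative tile size, an assumption */

/* out (cols x rows) becomes the transpose of in (rows x cols), both row-major. */
void transpose_tiled(size_t rows, size_t cols, const float *in, float *out)
{
    float tile[TILE_DIM][TILE_DIM];  /* stands in for local memory */
    for (size_t r0 = 0; r0 < rows; r0 += TILE_DIM)
        for (size_t c0 = 0; c0 < cols; c0 += TILE_DIM) {
            /* read phase: consecutive c reads consecutive addresses of in */
            for (size_t r = 0; r < TILE_DIM && r0 + r < rows; r++)
                for (size_t c = 0; c < TILE_DIM && c0 + c < cols; c++)
                    tile[r][c] = in[(r0 + r)*cols + (c0 + c)];
            /* in a kernel, a work-group barrier would separate the phases */
            /* write phase: consecutive r writes consecutive addresses of out */
            for (size_t c = 0; c < TILE_DIM && c0 + c < cols; c++)
                for (size_t r = 0; r < TILE_DIM && r0 + r < rows; r++)
                    out[(c0 + c)*rows + (r0 + r)] = tile[r][c];
        }
}
```

The strided access pattern is confined to the small tile, which on a GPU lives in fast on-chip memory rather than in global memory.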

DAY 3 (slides)

I. Lecture 9:00 - 12:00:

Data parallel building blocks: map, reduce, (segmented) scan semantics and GPU implementation ideas. Data-parallel thinking: compose programs like puzzles from a nested composition of such bulk operators. Main optimization: fusion. Applications: maximal-segment sum problem (MSSP), two-way partitioning, sparse-matrix-vector multiplication.
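MSSP as a map followed by a reduce can be sketched in C as below. This is one common formulation, assumed here rather than taken from the slides: every element is mapped to a tuple (best segment sum, best prefix sum, best suffix sum, total sum), empty segments count as sum 0, and the tuples are combined with an associative operator. The names `seg_t`, `seg_map`, `seg_op`, `seg_reduce` and `mssp` are hypothetical.

```c
#include <stddef.h>

typedef struct {
    int mss;  /* best sum of any segment */
    int mis;  /* best sum of a prefix    */
    int mcs;  /* best sum of a suffix    */
    int ts;   /* total sum               */
} seg_t;

static int max2(int a, int b) { return a > b ? a : b; }

/* map: lift a single element into the tuple domain */
seg_t seg_map(int x)
{
    int p = max2(x, 0);
    seg_t r = { p, p, p, x };
    return r;
}

/* associative operator: combine the summaries of two adjacent chunks */
seg_t seg_op(seg_t x, seg_t y)
{
    seg_t r;
    r.mss = max2(max2(x.mss, y.mss), x.mcs + y.mis);
    r.mis = max2(x.mis, x.ts + y.mis);
    r.mcs = max2(y.mcs, x.mcs + y.ts);
    r.ts  = x.ts + y.ts;
    return r;
}

/* map fused into reduce: no intermediate array of tuples is materialized */
seg_t seg_reduce(const int *a, size_t n)
{
    seg_t acc = { 0, 0, 0, 0 };  /* neutral element of seg_op */
    for (size_t i = 0; i < n; i++)
        acc = seg_op(acc, seg_map(a[i]));
    return acc;
}

int mssp(const int *a, size_t n) { return seg_reduce(a, n).mss; }
```

Because `seg_op` is associative, the array can be split into chunks that are reduced independently and then combined, which is what makes a parallel reduction possible.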

II. Guided programming exercises (11:00 - 12:00; 13:00 - 17:00):

Take a look at the implementation of reduce/scan.

Fill-in-the-blanks exercises:
- map-reduce fusion for MSSP
- optimizing two-way partitioning of an array (scan-scan horizontal fusion; map-scan fusion)
- sparse-matrix-vector multiplication
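Two-way partitioning by scans can be sketched sequentially in C. This is a sketch under assumptions, not the exercise solution: the predicate `is_even` and the function `partition2` are hypothetical, and the first loop plays the role of the flag map fused into two exclusive scans that share one pass (horizontal fusion).

```c
#include <stddef.h>

static int is_even(int x) { return x % 2 == 0; }  /* hypothetical predicate */

/* Stable two-way partition: elements satisfying the predicate come first.
   idx_t and idx_f are caller-provided scratch arrays of length n.
   Returns the number of elements that satisfy the predicate. */
size_t partition2(const int *a, size_t n, int *out, size_t *idx_t, size_t *idx_f)
{
    size_t nt = 0, nf = 0;
    /* exclusive scans of the flags and of the negated flags, in one pass */
    for (size_t i = 0; i < n; i++) {
        int flag = is_even(a[i]);
        idx_t[i] = nt;
        idx_f[i] = nf;
        nt += (size_t)flag;
        nf += (size_t)!flag;
    }
    /* scatter: each element is independent, so this loop is a parallel map */
    for (size_t i = 0; i < n; i++) {
        if (is_even(a[i]))
            out[idx_t[i]] = a[i];
        else
            out[nt + idx_f[i]] = a[i];
    }
    return nt;
}
```

On a GPU the first loop would be a parallel scan over pairs of counters; the sequential loop here only shows which values that scan has to produce.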

DAY 4

I. Lecture 9:00 - 12:00:

Histogram formulation and "reduce-by-key" generalization; optimization space and possible implementation strategies
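One possible implementation strategy, private sub-histograms that are merged at the end, can be sketched in sequential C. This is an illustration and not necessarily the strategy the lecture recommends: `NUM_BINS`, `NUM_CHUNKS` and `histogram_chunked` are hypothetical, and each chunk stands in for the part of the input handled by one work-group.

```c
#include <stddef.h>
#include <string.h>

#define NUM_BINS   4  /* illustrative sizes, both assumptions */
#define NUM_CHUNKS 3

/* hist must have room for NUM_BINS counters; keys outside the range are skipped. */
void histogram_chunked(const unsigned *keys, size_t n, unsigned *hist)
{
    unsigned local[NUM_CHUNKS][NUM_BINS];
    memset(local, 0, sizeof local);

    /* phase 1: every chunk updates only its own private sub-histogram,
       so chunks never conflict with each other */
    size_t chunk = (n + NUM_CHUNKS - 1) / NUM_CHUNKS;
    for (size_t c = 0; c < NUM_CHUNKS; c++)
        for (size_t i = c * chunk; i < n && i < (c + 1) * chunk; i++)
            if (keys[i] < NUM_BINS)
                local[c][keys[i]]++;

    /* phase 2: merge the sub-histograms bin by bin (one reduction per bin) */
    for (size_t b = 0; b < NUM_BINS; b++) {
        hist[b] = 0;
        for (size_t c = 0; c < NUM_CHUNKS; c++)
            hist[b] += local[c][b];
    }
}
```

Replacing the increment and the final sum with another associative and commutative operator gives the reduce-by-key generalization; the trade-off is extra memory for the sub-histograms against fewer conflicting updates.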

Stencil fusion (see Halide)
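What stencil fusion trades off can be sketched in C on a two-stage 1-D stencil. This is a toy example chosen for illustration, not taken from the lecture: each stage is a 3-point sum with clamped borders, and `stage`, `two_pass` and `fused` are hypothetical names. The fused version needs no intermediate array but recomputes first-stage values.

```c
#include <stddef.h>

/* clamp an index into [0, n-1]; n is assumed to be at least 1 */
static size_t clampi(long i, size_t n)
{
    if (i < 0) return 0;
    if (i >= (long)n) return n - 1;
    return (size_t)i;
}

/* one 3-point stencil application at position i */
static int stage(const int *in, size_t n, long i)
{
    return in[clampi(i - 1, n)] + in[clampi(i, n)] + in[clampi(i + 1, n)];
}

/* unfused: the first stage is materialized in tmp, then the second stage reads it */
void two_pass(const int *in, int *tmp, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = stage(in, n, (long)i);
    for (size_t i = 0; i < n; i++) out[i] = stage(tmp, n, (long)i);
}

/* fused: first-stage values are recomputed where the second stage needs them */
void fused(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        for (long d = -1; d <= 1; d++)
            acc += stage(in, n, (long)clampi((long)i + d, n));
        out[i] = acc;
    }
}
```

The fused version does roughly three times the first-stage work in exchange for not writing and re-reading `tmp`; intermediate schedules (for example, tiles with overlapping borders) sit between these two extremes.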

Streaming: overlapping communication + computation.

II. Guided programming exercises (13:00 - 17:00):

"atomics" support in OpenCL (atomic add, compare-and-swap (CAS), also known as compare-and-exchange)
walking through the provided code, which demonstrates the topics covered in the lecture
fill-in-the-blank exercises on histogram implementation
fill-in-the-blank exercise on stencil fusion
demonstrating overlapping communication + computation
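How CAS can be used to build an atomic operation that is not available directly can be sketched in host-side C11. This is an analogy under assumptions, not OpenCL kernel code: `<stdatomic.h>` stands in for the OpenCL atomic built-ins, and `atomic_add_float` is a hypothetical helper that implements a floating-point atomic add on top of a 32-bit compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Atomically adds val to the float stored (as raw bits) in *addr.
   Returns the value that was stored before the addition. */
float atomic_add_float(_Atomic uint32_t *addr, float val)
{
    uint32_t old_bits = atomic_load(addr);
    uint32_t new_bits;
    float old;
    do {
        memcpy(&old, &old_bits, sizeof old);    /* reinterpret bits as float */
        float sum = old + val;
        memcpy(&new_bits, &sum, sizeof sum);    /* and back to raw bits */
        /* on failure, old_bits is refreshed with the current value and we retry */
    } while (!atomic_compare_exchange_weak(addr, &old_bits, new_bits));
    return old;
}
```

The same retry loop works for any read-modify-write operation on a single word; under heavy contention many retries are wasted, which is one reason histogram implementations try to reduce the number of conflicting updates.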

DAY 5 (4 hours)

Practical considerations: here we discuss computational kernels of interest to BkMedical (proposed by Franck). By "discuss", I mean fill-in-the-blank exercises through which we guide the audience to develop an efficient solution.

