Olivier Chafik edited this page Nov 10, 2022 · 5 revisions

Description of JavaCL's bundled goodies

Fourier Transforms (DFT, power-of-two FFT)

JavaCL ships with ready-to-use FFT utility classes (Fourier transform and inverse transform).

They are pretty similar to Apache Commons Math's FastFourierTransformer class, except that :

  • It's significantly faster (even when run on a CPU OpenCL device)
  • It has floats and doubles variants (Apache's implementation only deals with doubles)
  • It uses primitive arrays instead of arrays of Complex for complex data (it interleaves real and imaginary values in an array that's twice as big)
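The interleaved layout mentioned in the last bullet can be illustrated with a small plain-Java sketch. The class and method names below (`InterleavedComplex`, `interleave`, `realParts`, `imagParts`) are hypothetical helpers written for this page, not part of JavaCL; they only show how n complex values map onto a primitive array of length 2 * n.

```java
/** Hypothetical helpers for the interleaved layout [re0, im0, re1, im1, ...]. */
final class InterleavedComplex {

    /** Packs separate real and imaginary arrays into one array that is twice as big. */
    static float[] interleave(float[] re, float[] im) {
        if (re.length != im.length) {
            throw new IllegalArgumentException("re and im must have the same length");
        }
        float[] out = new float[2 * re.length];
        for (int i = 0; i < re.length; i++) {
            out[2 * i] = re[i];      // even index: real part
            out[2 * i + 1] = im[i];  // odd index: imaginary part
        }
        return out;
    }

    /** Extracts the real parts (even indices) of an interleaved array. */
    static float[] realParts(float[] interleaved) {
        return everyOther(interleaved, 0);
    }

    /** Extracts the imaginary parts (odd indices) of an interleaved array. */
    static float[] imagParts(float[] interleaved) {
        return everyOther(interleaved, 1);
    }

    private static float[] everyOther(float[] interleaved, int offset) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("interleaved length must be even");
        }
        float[] out = new float[interleaved.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = interleaved[2 * i + offset];
        }
        return out;
    }
}
```

For example, packing the real parts {1, 2} with the imaginary parts {3, 4} yields the array {1, 3, 2, 4}.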

JavaCL also features a slower general Fourier transform (DFT) that works with input sizes that aren't powers of two.
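To make clear what a general transform computes for arbitrary sizes, here is a naive O(n²) reference DFT in plain Java, working on the same interleaved layout. This is only a sketch for checking results on small inputs: `NaiveDft` is a hypothetical class, not JavaCL's implementation, and the 1/n scaling applied on the inverse is this sketch's own convention.

```java
/** Hypothetical reference DFT on interleaved complex data (not JavaCL's code). */
final class NaiveDft {

    /**
     * Computes the DFT (or the inverse DFT, scaled by 1/n) of n complex values
     * stored as [re0, im0, re1, im1, ...]. Works for any n, not only powers of two.
     */
    static double[] transform(double[] data, boolean inverse) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException("interleaved length must be even");
        }
        int n = data.length / 2;
        double[] out = new double[2 * n];
        double sign = inverse ? 1.0 : -1.0;
        for (int k = 0; k < n; k++) {
            double sumRe = 0.0;
            double sumIm = 0.0;
            for (int t = 0; t < n; t++) {
                // Reduce k * t modulo n first to keep the angle small and accurate.
                long kt = ((long) k * t) % n;
                double angle = sign * 2.0 * Math.PI * kt / n;
                double c = Math.cos(angle);
                double s = Math.sin(angle);
                double re = data[2 * t];
                double im = data[2 * t + 1];
                // Complex multiply: (re + i*im) * (c + i*s)
                sumRe += re * c - im * s;
                sumIm += re * s + im * c;
            }
            if (inverse) {
                sumRe /= n;
                sumIm /= n;
            }
            out[2 * k] = sumRe;
            out[2 * k + 1] = sumIm;
        }
        return out;
    }
}
```

Calling `transform(data, false)` and then `transform(result, true)` should give back the original values up to rounding error, which makes this a convenient baseline when validating a faster implementation.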

The following code computes an FFT and the reverse FFT in a few lines :

int n = 1024; // power-of-two
float[] complexInput = new float[2 * n]; // interleaved real and imag values
for (int i = 0; i < n; i++) {
    complexInput[i * 2] = i; // pure-real input
    complexInput[i * 2 + 1] = 0;
}
CLContext context = JavaCL.createBestContext(); // use createBestContext(DoubleSupport) for doubles
CLQueue queue = context.createDefaultQueue();

FloatFFTPow2 fft = new FloatFFTPow2(context); // use default OpenCL context
float[] transformed = fft.transform(queue, complexInput, false);
float[] transformedBack = fft.transform(queue, transformed, true);

These FFT classes can also operate directly on user-provided input and output CLBuffer instances, to avoid back-and-forth conversions from Java primitive arrays to OpenCL memory (useful if you're trying to do the FFT of data that was just transformed by OpenCL).

Parallel Random Numbers Generator

JavaCL ships with a Xorshift random number generator, adapted to work in parallel (using a warm-up phase).
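The idea of running Xorshift in parallel with a warm-up phase can be sketched on the CPU as follows. This is an illustration under stated assumptions, not JavaCL's implementation: `XorshiftStreams` is a hypothetical class, it uses Marsaglia's 32-bit xorshift with the (13, 17, 5) shift triple, and it seeds each stream from `java.util.Random`. The constants and seeding scheme actually used by JavaCL are not stated on this page.

```java
import java.util.Random;

/** Hypothetical CPU sketch of many independent xorshift streams with a warm-up. */
final class XorshiftStreams {
    private final int[] states; // one 32-bit state per parallel stream

    XorshiftStreams(int streamCount, long seed, int warmupRounds) {
        states = new int[streamCount];
        Random seeder = new Random(seed);
        for (int i = 0; i < streamCount; i++) {
            int s;
            do {
                s = seeder.nextInt();
            } while (s == 0); // a zero state would stay zero forever
            states[i] = s;
        }
        // Warm-up: discard the first outputs so the streams decorrelate
        // from their initial seeds.
        for (int r = 0; r < warmupRounds; r++) {
            advance();
        }
    }

    /** One xorshift32 step with the (13, 17, 5) triple. */
    static int step(int x) {
        x ^= x << 13;
        x ^= x >>> 17;
        x ^= x << 5;
        return x;
    }

    /** Advances every stream once; each iteration is independent of the others. */
    private void advance() {
        for (int i = 0; i < states.length; i++) {
            states[i] = step(states[i]);
        }
    }

    /** Returns one fresh value per stream. */
    int[] nextBatch() {
        advance();
        return states.clone();
    }
}
```

Because each stream only touches its own state, the loop in `advance` maps naturally onto one OpenCL work-item per stream.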

ParallelRandom was designed as a drop-in replacement for java.util.Random (albeit with a more limited API) :

ParallelRandom r = new ParallelRandom();
while (true) {
    // draw numbers from r here, as you would from java.util.Random
}

Parallel Reduction (min, max, sum, product)

It's very easy to perform many parallel operations in OpenCL, but exploiting its parallel horsepower to aggregate the results is a bit harder.

JavaCL ships with the ReductionUtils class, which aggregates OpenCL data using simple associative and commutative operations :

  • min
  • max
  • sum
  • product

If you have a vector of float2 and need to compute the min of all its values, keeping the x and y components separate, you can even ask for a 2-channel reductor (the reduction takes place independently on each channel).

Here's how :

CLContext context = JavaCL.createBestContext();
CLQueue queue = context.createDefaultQueue();
// Create a one-channel int "min" reductor :
Reductor<Integer> reductor = ReductionUtils.createReductor(context, ReductionUtils.Operation.Min, OpenCLType.Int, 1);

CLBuffer<Integer> input = context.createIntBuffer(Usage.Input, 1024);
// < fill input with some data here... >

int value = reductor.reduce(queue, input).get();
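To show what a multi-channel reduction computes, here is a plain-Java sketch of a pairwise (tree-style) reduction over interleaved channels. `ChannelReduction` is a hypothetical class written for illustration and does not reflect how ReductionUtils is implemented; it assumes the operation is associative and commutative, as min, max, sum and product are.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

/** Hypothetical CPU sketch of a per-channel pairwise reduction. */
final class ChannelReduction {

    /**
     * Reduces interleaved data [x0, y0, x1, y1, ...] to one value per channel.
     * Each pass combines element i with element i + half, so the number of
     * live elements is roughly halved until only one remains.
     */
    static int[] reduce(int[] data, int channels, IntBinaryOperator op) {
        if (channels <= 0 || data.length == 0 || data.length % channels != 0) {
            throw new IllegalArgumentException("data length must be a positive multiple of channels");
        }
        int[] work = data.clone();
        int count = data.length / channels;
        while (count > 1) {
            int half = (count + 1) / 2;
            // The pairs below are independent, so on a GPU each pair
            // could be handled by its own work-item.
            for (int i = 0; i + half < count; i++) {
                for (int c = 0; c < channels; c++) {
                    int a = i * channels + c;
                    int b = (i + half) * channels + c;
                    work[a] = op.applyAsInt(work[a], work[b]);
                }
            }
            count = half;
        }
        return Arrays.copyOf(work, channels);
    }
}
```

With two channels and `Math::min`, the input {5, 9, 3, 7, 8, 1} reduces to {3, 1}: the minimum of the x components (5, 3, 8) and the minimum of the y components (9, 7, 1).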

UJMP Matrices

The Universal Java Matrix Package (UJMP) library is an amazing project that provides a comprehensive API covering most if not all linear algebra needs, with default implementations as well as wrapper implementations for the prominent existing Java matrix packages (take a look at its features list, it's just mind-boggling!).

Maven JavaCL Blas dependency

JavaCL's "javacl-blas" Maven package contains a lazy implementation of float and double 2D dense Matrix for the UJMP library.

      <name>nativelibs4java Maven2 Repository</name>


Transparently asynchronous operations

CLDenseDoubleMatrix2D and CLDenseFloatMatrix2D feature transparent asynchronicity, powered by the OpenCL event model under the hood. The statement :

Matrix abta = a.times(b).times(a.transpose());

returns a matrix, abta, whose computation has probably not finished yet. Any attempt to read a specific component of that matrix will wait for all pending writes to it to finish.
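The same "return immediately, block on read" behaviour can be sketched in plain Java with `CompletableFuture`. This is only an analogy: `LazyVector` is a hypothetical class, and JavaCL relies on OpenCL events rather than Java futures.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: operations return at once, reads wait for the result. */
final class LazyVector {
    private final CompletableFuture<double[]> pending;

    private LazyVector(CompletableFuture<double[]> pending) {
        this.pending = pending;
    }

    static LazyVector of(double... values) {
        return new LazyVector(CompletableFuture.completedFuture(values.clone()));
    }

    /** Schedules an element-wise sum; does not wait for either operand. */
    LazyVector plus(LazyVector other) {
        return new LazyVector(pending.thenCombineAsync(other.pending, (a, b) -> {
            if (a.length != b.length) {
                throw new IllegalArgumentException("length mismatch");
            }
            double[] out = new double[a.length];
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i] + b[i];
            }
            return out;
        }));
    }

    /** Schedules a scaling by a constant factor. */
    LazyVector scale(double factor) {
        return new LazyVector(pending.thenApplyAsync(a -> {
            double[] out = new double[a.length];
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i] * factor;
            }
            return out;
        }));
    }

    /** Reading a component blocks until every pending computation has finished. */
    double get(int index) {
        return pending.join()[index];
    }
}
```

A chain such as `LazyVector.of(1, 2).plus(LazyVector.of(3, 4)).scale(2)` returns right away; only the call to `get` waits for the chained work to complete.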

Accelerated matrix operations

  • times (matrix multiplication)
  • transpose (matrix transpose)
  • plus, minus, divide, mtimes (element-wise binary operators)
  • min, max, center, copy
  • sin, cos, tan, sinh, cosh, tanh, asin, acos, atan, asinh, acosh, atanh
  • containsDouble
  • toDoubleArray
  • clear


To use OpenCL for all double and float 2D dense matrices :


You will then be able to create OpenCL-powered matrices with :

Matrix m = MatrixFactory.dense(10, 10);

Or you can instantiate your matrices with the direct type :

Matrix
    a = new CLDenseDoubleMatrix2D(10),
    b = new CLDenseDoubleMatrix2D(10);

Matrix c = a.times(b);