ReadFromAndWriteToBuffersAndImages
This page describes common properties and principles of buffers and images, but another page is dedicated to image-specific features : [UsingImages].
OpenCL lets you create memory objects of three types : buffers (1D arrays), 2D images and 3D images.
These memory objects can be stored in host memory (typically RAM) or in device memory (typically the video memory on the graphics card).
For memory objects stored in host memory, you can either ask OpenCL to allocate the memory itself or provide a pointer to memory that you allocated (and initialized) yourself.
When using a host-allocated memory object, OpenCL may let kernels access the host data directly, or it may instead cache some of that data for reads and/or writes. This means that if you create a direct buffer of values and create an OpenCL buffer out of it in "use host pointer" mode, you cannot expect changes made to the direct buffer to be visible to OpenCL kernels that use the OpenCL buffer. Before changing or reading any data through a host pointer you must call CLBuffer.map, and afterwards you must call CLBuffer.unmap.
Lastly, OpenCL might let you read from and write to an OpenCL memory object directly through the map/unmap mechanism. Mapping a memory object gives a direct pointer to the memory object's data (for instance using a DMA mechanism). However, mapping is only guaranteed to work for memory objects created from host-allocated pointers, so it should be used with caution.
While OpenCL defines buffers as arrays of bytes, JavaCL supports typed buffers with a one-to-one mapping to NIO buffers.
If you use a __global float* argument in one of your kernels, you can either use a CLBuffer<Byte> and do the float <-> byte conversion and offset calculations on your own, or directly use a CLBuffer<Float> that will take care of everything.
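To make the "conversion and offset calculations" concrete, here is a minimal sketch that uses only the JDK's NIO classes, not JavaCL itself. The class and method names (FloatByteLayout, putFloatManually) are made up for illustration. It shows that the float at element index i lives at byte offset i * 4, and that a typed view over the same memory hides that arithmetic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper class, for illustration only (not part of JavaCL).
class FloatByteLayout {

    /** Writes a float at element index 'index' of a raw byte buffer, computing the byte offset by hand. */
    static void putFloatManually(ByteBuffer bytes, int index, float value) {
        // Each float occupies Float.BYTES (4) bytes.
        bytes.putFloat(index * Float.BYTES, value);
    }

    public static void main(String[] args) {
        int size = 100;
        ByteBuffer bytes = ByteBuffer.allocateDirect(size * Float.BYTES)
                                     .order(ByteOrder.nativeOrder());
        // Typed view over the same memory: indices are now in floats, not bytes.
        FloatBuffer floats = bytes.asFloatBuffer();

        putFloatManually(bytes, 3, 1.5f); // byte-level write, manual offset
        floats.put(4, 2.5f);              // typed write, no offset arithmetic

        System.out.println(floats.get(3));                   // prints 1.5
        System.out.println(bytes.getFloat(4 * Float.BYTES)); // prints 2.5
    }
}
```

A typed CLBuffer<Float> plays a role similar to the FloatBuffer view here: indices and lengths are expressed in elements rather than bytes.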
JavaCL / BridJ
CLContext context = ... ;
int size = 100;
Pointer<Float> ptr = Pointer.allocateFloats(size).order(context.getByteOrder());
// Last 'false' argument requires that no copy of the pointer is made :
// keep ptr as the primary data source
CLBuffer<Float> clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, false);
JavaCL / JNA
CLContext context = ... ;
int size = 100;
FloatBuffer ptr = ByteBuffer.allocateDirect(size * 4).order(context.getByteOrder()).asFloatBuffer();
// Last 'false' argument requires that no copy of the pointer is made :
// keep ptr as the primary data source
CLBuffer<FloatBuffer> clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, false);
Allocate a zeroed-out buffer :
int size = 100;
CLBuffer<Float> buffer = context.createFloatBuffer(CLMem.Usage.InputOutput, size);
Allocate a buffer as a copy of existing data :
// Last 'true' argument asks for a copy of ptr to be made in the device memory.
// The clBuffer will have no further link to ptr.
CLBuffer<Float> clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, true);
CLContext context = ... ; // e.g. JavaCL.getBestContext()
CLQueue queue = ... ; // e.g. context.createDefaultQueue()
CLBuffer<Float> clBuffer = ... ; // e.g. context.createFloatBuffer(CLMem.Usage.InputOutput, size)
JavaCL / BridJ
// Reading everything to a new native memory location :
Pointer<Float> data = clBuffer.read(queue);
// Reading elements 3 to 9 inclusive (offset 3, length 7) to a new native memory location :
data = clBuffer.read(queue, 3, 7);
// Reading elements 3 to 9 inclusive to an existing native memory location :
data = Pointer.allocateFloats(7).order(context.getByteOrder());
clBuffer.read(queue, 3, 7, data, true); // 'true' : blocking operation
JavaCL / JNA
// Reading everything to a new buffer :
FloatBuffer data = clBuffer.read(queue);
// Reading elements 3 to 9 inclusive (offset 3, length 7) to a new buffer :
data = clBuffer.read(queue, 3, 7);
// Reading elements 3 to 9 inclusive to an existing buffer :
// (note that while indirect buffers will work, direct buffers will provide much better performance)
data = NIOUtils.directFloats(7, context.getByteOrder());
clBuffer.read(queue, 3, 7, data, true); // 'true' : blocking operation
Mapping a buffer is only guaranteed to work for buffers allocated from host pointers.
JavaCL / BridJ
CLBuffer<Float> clBuffer = ... ;
try {
Pointer<Float> data = clBuffer.map(queue, CLMem.MapFlags.Write);
data.set(10f); // write something to the buffer
clBuffer.unmap(queue, data);
} catch (CLException.MapFailure ex) {
// map didn't succeed : maybe use CLBuffer.write / read instead.
}
JavaCL / JNA
CLBuffer<Float> clBuffer = ... ;
try {
FloatBuffer data = clBuffer.map(queue, CLMem.MapFlags.Write);
data.put(0, 10f); // write something to the buffer
clBuffer.unmap(queue, data);
} catch (CLException.MapFailure ex) {
// map didn't succeed : maybe use CLBuffer.write / read instead.
}
Each OpenCL device can have a different endianness.
This makes it very hard to share the same buffers between two devices with mismatching endianness (for instance, take a little endian Intel Core device vs. a big endian Radeon 4850 GPU device on the same ATI Stream platform).
CLContext.getByteOrder() can be called to check that the context has a consistent byte ordering (it checks that the CLDevice.getByteOrder() methods of all its devices return the same order, and returns null if they do not match).
All the CLBuffer read/map methods should return properly ordered NIO buffers (using the order of the queue's device).
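As an illustration of why the byte order has to match, the following sketch (plain JDK NIO, no JavaCL) encodes the same float with both byte orders and then decodes it with the wrong one. The class and method names (EndiannessDemo, encode, decode) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, for illustration only (not part of JavaCL).
class EndiannessDemo {

    /** Returns the 4 bytes of 'value' laid out in the given byte order. */
    static byte[] encode(float value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES).order(order);
        buf.putFloat(0, value);
        return buf.array();
    }

    /** Interprets 4 bytes as a float, assuming the given byte order. */
    static float decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getFloat(0);
    }

    public static void main(String[] args) {
        // 1.0f is 0x3F800000 in IEEE 754.
        byte[] little = encode(1.0f, ByteOrder.LITTLE_ENDIAN); // 00 00 80 3F
        byte[] big = encode(1.0f, ByteOrder.BIG_ENDIAN);       // 3F 80 00 00

        // Matching order round-trips correctly.
        System.out.println(decode(little, ByteOrder.LITTLE_ENDIAN)); // prints 1.0

        // Mismatched order silently yields a different value (no exception is thrown).
        System.out.println(decode(little, ByteOrder.BIG_ENDIAN) == 1.0f); // prints false
        System.out.println(big[0] == little[3]);                          // prints true
    }
}
```

This is why the host-side buffers in the examples on this page are created with the context's byte order: a mismatch does not raise an error, it just produces wrong values.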
JavaCL maps OpenCL entities (allocated by the OpenCL driver, typically in the device memory) to Java objects (managed by the JVM's garbage collector).
OpenCL entities are released when their Java object counterparts are garbage collected.
In many cases this can lead to serious issues : when the OpenCL driver runs out of memory, it does not tell Java to try and collect unused objects (which would release a few OpenCL entities in the process) and just fails, which makes JavaCL throw a CLException.MemObjectAllocationFailure or OutOfResources exception.
To avoid that, one can manually release an unused buffer (or any JavaCL entity) by calling CLEntity.release() (CLEntity is a base class which is inherited by CLBuffer, CLImage2D, CLProgram, CLEvent... virtually all JavaCL classes of interest).
JavaCL features a workaround for allocations that fail due to lack of OpenCL memory (it then calls the Java garbage collector and waits a while before retrying), but it is advised to call release() whenever possible (but only once on each object, of course).
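One way to get the "release early, but only once" behaviour is a small guard that can be used in a try-with-resources statement. The sketch below is an assumption-laden illustration, not something JavaCL provides: ReleaseOnce is a hypothetical class, and it takes the release action as a Runnable so that it does not depend on any JavaCL type.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard class, for illustration only (not part of JavaCL).
class ReleaseOnce implements AutoCloseable {
    private final Runnable releaseAction;
    private final AtomicBoolean released = new AtomicBoolean(false);

    ReleaseOnce(Runnable releaseAction) {
        this.releaseAction = releaseAction;
    }

    /** Runs the release action the first time it is called; later calls do nothing. */
    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            releaseAction.run();
        }
    }

    boolean isReleased() {
        return released.get();
    }

    public static void main(String[] args) {
        int[] releaseCount = {0};
        // With JavaCL, the action might be something like clBuffer::release
        // (assuming release() takes no arguments, as described above).
        try (ReleaseOnce guard = new ReleaseOnce(() -> releaseCount[0]++)) {
            guard.close(); // explicit early release
        } // implicit close() at the end of the block is ignored
        System.out.println(releaseCount[0]); // prints 1
    }
}
```

The try-with-resources block guarantees the release happens even if the code using the buffer throws, while the flag prevents a second call from reaching the underlying entity.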