
Documentation, Examples #1

Open
RoyiAvital opened this issue Apr 16, 2020 · 12 comments

@RoyiAvital

RoyiAvital commented Apr 16, 2020

Hello,

This looks like a great addition to Julia for those who are working on Image Processing.

I think it is missing documentation.
But for now, could you show examples with all supported boundary conditions?

Usually we have:

  1. Constant (with a default of zero).
  2. Replicate.
  3. Mirror.
  4. Circular.

Thank You.

@stev47
Owner

stev47 commented Apr 16, 2020

You are right: the API is actually documented fairly well, but end-user examples and a generated documentation website are still missing. These will come eventually.

Regarding boundary conditions: accesses outside the array evaluate to nothing, so you can check for that and specify the behaviour yourself.
Here are some simple 1-d examples:

```julia
using StaticKernels
a = collect(1:9)
```

Constant:

```julia
k = Kernel{(1:1,)}(w -> something(w[1], 5))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 5]
```

Replicate:

```julia
k = Kernel{(-1:1,)}(w -> something(w[1], w[0]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 9]
```

Mirror:

```julia
k = Kernel{(-1:1,)}(w -> something(w[1], w[-1]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 8]
```

Circular: currently there is no nice solution, but you can hack it by referencing the data array within your window function like this:

```julia
k = Kernel{(-1:1,)}(w -> something(w[1], a[1]))
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 1]
```

or more generally by using `w.position` to get the current window position.
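The wrap-around itself is plain modular arithmetic, independent of the package: Julia's `mod1` maps any integer into `1:n`, so a neighbour index past the end wraps back to the start. A minimal sketch of the circular lookup (the `wrap` helper is mine, not package API):

```julia
a = collect(1:9)

# mod1(i, n) maps any integer i into 1:n, so index 10 wraps to 1.
wrap(i, n) = mod1(i, n)

# Circular "right neighbour" of every element:
[a[wrap(i + 1, length(a))] for i in eachindex(a)]  # [2, 3, 4, 5, 6, 7, 8, 9, 1]
```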
In the future I'd like to have an interface for common boundary conditions like this:

```julia
k = Kernel{(1:1,)}(boundary=5, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 5]
k = Kernel{(-1:1,)}(boundary=:replicate, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 9]
k = Kernel{(-1:1,)}(boundary=:mirror, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 8]
k = Kernel{(-1:1,)}(boundary=:circular, w -> w[1])
map(k, a) # [2, 3, 4, 5, 6, 7, 8, 9, 1]
```

@stev47
Owner

stev47 commented Apr 16, 2020

I don't think the readme is the place for that. You can look at the documentation of `Kernel` from within Julia, where the parameters are explained; the function `something` is defined in Julia Base.
A more detailed documentation page will follow.
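For reference, `something` from Julia Base returns its first argument that is not `nothing` (and throws if all arguments are `nothing`); that is what turns an out-of-bounds window access into a boundary value in the examples above:

```julia
something(nothing, 5)  # 5  (out-of-bounds access fell through to the boundary value)
something(7, 5)        # 7  (an in-bounds value wins)
```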

@stev47
Owner

stev47 commented Apr 16, 2020

Thank you for the suggestions.
I referred to the documentation in `src/types.jl`:

```
    Kernel{X}(f)

Create a kernel with axes `X` wrapping a kernel function `f`.
The window function defines a reduction of values within the `X`-sized window.
For best performance you should annotate `f` with `@inline` and index accesses
within using `@inbounds`.
```

But you are totally right: documentation for end-users is not great at the moment, I will try to fix that.

@RoyiAvital
Author

Is there a way to have a "valid" mode?
Namely, one that works only on the valid indices where the kernel is well defined?

@stev47
Owner

stev47 commented Jan 9, 2023

I'm not sure I understand what you are looking for. It will already throw an error if the kernel makes out-of-bounds accesses on arrays without an extension defined. This is because boundary conditions are implemented by calling the window function with properly cropped windows, and out-of-bounds access gets turned into calls to `getindex_extension`.

If you are instead looking for some kind of static analysis of the kernel function at compile time, such that no out-of-bounds access is ever performed, then I think that is out of scope for this package: window functions are treated as more or less black-box functions. The `@kernel` macro will derive correct kernel bounds automatically, but only for static/constant indexing.

@RoyiAvital
Author

Think of a simple 3x3 kernel `k` over a 100x100 image `mI`.
The valid part of the output is the inner 98x98.

Let's say I want to apply this kernel over and over on the same image, but I only care about the inner part. I'd do something like:

```julia
map!(k, mI, mI);
```

Yet the problem is that the return size is 98x98, so it won't work.
One way would be creating a view of the inner part of `mI` and then applying the kernel to that.

I was wondering if there is something built in.

Also, any chance for a guide, for beginners, about creating a user defined extension / kernel? Specifically about handling the boundaries.

@stev47
Owner

stev47 commented Jan 9, 2023

Just use an extension: `map(k, extend(a, StaticKernels.ExtensionReplicate()))` will return the same dimensions as `a`. As long as you are only interested in the inner part, any extension will do the job; performance should not be a big issue.

If for whatever reason you absolutely want to apply the kernel only on the relevant parts (which will be getting smaller each iteration), you need to do that yourself, but the provided `axes(k, a)` method for a kernel `k` can help you calculate the bounds.
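A package-free sketch of that bookkeeping (the names `n`, `r`, and `valid` are mine, not package API): after each application of a radius-`r` kernel without an extension, the well-defined region shrinks by `r` on every side:

```julia
n = 100   # image side length
r = 1     # kernel radius (a 3x3 kernel)

# Valid index range along one axis after `itr` applications:
valid(itr) = (1 + r * itr):(n - r * itr)

valid(1)  # 2:99 (the inner 98x98)
valid(2)  # 3:98
```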

For extensions there is not much documentation available, but it is usually enough to implement `StaticKernels.getindex_extension(a, i::Int, ext::MyExtension)`, where `a` is the array, `i` is an out-of-bounds index and `ext` is your extension type, a subtype of `StaticKernels.Extension`. If you are not afraid of reading code, have a look at `src/extension.jl` to see how the predefined ones are implemented.
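The dispatch pattern behind this interface can be illustrated without the package (all names below are made up for illustration and are not StaticKernels API): an extension type selects the out-of-bounds value via multiple dispatch:

```julia
# Package-free sketch of the extension dispatch pattern.
abstract type MyExtensionKind end

struct MyConstant{T} <: MyExtensionKind
    value::T
end
struct MyReplicate <: MyExtensionKind end

# Called only for out-of-bounds indices `i`.
my_getindex_extension(a, i, ext::MyConstant) = ext.value
my_getindex_extension(a, i, ext::MyReplicate) = a[clamp(i, firstindex(a), lastindex(a))]

a = collect(1:9)
my_getindex_extension(a, 10, MyConstant(0))  # 0
my_getindex_extension(a, 10, MyReplicate())  # 9
```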

@RoyiAvital
Author

I will try to give a concrete example, from the discussion at https://discourse.julialang.org/t/92691:

```julia
using BenchmarkTools;
using LoopVectorization;
using StaticKernels;


function SolveLaplace!( mT, numItr )
    numRows, numCols = size(mT);

    for kk in 1:numItr
        for jj in 2:(numCols - 1)
            for ii in 2:(numRows - 1)
                mT[ii, jj] = 0.25 * (mT[ii - 1, jj] + mT[ii + 1, jj] + mT[ii, jj - 1] + mT[ii, jj + 1]);
            end
        end
    end
    return mT;
end

function SolveLaplaceTurbo!( mT, numItr )
    numRows, numCols = size(mT);

    for kk in 1:numItr
        @turbo for jj in 2:(numCols - 1), ii in 2:(numRows - 1)
            mT[ii, jj] = 0.25 * (mT[ii - 1, jj] + mT[ii + 1, jj] + mT[ii, jj - 1] + mT[ii, jj + 1]);
        end
    end
    return mT;
end

function SolveLaplaceKernel!( mT, numItr )
    numRows, numCols = size(mT);

    k = @kernel w -> 0.25 * (w[-1, 0] + w[1, 0] + w[0, -1] + w[0, 1])
    mV = @view mT[2:(numRows - 1), 2:(numCols - 1)]

    for kk in 1:numItr
        # map!(k, mT, extend(mT, StaticKernels.ExtensionReplicate()));
        map!(k, mV, mT);
    end
    return mT;
end

n = 100;
T = zeros(n, n);
T[1, 1:n] .= 500;
T[n, 1:n] .= 500;
T[1:n, 1] .= 200;
T[1:n, n] .= 500;

numItr = 1000;


@btime SolveLaplace!($T, $numItr);
@btime SolveLaplaceTurbo!($T, $numItr);
@btime SolveLaplaceKernel!($T, $numItr);
```

I wanted to show that StaticKernels.jl is perfect for this.
Yet it seems the static kernel still doesn't give the vectorizer as much information as LoopVectorization.jl does.
You can also see what I meant by working on the valid part while returning the whole array.

Is there any simple optimization to StaticKernels.jl to get the performance of LoopVectorization.jl in the case above?

@stev47
Owner

stev47 commented Jan 10, 2023

Your output and input array for the kernel operation are the same, which means that you almost certainly compute the wrong thing here, since you overwrite relevant input data.
This (probably unintended!) data dependency means that vectorization of the code is hard for LLVM. Note that LoopVectorization assumes that loop iterations are independent, so using it will change your results in this case!

Nevertheless, you could use an intermediate buffer like this to improve things:

```julia
function SolveLaplaceKernel2!( mT, numItr )
    numRows, numCols = size(mT);
    mBuf = similar(mT);

    k = @kernel w -> 0.25 * (w[-1, 0] + w[1, 0] + w[0, -1] + w[0, 1])
    mTExt = extend(mT, StaticKernels.ExtensionReplicate())
    mBufExt = extend(mBuf, StaticKernels.ExtensionReplicate())

    for kk in 1:numItr÷2
        map!(k, mBuf, mTExt);
        map!(k, mT, mBufExt);
    end
    if isodd(numItr)
        map!(k, mBuf, mTExt)
        copy!(mT, mBuf)
    end
    return mT;
end
```

For me this gives a factor of ~8 improvement over your version, but please benchmark for yourself.
You should be able to achieve similar timings using LoopVectorization if you rewrite it as I did.

PS: I don't think this issue tracker is the right place for these discussions.

@RoyiAvital
Author

In this context it is OK to work in place (Gauss-Seidel vs. Jacobi iterations).
I see, so the point is that in the case of StaticKernels.jl the compiler sees the dependency, while in the case of LoopVectorization.jl it is forced to ignore it.
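The distinction can be shown in a few lines of plain Julia (a 1-D sketch, not package code; out-of-range neighbours are treated as zero): a Jacobi step reads only old values, while a Gauss-Seidel sweep reuses values it has already updated:

```julia
# Jacobi: every update reads the original array.
jacobi(x) = [0.5 * (get(x, i - 1, 0.0) + get(x, i + 1, 0.0)) for i in eachindex(x)]

# Gauss-Seidel: updates in place, so later entries see already-updated neighbours.
function gauss_seidel!(x)
    for i in eachindex(x)
        x[i] = 0.5 * (get(x, i - 1, 0.0) + get(x, i + 1, 0.0))
    end
    return x
end

jacobi([4.0, 2.0, 3.0])         # [1.0, 3.5, 1.0]
gauss_seidel!([4.0, 2.0, 3.0])  # [1.0, 2.0, 1.0]
```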

Regarding the discussion, any chance you enable "Discussions" for this repository?

@stev47
Owner

stev47 commented Jan 11, 2023

OK, in that case you can ignore my example, since I assumed the dependency was unintended. If you do want this data dependency, then there are almost no options for vectorization, since the computation for each component depends on the result of the previous computation, which is why you see performance similar to the naive loop. When you use `@turbo` from LoopVectorization you explicitly say "I don't care about this dependency", which may change your results. I would recommend using the traditional loop to make the iteration order explicit, since it is relevant in your case (the iteration order of `map` over kernels may change in the future).

I'll try enabling the discussion feature.

@RoyiAvital
Author

Hi,

In the documentation:

```julia
# 2d total variation
k = @kernel w -> abs(w[1,0] - w[0,0]) + abs(w[0,1] - w[0,0])
sum(k, extend(a, StaticKernels.ExtensionReplicate()))
```

What's the difference between using `sum()` and `map()` in the context of StaticKernels.jl?
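For Base Julia at least (I have not checked whether StaticKernels specializes this further), `sum(f, itr)` computes the same value as `sum(map(f, itr))` but fuses the mapping into the reduction, avoiding the intermediate array:

```julia
xs = collect(1:9)
sum(abs2, xs) == sum(map(abs2, xs))  # true; the left-hand side allocates no intermediate array
```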
