Convolution Using StaticKernels.jl
#9
Replies: 2 comments
-
This is an interesting use case. Unfortunately I don't see an easy way to achieve this using only the public interface of StaticKernels.jl. To properly support this and similar use cases, either the interface would need to change or we could leverage the output array axes to infer the loop domain (then you could create an output array with the axes of the full convolution).
-
OK, I tried the following approach, just as a PoC:

```julia
using BenchmarkTools;
using LoopVectorization;
using StaticKernels;
using PaddedViews;

numSamplesA = 1000;
numSamplesB = 3;

vA = rand(numSamplesA);
vB = rand(numSamplesB);
vO = rand(numSamplesA + numSamplesB - 1);

numElementsPad = length(vA) + (2 * (length(vB) - 1));
vParnetIdx = (length(vB) - 1) .+ (1:length(vA)); #<! Indices of `vA` inside the padded parent
vAA = PaddedView(0, vA, (1:numElementsPad,), (vParnetIdx,)); #<! Zero padded view of `vA`
vK  = Kernel{(0:(length(vB) - 1), )}(@inline vW -> vW[0] * vB[3] + vW[1] * vB[2] + vW[2] * vB[1]);

map!(vK, vO, vAA);
```

It gives the correct answer. So I tried:

```julia
vOO = zeros(numSamplesA - numSamplesB + 1);
map!(vK, vOO, vAA);
```

Yet I get a similar run time (I measure it with `@benchmark`).

For comparison:

```julia
using BenchmarkTools;
using LoopVectorization;
using StaticKernels;
using PaddedViews;

function ___Conv1D!( vO :: Vector{T}, vA :: Vector{T}, vB :: Vector{T} ) where {T <: Real}
    # Based on https://discourse.julialang.org/t/97658/15
    # Doesn't require `StrideArraysCore` (uses 1 based indexing instead of 0 based indexing)

    J = length(vA);
    K = length(vB); #<! Assumed to be the kernel
    # Optimized for the case the kernel is `vB` (the shorter vector)
    J < K && return ___Conv1D!(vO, vB, vA);
    I = J + K - 1; #<! Output length

    @turbo for ii in 1:(K - 1) #<! Head: window only partially overlaps `vA`
        sumVal = zero(T);
        for kk in 1:K
            ib0 = (ii >= kk);
            oa = ib0 ? vA[ii - kk + 1] : zero(T);
            sumVal += vB[kk] * oa;
        end
        vO[ii] = sumVal;
    end
    @turbo inline=true for ii in K:(J - 1) #<! Middle: full overlap, branch free
        sumVal = zero(T);
        for kk in 1:K
            sumVal += vB[kk] * vA[ii - kk + 1];
        end
        vO[ii] = sumVal;
    end
    @turbo for ii in J:I #<! Tail: window runs past the end of `vA`
        sumVal = zero(T);
        for kk in 1:K
            ib0 = (ii < J + kk);
            oa = ib0 ? vA[ii - kk + 1] : zero(T);
            sumVal += vB[kk] * oa;
        end
        vO[ii] = sumVal;
    end

    return vO;
end
```
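For reference, the head / middle / tail split is what lets `@turbo` vectorize the hot middle loop without any masking. A single loop equivalent (a sketch; `NaiveConv1D!` is just an illustrative name, not part of the code above) keeps the bounds logic inside the loop instead:

```julia
# Single loop full convolution: the `max` / `min` bounds replace the masked
# loads above, at the cost of a data dependent inner range
function NaiveConv1D!( vO :: Vector{T}, vA :: Vector{T}, vB :: Vector{T} ) where {T <: Real}
    J = length(vA);
    K = length(vB);
    for ii in 1:(J + K - 1)
        sumVal = zero(T);
        for kk in max(1, ii - J + 1):min(ii, K)
            sumVal += vB[kk] * vA[ii - kk + 1];
        end
        vO[ii] = sumVal;
    end
    return vO;
end
```

The data dependent inner range is exactly what the three fixed-range loops avoid, which is why the split version vectorizes so much better.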

```julia
vA = [1, 2, 3, 4, 5];
vB = [4, 5, 6];
vC = [6, 5, 4];
vO = collect(1:(length(vA) + length(vB) - 1));

numSamplesA = 1000;
numSamplesB = 3;

vA = rand(numSamplesA);
vB = rand(numSamplesB);
vO = rand(numSamplesA + numSamplesB - 1);

numElementsPad = length(vA) + (2 * (length(vB) - 1));
vParnetIdx = (length(vB) - 1) .+ (1:length(vA));
vAA = PaddedView(0, vA, (1:numElementsPad,), (vParnetIdx,)); #<! Zero padded view
vK  = Kernel{(0:(length(vB) - 1), )}(@inline vW -> vW[0] * vB[3] + vW[1] * vB[2] + vW[2] * vB[1]);

@benchmark ___Conv1D!($vO, $vA, $vB)
@benchmark map!($vK, $vO, $vAA)
```

On my system the difference is about 180x. I just thought it might be interesting for you.
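For a correctness check (assuming DSP.jl is available; its `conv` returns the full `length(vA) + length(vB) - 1` convolution):

```julia
using DSP;

vRef = conv(vA, vB); #<! Reference full convolution

___Conv1D!(vO, vA, vB);
maximum(abs.(vO .- vRef)) #<! Should be on the order of machine precision

map!(vK, vO, vAA);
maximum(abs.(vO .- vRef)) #<! Same for the `StaticKernels.jl` + `PaddedView` version
```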
-
Given 2 vectors, I would like to perform their convolution using StaticKernels.jl. It would be something like the sketch below.
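Something like this, with the boundary handled by a constant extension (a sketch, assuming `StaticKernels.ExtensionConstant(0)` for the zero boundary):

```julia
using StaticKernels;

vA = [1, 2, 3, 4, 5];
vB = [4, 5, 6];

#<! The window applies `vB` flipped, so this computes a convolution
vK = Kernel{(0:2,)}(@inline vW -> vW[0] * vB[3] + vW[1] * vB[2] + vW[2] * vB[1], StaticKernels.ExtensionConstant(0));
vO = map(vK, vA);
```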
The problem is I can get only 5 (the length of the data `vA`) elements as output and not `length(vA) + length(vB) - 1`.
I was wondering if there is a trick with the boundary extension to get the output of the full convolution?
In the case above I can define `vAA = [0, 0, 1, 2, 3, 4, 5]` and then run the kernel on it to get the required result, yet I wonder if it can be done without explicitly extending the vector.