As mentioned in the GPU plugin structure documentation, kernels for the GPU plugin are located in the `src/plugins/intel_gpu/src/kernel_selector` folder.
For each operation, there are usually multiple kernels, each supporting a different set of parameters and/or optimized for a different scenario.
Each operation has 3 major entities in the kernel selector:
- Operation-specific `kernel_selector` instance
- Operation parameters descriptor
- The kernels themselves, with a set of heuristics inside for optimal selection
For each operation, we create a `kernel_selector` class derived from `kernel_selector_base`. This class specifies the kernels available for the given operation. Each kernel selector is used as a singleton. For example:
```cpp
class mvn_kernel_selector : public kernel_selector_base {
public:
    static mvn_kernel_selector& Instance() {
        static mvn_kernel_selector instance_;
        return instance_;
    }

    mvn_kernel_selector();

    KernelsData GetBestKernels(const Params& params, const optional_params& options) const override;
};
```
```cpp
// The list of available kernels is usually specified in the kernel_selector c-tor using the `Attach` method, which creates
// an instance of each type and appends it to the implementations list.
// In this case we have 3 available kernels for the MVN operation. Kernels may have different priorities and support only a
// subset of the operation parameters. E.g. MVNKernel_b_fs_yx_fsv16_imad supports only `fsv16` blocked layouts and
// INT8/UINT8 input data types.
mvn_kernel_selector::mvn_kernel_selector() {
    Attach<MVNKernelRef>();
    Attach<MVNKernelBfyxOpt>();
    Attach<MVNKernel_b_fs_yx_fsv16_imad>();
}
```
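For reference, `Attach` essentially boils down to instantiating the kernel type once and storing it. A minimal sketch, assuming the selector keeps its kernels in an `implementations` container (the member name is an assumption, not the exact `kernel_selector_base` code):

```cpp
// Hypothetical sketch of the Attach helper inside kernel_selector_base:
// create one instance of the kernel type and append it to the list that
// GetBestKernels later iterates over. `implementations` is an assumed name.
template <typename KernelType>
void Attach() {
    implementations.push_back(std::make_shared<KernelType>());
}
```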
```cpp
// This method is used to get the optimal kernel for the given parameters.
// There are 2 base methods to pick optimal kernels: `GetNaiveBestKernel` and `GetAutoTuneBestKernel`.
// If a kernel supports auto tuning, it uses `GetAutoTuneBestKernel`; otherwise, it uses `GetNaiveBestKernel`,
// parameterized with the `KernelType`, which specifies the operation type implemented by this kernel selector.
KernelsData mvn_kernel_selector::GetBestKernels(const Params& params, const optional_params& options) const {
    return GetNaiveBestKernel(params, options, KernelType::MVN);
}
```
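Conceptually, the naive strategy walks over the attached implementations, discards those whose supported-feature key does not cover the requested parameters, and keeps the applicable kernel with the best priority. The following self-contained toy model illustrates the idea (all names here are illustrative, not actual kernel_selector types):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model of naive kernel selection: each kernel advertises a feature
// mask and a priority; the applicable kernel with the lowest priority
// value wins.
struct ToyKernel {
    std::string name;
    uint64_t supported_mask;  // one bit per supported feature (layout, dtype, ...)
    int priority;             // lower value = more preferred
};

std::optional<ToyKernel> pick_best(const std::vector<ToyKernel>& impls, uint64_t params_mask) {
    std::optional<ToyKernel> best;
    for (const auto& k : impls) {
        if ((k.supported_mask & params_mask) != params_mask)
            continue;  // kernel does not support all requested features
        if (!best || k.priority < best->priority)
            best = k;
    }
    return best;
}
```

In the real selector, the mask check corresponds to `ParamsKey::Support` (shown below), and the priority comes from each kernel implementation.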
The caller code looks as follows:
```cpp
// Get a static instance of the kernel_selector
auto& kernel_selector = kernel_selector::mvn_kernel_selector::Instance();
// Run some heuristics to pick the best mvn kernel for the given `mvn_params`
auto best_kernels = kernel_selector.GetBestKernels(mvn_params, mvn_optional_params);
```
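The caller then typically checks that the returned list is not empty and uses its first element. A hedged continuation of the snippet above (error handling details vary in the actual plugin code):

```cpp
// Sketch: ensure at least one kernel matched the parameters and use the
// most suitable one (the first element of the returned list).
if (best_kernels.empty())
    throw std::runtime_error("No suitable MVN kernel for the given parameters");
const auto& best_kernel = best_kernels[0];
```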
The operation parameters for the kernel_selector are defined in the corresponding `${op_name}_params` class, which is derived from `base_params`. For example:
```cpp
struct mvn_params : public base_params {
    mvn_params() : base_params(KernelType::MVN) {}

    MVNMode mvnMode = MVNMode::WITHIN_CHANNELS;
    bool mvnNormalizeVariance = true;
    float epsilon = 1e-10f;

    virtual ParamsKey GetParamsKey() const {
        ParamsKey k = base_params::GetParamsKey();
        k.EnableMVNMode(mvnMode);
        if (mvnNormalizeVariance)
            k.EnableMVNNormalizeVariance();
        return k;
    }
};
```
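Before calling `GetBestKernels`, the caller fills such a structure from the operation's descriptor. A minimal sketch, assuming `MVNMode::ACROSS_CHANNELS` is the counterpart of `WITHIN_CHANNELS` (the actual conversion from the program node lives in the plugin code and is omitted here):

```cpp
// Sketch: populate the operation-specific parameters before selection.
mvn_params params;                          // also inherits common descriptors from base_params
params.mvnMode = MVNMode::ACROSS_CHANNELS;  // assumption: counterpart of WITHIN_CHANNELS
params.mvnNormalizeVariance = true;
params.epsilon = 1e-9f;
auto key = params.GetParamsKey();           // bit mask used for the applicability check below
```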
The derived class should parameterize the base class with the specific `KernelType` and add operation-specific parameters. The only method that must be implemented is `GetParamsKey()`, which is used as a quick applicability check of kernels against the current parameters: we take the `ParamsKey` object calculated for the input operation parameters and the `ParamsKey` object of each kernel, compare them, and discard the kernels that don't support the current parameters. `ParamsKey` is implemented as a set of bit masks, so the applicability check is quite simple:
```cpp
const ParamsKey implKey = some_implementation->GetSupportedKey();
if (!implKey.Support(paramsKey)) {
    // Do something
}

// The Support() method does something like the following for each internal bit mask:
if (!((implKey.mask & paramsKey.mask) == paramsKey.mask))
    return false;
```
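To make the bit-mask semantics concrete, here is a small self-contained toy version of the check (the type and feature bits are illustrative only, not the actual kernel_selector definitions):

```cpp
#include <cstdint>
#include <iostream>

// Toy model of ParamsKey: one bit per feature (data type, layout, mode, ...).
struct ToyParamsKey {
    uint64_t mask = 0;

    // Supported iff every feature bit required by the parameters
    // is also present in the implementation's mask.
    bool Support(const ToyParamsKey& params) const {
        return (mask & params.mask) == params.mask;
    }
};

int main() {
    constexpr uint64_t INT8         = 1ull << 0;
    constexpr uint64_t LAYOUT_BFYX  = 1ull << 1;
    constexpr uint64_t LAYOUT_FSV16 = 1ull << 2;

    ToyParamsKey implKey{INT8 | LAYOUT_FSV16};   // e.g. a kernel supporting only fsv16 + INT8
    ToyParamsKey paramsKey{INT8 | LAYOUT_BFYX};  // the operation actually uses bfyx

    // Prints "false": the kernel does not cover the bfyx layout bit.
    std::cout << std::boolalpha << implKey.Support(paramsKey) << std::endl;
    return 0;
}
```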
Each kernel must specify the following things:
- Input parameters checks:
  - `GetSupportedKey()` method implementation, which returns a `ParamsKey` object for the current implementation
  - `Validate()` method that does more complex checks (optional; see the sketch after this list)
- Dispatch data (global/local workgroup sizes, scheduling algorithm, etc.)
- Kernel name - must be passed to the base class c-tor
- Kernel arguments specification - a description of each argument of the corresponding OpenCL™ kernel
- Additional JIT constants required for the kernel - a set of macro definitions that must be added to the kernel template to make a full specialization for the given params
- Supported fused operations (if any) - a list of operations that can be fused into the current kernel
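As an illustration of the optional `Validate()` check mentioned above, the following hedged sketch rejects a case that cannot be expressed via `ParamsKey` bits (the restriction itself is hypothetical; real kernels implement their own specific conditions):

```cpp
// Hypothetical Validate() override: run the base checks first, then apply
// a kernel-specific restriction that ParamsKey bits cannot express.
bool MVNKernelBfyxOpt::Validate(const Params& p, const optional_params& o) const {
    if (!MVNKernelBase::Validate(p, o))
        return false;
    const auto& params = static_cast<const mvn_params&>(p);
    // Illustrative restriction only: pretend this kernel handles just
    // the WITHIN_CHANNELS mode.
    if (params.mvnMode != MVNMode::WITHIN_CHANNELS)
        return false;
    return true;
}
```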
Let's have a look at the key methods of each kernel implementation:
```cpp
class MVNKernelRef : public MVNKernelBase {
public:
    // mvn_gpu_ref is the name of the file with the kernel template in the cl_kernels/ folder, without the .cl extension
    MVNKernelRef() : MVNKernelBase("mvn_gpu_ref") {}

    // Returns the kernel specialized for the input parameters if the implementation can process them
    KernelsData GetKernelsData(const Params& params, const optional_params& options) const override;

    // Returns `ParamsKey` for the current implementation for the quick applicability check
    ParamsKey GetSupportedKey() const override;

protected:
    // Specifies additional JIT constants for the kernel template specialization
    JitConstants GetJitConstants(const mvn_params& params, DispatchData dispatchData) const override;

    // The list of supported fused operations
    std::vector<FusedOpType> GetSupportedFusedOps() const override {
        return {
            FusedOpType::ACTIVATION,
            FusedOpType::QUANTIZE,
            FusedOpType::ELTWISE,
            FusedOpType::SCALE
        };
    }
};
```
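For instance, a `GetJitConstants` override typically extends the base set of macro definitions with operation-specific ones. A minimal sketch, assuming hypothetical macro names (the actual definitions consumed by the `mvn_gpu_ref` template may differ):

```cpp
// Sketch: extend the base JIT constants with MVN-specific macros that
// specialize the OpenCL kernel template for the given parameters.
// Macro names here are illustrative.
JitConstants MVNKernelRef::GetJitConstants(const mvn_params& params, DispatchData dispatchData) const {
    auto jit = MVNKernelBase::GetJitConstants(params, dispatchData);
    jit.AddConstant(MakeJitConstant("EPSILON", params.epsilon));
    if (params.mvnNormalizeVariance)
        jit.AddConstant(MakeJitConstant("NORMALIZE_VARIANCE", 1));
    return jit;
}
```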