# GPU kernels implementation overview

As mentioned in the GPU plugin structure documentation, the kernels for the GPU plugin are located in the `src/plugins/intel_gpu/src/kernel_selector` folder.

For each operation there are usually multiple kernels, each supporting different parameters and/or optimized for different scenarios.

Each operation has three major entities in kernel selector:

- An operation-specific `kernel_selector` instance
- An operation parameters descriptor
- The kernels themselves, each with a set of internal heuristics for optimal selection

## Kernel selector instance

For each operation, a `kernel_selector` class derived from `kernel_selector_base` is created. This class specifies the kernels available for the given operation. Each kernel selector is used as a singleton. For example:

```cpp
class mvn_kernel_selector : public kernel_selector_base {
public:
    static mvn_kernel_selector& Instance() {
        static mvn_kernel_selector instance_;
        return instance_;
    }

    mvn_kernel_selector();

    KernelsData GetBestKernels(const Params& params, const optional_params& options) const override;
};

// The list of available kernels is usually specified in the kernel_selector c-tor using the `Attach` method,
// which creates an instance of each kernel type and appends it to the implementations list.
// In this case there are 3 available kernels for the MVN operation. Kernels may have different priorities
// and support only a subset of the operation parameters.
// E.g. MVNKernel_b_fs_yx_fsv16_imad supports only `fsv16` blocked layouts and INT8/UINT8 input data types.
mvn_kernel_selector::mvn_kernel_selector() {
    Attach<MVNKernelRef>();
    Attach<MVNKernelBfyxOpt>();
    Attach<MVNKernel_b_fs_yx_fsv16_imad>();
}

// This method is used to get the optimal kernel for the given parameters.
// There are 2 base methods to pick optimal kernels: `GetNaiveBestKernel` and `GetAutoTuneBestKernel`.
// If a kernel supports auto tuning, `GetAutoTuneBestKernel` is used; otherwise, `GetNaiveBestKernel` is used,
// parameterized with the `KernelType` that specifies the operation implemented by this kernel selector.
KernelsData mvn_kernel_selector::GetBestKernels(const Params& params, const optional_params& options) const {
    return GetNaiveBestKernel(params, options, KernelType::MVN);
}
```
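
Conceptually, `GetNaiveBestKernel` filters the attached implementations by their `ParamsKey` applicability check and keeps the best-ranked candidate. The sketch below is illustrative pseudocode only, not the actual `kernel_selector_base` implementation; the member container and the `Rank` helper are hypothetical:

```cpp
// Illustrative sketch only: real member names and the ranking criterion
// in kernel_selector_base differ from this hypothetical version.
KernelsData GetNaiveBestKernelSketch(const Params& params, const optional_params& options) {
    KernelsData best;
    for (const auto& impl : implementations) {                // kernels registered via Attach<>()
        // Quick bit-mask applicability check (see `ParamsKey` below)
        if (!impl->GetSupportedKey().Support(params.GetParamsKey()))
            continue;
        KernelsData candidate = impl->GetKernelsData(params, options);
        if (candidate.empty())
            continue;                                         // implementation rejected these params
        if (best.empty() || Rank(candidate) > Rank(best))     // hypothetical ranking helper
            best = candidate;                                 // keep the higher-priority kernel
    }
    return best;
}
```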

The caller code looks as follows:

```cpp
// Get the static instance of the kernel_selector
auto& kernel_selector = kernel_selector::mvn_kernel_selector::Instance();
// Run some heuristics to pick the best MVN kernel for the given `mvn_params`
auto best_kernels = kernel_selector.GetBestKernels(mvn_params, mvn_optional_params);
```

## Operation parameters

The operation parameters for kernel_selector are defined in the corresponding `${op_name}_params` class, which is derived from `base_params`. For example:

```cpp
struct mvn_params : public base_params {
    mvn_params() : base_params(KernelType::MVN) {}

    MVNMode mvnMode = MVNMode::WITHIN_CHANNELS;
    bool mvnNormalizeVariance = true;
    float epsilon = 1e-10f;

    virtual ParamsKey GetParamsKey() const {
        ParamsKey k = base_params::GetParamsKey();

        k.EnableMVNMode(mvnMode);

        if (mvnNormalizeVariance)
            k.EnableMVNNormalizeVariance();

        return k;
    }
};
```
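
For illustration, filling such a descriptor before querying the selector could look like the following sketch (the field values are arbitrary, and the tensor descriptors inherited from `base_params` are omitted for brevity):

```cpp
mvn_params params;                           // KernelType::MVN is set by the c-tor
params.mvnMode = MVNMode::WITHIN_CHANNELS;   // operation-specific fields shown above
params.mvnNormalizeVariance = true;
params.epsilon = 1e-9f;
// ... input/output tensor descriptors from base_params would be filled here ...

auto& selector = kernel_selector::mvn_kernel_selector::Instance();
auto best_kernels = selector.GetBestKernels(params, mvn_optional_params);
```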

The derived class should parameterize the base class with a specific `KernelType` and add operation-specific parameters. The only method that must be implemented is `GetParamsKey()`, which serves as a quick applicability check of kernels against the current parameters: the `ParamsKey` object calculated from the input operation parameters is compared with the `ParamsKey` object of each kernel, and the kernels that do not support the current parameters are discarded. `ParamsKey` is implemented as a set of bit masks, so the applicability check is quite simple:

```cpp
const ParamsKey implKey = some_implementation->GetSupportedKey();
if (!implKey.Support(paramsKey))
    // Discard this implementation

// The Support() method does something like the following for each internal bit mask:
if (!((implKey.mask & paramsKey.mask) == paramsKey.mask))
    return false;
```
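
To make the bit-mask logic concrete, here is a small self-contained example with hypothetical single-word masks (the real `ParamsKey` holds several such masks covering layouts, data types, and operation-specific features):

```cpp
#include <cstdint>
#include <cassert>

int main() {
    // Hypothetical masks: each bit marks one supported feature/layout/data type.
    uint64_t implMask   = 0b1011;  // implementation supports features 0, 1 and 3
    uint64_t paramsMask = 0b0011;  // current parameters require features 0 and 1

    // Same check as above: every bit required by the params
    // must also be set in the implementation's mask.
    assert((implMask & paramsMask) == paramsMask);    // supported

    paramsMask = 0b0111;                              // feature 2 is now also required
    assert(!((implMask & paramsMask) == paramsMask)); // implementation lacks bit 2 -> discarded
    return 0;
}
```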

## Kernel implementation

Each kernel must specify the following things:

- Input parameters checks
  - `GetSupportedKey()` method implementation, which returns a `ParamsKey` object for the current implementation
  - `Validate()` method that does more complex checks (optional)
- Dispatch data (global/local work-group sizes, scheduling algorithm, etc.)
- Kernel name, which must be passed to the base class c-tor
- Kernel arguments specification, i.e. a description of each argument of the corresponding OpenCL™ kernel
- Additional JIT constants required for the kernel, i.e. the set of macro definitions that must be added to the kernel template to fully specialize it for the given params
- Supported fused operations (if any), i.e. the list of operations that can be fused into the current kernel

Let's have a look at the key methods of each kernel implementation:

```cpp
class MVNKernelRef : public MVNKernelBase {
public:
    // "mvn_gpu_ref" is the name of the file with the kernel template in the cl_kernels/ folder, without the .cl extension
    MVNKernelRef() : MVNKernelBase("mvn_gpu_ref") {}
    // Returns the kernel data for the given parameters if the implementation can process them
    KernelsData GetKernelsData(const Params& params, const optional_params& options) const override;
    // Returns the `ParamsKey` of the current implementation for the quick applicability check
    ParamsKey GetSupportedKey() const override;

protected:
    // Specifies additional JIT constants for the kernel template specialization
    JitConstants GetJitConstants(const mvn_params& params, DispatchData dispatchData) const override;
    // The list of supported fused operations
    std::vector<FusedOpType> GetSupportedFusedOps() const override {
        return {
            FusedOpType::ACTIVATION,
            FusedOpType::QUANTIZE,
            FusedOpType::ELTWISE,
            FusedOpType::SCALE
        };
    }
};
```
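
A typical `GetKernelsData` implementation validates the parameters, computes the dispatch data and JIT constants, and fills in the kernel data. The sketch below only illustrates that flow; the helper names (`SetDefault`, the `KernelData::Default` factory) follow the common pattern in kernel_selector, but the exact signatures should be treated as assumptions:

```cpp
// Simplified sketch of the usual GetKernelsData flow; exact helper
// signatures vary between kernels and OpenVINO versions.
KernelsData MVNKernelRef::GetKernelsData(const Params& params, const optional_params& options) const {
    if (!Validate(params, options))
        return {};                                        // parameters are not supported by this kernel

    const auto& prim_params = static_cast<const mvn_params&>(params);

    DispatchData dispatchData = SetDefault(prim_params);  // global/local work-group sizes
    JitConstants jit = GetJitConstants(prim_params, dispatchData);

    KernelData kd = KernelData::Default<mvn_params>(params);
    // ... here the "mvn_gpu_ref" template would be specialized with `jit`,
    // the work-group sizes taken from `dispatchData`, and the OpenCL kernel
    // arguments described, before returning the result ...
    return { kd };
}
```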

## See also