ARROW-12724: [C++] Add documentation for authoring compute kernels #10296

edponce · 2021-05-11T21:40:41Z

This PR extends the compute layer documentation by describing a developer's process for authoring new compute functions. It describes the commonly used files, data structures, and functions needed to understand compute functions, and it provides a tutorial with examples.

github-actions · 2021-05-11T21:41:00Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title to the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

github-actions · 2021-05-11T22:19:38Z

https://issues.apache.org/jira/browse/ARROW-12724

pitrou

This document will need to be referenced from a higher-level document in order to be available in the doc navigation.

pitrou · 2021-05-12T08:13:59Z

docs/source/cpp/authoring_compute_functions.rst

+
+An introduction to compute functions is provided in https://arrow.apache.org/docs/cpp/compute.html.
+
+The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.


Can you wrap long lines?

Also, it seems you're using Markdown syntax. These documents should be authored using reStructuredText.

bkietz

IMHO this doesn't have a clear flow. It front-loads definitions and lists of attributes then proceeds to a set of code snippets. This might be useful as a reference document, but I think it'd be more readable to structure this as a how-to guide/tutorial.

I think the tutorial should start with a statement of intent: an explanation in simple terms of what's missing and what will be accomplished by the end of the document. Technical details can then be introduced when (iff) they are necessary to the current narrative step.

For example, you could start with browsing the documentation for useful compute functions and displaying how they are used from python. After noting that absolute_value is absent, you could show how to (aspirationally) add it to compute.rst to make it clear where we'll be at the end of the tutorial.

The discussion of different function kinds then focuses naturally on absolute_value's kind: SCALAR (the most common kind) with the specific examples of absolute_value and existing functions referenced in the first paragraph to illustrate what this implies about the function's execution.

In summary, I think exhaustive coverage of the complexity of the compute module is less useful than a targeted tour of a common modification.

docs/source/cpp/authoring_compute_functions.rst

bkietz · 2021-05-17T14:40:04Z

docs/source/cpp/authoring_compute_functions.rst

+
+A function that computes scalar summary statistics from array input.
+
+### Hash aggregate


Suggested change

### Hash aggregate

Hash aggregate

~~~~~~~~~~~~~~

In addition, please ensure that all your links are in the RST format. For example, to create a link to the doxygen doc for a specific class member, use:

:member:`ScalarKernel::exec`

To create a link to a specific source file on the master branch, use:

`The scalar API header <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/api_scalar.h>`__

bkietz · 2021-05-19T17:49:49Z

docs/source/cpp/authoring_compute_functions.rst

+Recall that "Arithmetic" functions create two kernel variants: default and overflow-checking. Therefore, we use the `SCALAR_ARITHMETIC_UNARY` macro which requires two function names (with and without "_checked" suffix).
+
+```C++
+SCALAR_ARITHMETIC_UNARY(AbsoluteValue, "absolute_value", "absolute_value_checked")


In general, please avoid use of macros like SCALAR_ARITHMETIC_UNARY() and other context specific helpers for this walk through, as they obscure the underlying C++ and require a user to look up the macro's definition to understand what they're writing.

This was helpful for me. Perhaps it would be better to explain when such macros should be used and explain this particular macro so that these are used consistently within Arrow.
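To make the macro discussion concrete, here is a minimal, self-contained sketch of what a wrapper-generating macro of this kind could expand to. It is not Arrow's definition: `ToyOptions`, `ToyRegistry`, `ToyCallFunction`, and `TOY_ARITHMETIC_UNARY` are hypothetical stand-ins, and the "registry" is a plain `std::map`. The only point it illustrates is that one convenience wrapper chooses between a default and a `_checked` registry name.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; none of these are Arrow APIs.
struct ToyOptions {
  bool check_overflow = false;
};

// A toy implementation type: returns std::nullopt to signal an error.
using ToyUnary = std::function<std::optional<int32_t>(int32_t)>;

// A toy "registry" mapping function names to implementations.
inline std::map<std::string, ToyUnary>& ToyRegistry() {
  static std::map<std::string, ToyUnary> registry = {
      // Default variant: wraps around on overflow. The subtraction is done in
      // unsigned arithmetic; the cast back to int32_t is modular on mainstream
      // compilers and guaranteed to be so from C++20.
      {"negate",
       [](int32_t v) -> std::optional<int32_t> {
         return static_cast<int32_t>(0u - static_cast<uint32_t>(v));
       }},
      // Checked variant: reports an error instead of wrapping.
      {"negate_checked",
       [](int32_t v) -> std::optional<int32_t> {
         if (v == std::numeric_limits<int32_t>::min()) return std::nullopt;
         return -v;
       }},
  };
  return registry;
}

// Looks up a function by name and calls it (throws if the name is unknown).
inline std::optional<int32_t> ToyCallFunction(const std::string& name, int32_t arg) {
  return ToyRegistry().at(name)(arg);
}

// Generates one convenience wrapper that picks between the default and the
// "_checked" registry names based on the options.
#define TOY_ARITHMETIC_UNARY(NAME, REGISTRY_NAME, REGISTRY_CHECKED_NAME)     \
  inline std::optional<int32_t> NAME(int32_t arg, ToyOptions options = {}) { \
    const char* func_name =                                                  \
        options.check_overflow ? REGISTRY_CHECKED_NAME : REGISTRY_NAME;      \
    return ToyCallFunction(func_name, arg);                                  \
  }

TOY_ARITHMETIC_UNARY(ToyNegate, "negate", "negate_checked")
```

Calling `ToyNegate(5)` goes through the default name, while `ToyNegate(x, ToyOptions{true})` goes through the checked one. Written out by hand, the wrapper is only a few lines, which is the trade-off under discussion here: the macro saves repetition but hides that name selection from the reader.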

bkietz · 2021-05-19T17:53:01Z

docs/source/cpp/authoring_compute_functions.rst

+Define compute function
+-----------------------


Try to avoid ambiguity between the instance of compute::Function which is being added to the registry and the convenience function which is being added to api_scalar.h. The former is the canonical definition. The latter is a wrapper for easy use from C++, and probably isn't necessary for most compute functions. In fact for simplicity it might be better to avoid modifying api_scalar.h in this tutorial.

It may be helpful to say something about impacts on language bindings, assuming that most of these will use what is in the registry rather than the C++ convenience function.

bkietz · 2021-05-19T19:08:22Z

docs/source/cpp/authoring_compute_functions.rst

+Arrow uses Google test framework. All kernels should have tests to ensure stability of the compute layer. Tests should at least cover ordinary inputs, corner cases, extreme values, nulls, different data types, and invalid tests. Moreover, there can be kernel-specific tests. For example, for arithmetic kernels, tests should include `NaN` and `Inf` inputs. The test files are located alongside the kernel source files and suffixed with `_test`. Tests are grouped by compute function `kind` and categories.
+
+`TYPED_TEST(test suite name, compute function)` - wrapper to define tests for the given compute function. The `test suite name` is associated with a set of data types that are used for the test suite (`TYPED_TEST_SUITE`). Tests from multiple compute functions can be placed in the same test suite. For example, `TYPED_TEST(TestBinaryArithmeticFloating, Sub)` and `TYPED_TEST(TestBinaryArithmeticFloating, Mul)`.
+


I think a simple (un-templated, non-suite) C++ test case is a necessary code snippet. For example,

    // scalar_arithmetic_test.cc
    TEST(AbsoluteValue, IntegralInputs) {
      // Run the same checks for every signed integer type.
      for (auto type : {int8(), int16(), int32(), int64()}) {
        CheckScalarUnary("absolute_value", type, "[]", type, "[]");
        CheckScalarUnary("absolute_value", type, "[0, -1, 1, 2, -2, 3, -3]", type,
                         "[0, 1, 1, 2, 2, 3, 3]");
      }
    }

Note that the above compiles and executes before adding the absolute_value function (IMHO it's a useful clarification of intended behavior to add a failing test like this as early as possible in the addition of a new function).

Shall we also add untyped test fixtures (TEST_Fs)?

We may also want to link to the primer documentation, as the different types of tests are a source of confusion, as indicated in https://stackoverflow.com/questions/58600728/what-is-the-difference-between-test-test-f-and-test-p

Co-authored-by: Benjamin Kietzman <[email protected]>

nirandaperera · 2021-06-03T20:42:30Z

This is a sort of confusion I had when I first started writing kernels.
"A 'scalar' is a single (non-array) element! But how come "Scalar functions" accept and produce arrays?"
But now I understand, even though arrays are passed, the function is applied on each scalar in the array independently.
Do you think this is something we'd want to explicitly discuss in the doc?
Maybe use alternative terminology, such as "element-wise and vector functions"?

nirandaperera · 2021-06-03T20:33:17Z

docs/source/cpp/authoring_compute_functions.rst

+* Containment tests
+* Structural transforms
+* Conversions
+


Shall we add the comment by @pitrou in Zulip here?

Simple questions for whether a function is a scalar function:

- Do all inputs have the same (broadcasted) length?
- Does the Nth element in the output only depend on the Nth element of each input?
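The two questions can be illustrated with a small sketch that uses plain `std::vector` rather than Arrow arrays. Both function names are hypothetical. The first function passes both tests, so it behaves like a "scalar" function; the second fails the second test, because each output element depends on earlier inputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-wise ("scalar") behavior: out[i] depends only on in[i].
// Overflow for the minimum int64_t value is ignored in this sketch.
inline std::vector<int64_t> AbsElementwise(const std::vector<int64_t>& in) {
  std::vector<int64_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = in[i] < 0 ? -in[i] : in[i];
  }
  return out;
}

// Not element-wise ("vector" behavior): out[i] depends on in[0..i].
inline std::vector<int64_t> CumulativeSum(const std::vector<int64_t>& in) {
  std::vector<int64_t> out(in.size());
  int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    running += in[i];
    out[i] = running;
  }
  return out;
}
```

Reordering the input of `AbsElementwise` reorders its output in the same way, whereas reordering the input of `CumulativeSum` changes the output values themselves.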

nirandaperera · 2021-06-03T21:00:55Z

docs/source/cpp/authoring_compute_functions.rst

+* compute/kernels/scalar_arithmetic.cc - contains kernel definitions for "Scalar Arithmetic" compute functions. Kernel definitions are defined via a class with literal name of compute function and containing methods named `Call` that are parameterized for specific input types (signed/unsigned integer and floating-point).
+    * For compute functions that may trigger overflow the "checked" variant is a class suffixed with `Checked` and makes use of assertions and overflow checks. If overflow occurs, kernel returns zero and sets that `Status*` error flag.
+        * For compute functions that do not have a valid mathematical operation for specific datatypes (e.g., negate an unsigned integer), the kernel for those types is provided but should trigger an error with `DCHECK(false) << This is included only for the purposes of instantiability from the "arithmetic kernel generator"` and return zero.
+


I think we should discuss the guarantees provided by the compute infrastructure as well.
For example, for scalar functions, if multiple arrays are passed, the compute infrastructure checks for nullity, guarantees that they are of the same size, etc.

This would be helpful.
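As a rough illustration of the kind of guarantees meant here, the sketch below shows a driver that validates lengths, broadcasts a length-1 input, and propagates nulls before the per-element operation ever runs. It is an assumption-laden toy, not Arrow's executor: `NullableColumn` and `ApplyBinary` are hypothetical names, and nulls are modeled with `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical nullable column: std::nullopt marks a null slot.
using NullableColumn = std::vector<std::optional<int64_t>>;

// Hypothetical driver: checks lengths and propagates nulls, so that `op`
// only ever sees two valid values.
inline NullableColumn ApplyBinary(const NullableColumn& left, const NullableColumn& right,
                                  const std::function<int64_t(int64_t, int64_t)>& op) {
  // A length-1 input is broadcast against the other input.
  const bool broadcast_left = left.size() == 1 && right.size() != 1;
  const bool broadcast_right = right.size() == 1 && left.size() != 1;
  if (!broadcast_left && !broadcast_right && left.size() != right.size()) {
    throw std::invalid_argument("inputs must have equal or broadcastable lengths");
  }
  const std::size_t length = broadcast_left ? right.size() : left.size();

  NullableColumn out(length);  // every slot starts out null
  for (std::size_t i = 0; i < length; ++i) {
    const std::optional<int64_t>& l = left[broadcast_left ? 0 : i];
    const std::optional<int64_t>& r = right[broadcast_right ? 0 : i];
    // Null in, null out: the operation is skipped for this slot.
    if (l.has_value() && r.has_value()) out[i] = op(*l, *r);
  }
  return out;
}
```

With a driver like this, the operation passed as `op` can be a one-line lambda, because length checks and null handling have already been taken care of by the caller.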

nirandaperera · 2021-06-03T21:49:09Z

docs/source/cpp/authoring_compute_functions.rst

+`TYPED_TEST(test suite name, compute function)` - wrapper to define tests for the given compute function. The `test suite name` is associated with a set of data types that are used for the test suite (`TYPED_TEST_SUITE`). Tests from multiple compute functions can be placed in the same test suite. For example, `TYPED_TEST(TestBinaryArithmeticFloating, Sub)` and `TYPED_TEST(TestBinaryArithmeticFloating, Mul)`.
+
+Helpers
+=======


I think it'd be nicer if we could discuss some helper methods in codegen_internal.h.

nirandaperera · 2021-06-03T21:52:14Z

docs/source/cpp/authoring_compute_functions.rst

+Arrow uses Google test framework. All kernels should have tests to ensure stability of the compute layer. Tests should at least cover ordinary inputs, corner cases, extreme values, nulls, different data types, and invalid tests. Moreover, there can be kernel-specific tests. For example, for arithmetic kernels, tests should include `NaN` and `Inf` inputs. The test files are located alongside the kernel source files and suffixed with `_test`. Tests are grouped by compute function `kind` and categories.
+
+`TYPED_TEST(test suite name, compute function)` - wrapper to define tests for the given compute function. The `test suite name` is associated with a set of data types that are used for the test suite (`TYPED_TEST_SUITE`). Tests from multiple compute functions can be placed in the same test suite. For example, `TYPED_TEST(TestBinaryArithmeticFloating, Sub)` and `TYPED_TEST(TestBinaryArithmeticFloating, Mul)`.
+


Shall we also add untyped test fixtures (TEST_Fs)?

bkmgit · 2021-12-29T05:03:16Z

One further consideration is interface design. This seems like it is still being stabilized but guiding principles would be helpful.

bkmgit · 2021-12-29T06:33:31Z

docs/source/cpp/authoring_compute_functions.rst

+
+The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.
+
+Many functions have SQL-like semantics in that they perform element-wise or scalar operations on whole arrays at a time. Other functions are not SQL-like and compute results that may be a different length or whose results depend on the order of the values.


Suggested change

Many functions have SQL-like semantics in that they perform element-wise or scalar operations on whole arrays at a time. Other functions are not SQL-like and compute results that may be a different length or whose results depend on the order of the values.

Many functions have SQL-like semantics in that they perform element-wise or scalar operations on whole arrays at a time with output length the same as the input length. Other functions are not SQL-like and compute results that may be a different length or whose results depend on the order of the values.

Maybe this is clearer

bkmgit · 2021-12-29T06:34:10Z

docs/source/cpp/authoring_compute_functions.rst

+Many functions have SQL-like semantics in that they perform element-wise or scalar operations on whole arrays at a time. Other functions are not SQL-like and compute results that may be a different length or whose results depend on the order of the values.
+
+Terminology:
+* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of types or function behavior options.


Suggested change

* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of types or function behavior options.

* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of input data types or function behavior options.

bkmgit · 2021-12-29T06:42:52Z

docs/source/cpp/authoring_compute_functions.rst

+Compute Functions
+=================
+
+An introduction to compute functions is provided in https://arrow.apache.org/docs/cpp/compute.html.


Suggested change

An introduction to compute functions is provided in https://arrow.apache.org/docs/cpp/compute.html.

An introduction to compute functions is provided in `compute documentation <https://arrow.apache.org/docs/cpp/compute.html>`_ .

bkmgit · 2021-12-29T06:44:14Z

docs/source/cpp/authoring_compute_functions.rst

+
+An introduction to compute functions is provided in https://arrow.apache.org/docs/cpp/compute.html.
+
+The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.


Suggested change

The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.

The `compute submodule <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute>`_ contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.

bkmgit · 2021-12-29T06:45:34Z

docs/source/cpp/authoring_compute_functions.rst

+* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must first select a suitable kernel corresponding to the value types of the inputs.
+* Functions along with their kernel implementations are collected in a "function registry". Given a function name and argument types, we can look up that function and dispatch to a compatible kernel.
+
+[Compute functions](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h) have the following principal attributes:


Suggested change

[Compute functions](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h) have the following principal attributes:

`Compute functions <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h>`_ have the following principal attributes:
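The terminology quoted above (function, kernel, dispatching, registry) can be sketched with a heavily simplified model. Everything below is hypothetical and far smaller than Arrow's real classes: `ToyType`, `ToyKernel`, `ToyFunction`, and `ToyFunctionRegistry` exist only to show how a name lookup followed by a type-based kernel selection fits together.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model; not Arrow's classes.
enum class ToyType { kInt32, kFloat64, kUtf8 };

struct ToyKernel {
  ToyType input_type;       // the input signature this kernel accepts
  std::string description;  // stands in for the kernel's exec function
};

struct ToyFunction {
  std::string name;
  std::vector<ToyKernel> kernels;

  // "Dispatching": pick the kernel whose signature matches the input type.
  const ToyKernel* Dispatch(ToyType input_type) const {
    for (const ToyKernel& kernel : kernels) {
      if (kernel.input_type == input_type) return &kernel;
    }
    return nullptr;  // no viable kernel for this input type
  }
};

// The "function registry": functions are looked up by their unique name.
class ToyFunctionRegistry {
 public:
  void AddFunction(ToyFunction function) {
    std::string name = function.name;
    functions_[name] = std::move(function);
  }

  const ToyFunction* GetFunction(const std::string& name) const {
    auto it = functions_.find(name);
    return it == functions_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, ToyFunction> functions_;
};
```

In this model, calling a function means two lookups: first by name in the registry, then by input type among that function's kernels. A missing kernel is reported as `nullptr` here; a real implementation would return a proper error.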

bkmgit · 2021-12-29T06:57:10Z

docs/source/cpp/authoring_compute_functions.rst

+* Functions along with their kernel implementations are collected in a "function registry". Given a function name and argument types, we can look up that function and dispatch to a compatible kernel.
+
+[Compute functions](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h) have the following principal attributes:
+* A unique ["name"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4NK5arrow7compute8Function4nameEv) used for function invocation and language bindings


Suggested change

* A unique ["name"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4NK5arrow7compute8Function4nameEv) used for function invocation and language bindings

* A unique :doc:`name <.compute>` used for function invocation and language bindings

Adding a section heading to the appropriate part of the compute documentation will give a better link. See the reStructuredText documentation.

bkmgit · 2021-12-29T07:00:00Z

docs/source/cpp/authoring_compute_functions.rst

+  which indicates in what context it is valid for use
+    * Input/output [types](https://arrow.apache.org/docs/cpp/compute.html#type-categories) and [shapes](https://arrow.apache.org/docs/cpp/compute.html#input-shapes)
+    * Compute functions can also be further "categorized" based on the type of operation performed. For example, `Scalar Arithmetic` vs `Scalar String`.
+* Compute functions (see [FunctionImpl and subclasses](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h)) contain ["kernels"](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels) which are implementations for specific argument signatures.


Suggested change

* Compute functions (see [FunctionImpl and subclasses](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h)) contain ["kernels"](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels) which are implementations for specific argument signatures.

* Compute functions (see `FunctionImpl and subclasses <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h>`_) contain `"kernels" <https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels>`_ which are implementations for specific argument signatures.

bkmgit · 2021-12-29T07:01:32Z

docs/source/cpp/authoring_compute_functions.rst

+~~~~~~
+
+A function that performs scalar data operations on whole arrays of
+data. Can generally process Array or Scalar values. The size of the


Suggested change

data. Can generally process Array or Scalar values. The size of the

data. Can generally process array or scalar values. The size of the

I do not expect this needs to be capitalized. This is a very helpful explanation.

bkmgit · 2021-12-29T07:01:52Z

docs/source/cpp/authoring_compute_functions.rst

+A function that performs scalar data operations on whole arrays of
+data. Can generally process Array or Scalar values. The size of the
+output will be the same as the size (or broadcasted size, in the case
+of mixing Array and Scalar inputs) of the input.


Suggested change

of mixing Array and Scalar inputs) of the input.

of mixing array and scalar inputs) of the input.

bkmgit · 2021-12-29T07:03:01Z

docs/source/cpp/authoring_compute_functions.rst

+
+https://arrow.apache.org/docs/cpp/compute.html#arithmetic-functions
+
+**Categories of Scalar functions**


Suggested change

**Categories of Scalar functions**

**Categories of Scalar Functions**

bkmgit · 2021-12-29T07:04:00Z

docs/source/cpp/authoring_compute_functions.rst

+    * predicates
+    * transforms
+    * trimming
+    * splitting
+    * extraction
+* Containment tests
+* Structural transforms


Suggested change

* predicates

* transforms

* trimming

* splitting

* extraction

* Containment tests

* Structural transforms

* Predicates

* Transforms

* Trimming

* Splitting

* Extraction

* Containment Tests

* Structural Transforms

More consistent capitalization

bkmgit · 2021-12-29T07:07:02Z

docs/source/cpp/authoring_compute_functions.rst

+A function with array input and output whose behavior depends on the
+values of the entire arrays passed, rather than the value of each scalar value.


Suggested change

A function with array input and output whose behavior depends on the

values of the entire arrays passed, rather than the value of each scalar value.

A function with array input and output whose behavior depends on combinations of

values at different locations in the input arrays, rather than the independent computations

on scalar values at the same location in the input arrays.

bkmgit · 2021-12-29T07:08:53Z

docs/source/cpp/authoring_compute_functions.rst

+**Categories of Vector functions**
+
+* Associative transforms
+* Selections
+* Sorts and partitions
+* Structural transforms
+
+
+Scalar aggregate
+~~~~~~~~~~~~~~~~
+
+A function that computes scalar summary statistics from array input.
+
+### Hash aggregate


Suggested change

**Categories of Vector functions**

* Associative transforms

* Selections

* Sorts and partitions

* Structural transforms

Scalar aggregate

~~~~~~~~~~~~~~~~

A function that computes scalar summary statistics from array input.

### Hash aggregate

**Categories of Vector Functions**

* Associative Transforms

* Selections

* Sorts and Partitions

* Structural Transforms

Scalar Aggregate

~~~~~~~~~~~~~~~~

A function that computes scalar summary statistics from array input.

### Hash Aggregate

Consistent capitalization

bkmgit · 2021-12-29T07:09:42Z

docs/source/cpp/authoring_compute_functions.rst

+Kernels
+-------
+
+Kernels are simple ``structs`` containing only function pointers (the "methods" of the kernel) and attribute flags. Each function kind corresponds to a class of Kernel with methods representing each stage of the function's execution. For example, :struct:`ScalarKernel` includes (optionally) :member:`ScalarKernel::init` to initialize any state necessary for execution and :member:`ScalarKernel::exec` to perform the computation.


Suggested change

Kernels are simple ``structs`` containing only function pointers (the "methods" of the kernel) and attribute flags. Each function kind corresponds to a class of Kernel with methods representing each stage of the function's execution. For example, :struct:`ScalarKernel` includes (optionally) :member:`ScalarKernel::init` to initialize any state necessary for execution and :member:`ScalarKernel::exec` to perform the computation.

Kernels are simple ``structs`` containing only function pointers (the "methods" of the kernel) and attribute flags. Each function kind corresponds to a class of kernel with methods representing each stage of the function's execution. For example, :struct:`ScalarKernel` includes (optionally) :member:`ScalarKernel::init` to initialize any state necessary for execution and :member:`ScalarKernel::exec` to perform the computation.
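The "struct of function pointers" idea in this paragraph can be sketched as follows. This is a hypothetical miniature, not the layout of `ScalarKernel`: `ToyState`, `ToyScalarKernel`, and `RunKernel` are invented names, and the signatures are simplified to plain vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical state produced by the optional init stage.
struct ToyState {
  int64_t offset = 0;
};

// Hypothetical miniature of a kernel: plain data plus function pointers.
struct ToyScalarKernel {
  // Optional stage: prepares state needed by exec; may be left null.
  ToyState (*init)(int64_t option) = nullptr;
  // Required stage: performs the computation.
  void (*exec)(const ToyState& state, const std::vector<int64_t>& in,
               std::vector<int64_t>* out) = nullptr;
};

inline ToyState InitOffset(int64_t option) { return ToyState{option}; }

inline void ExecAddOffset(const ToyState& state, const std::vector<int64_t>& in,
                          std::vector<int64_t>* out) {
  out->resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) (*out)[i] = in[i] + state.offset;
}

// Runs the stages in order: init (if present), then exec.
inline std::vector<int64_t> RunKernel(const ToyScalarKernel& kernel, int64_t option,
                                      const std::vector<int64_t>& in) {
  ToyState state = kernel.init != nullptr ? kernel.init(option) : ToyState{};
  std::vector<int64_t> out;
  kernel.exec(state, in, &out);
  return out;
}
```

A kernel built as `ToyScalarKernel{InitOffset, ExecAddOffset}` runs both stages, while `ToyScalarKernel{nullptr, ExecAddOffset}` skips initialization and executes with default state.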

bkmgit · 2021-12-29T07:11:17Z

docs/source/cpp/authoring_compute_functions.rst

+
+Kernels are simple ``structs`` containing only function pointers (the "methods" of the kernel) and attribute flags. Each function kind corresponds to a class of Kernel with methods representing each stage of the function's execution. For example, :struct:`ScalarKernel` includes (optionally) :member:`ScalarKernel::init` to initialize any state necessary for execution and :member:`ScalarKernel::exec` to perform the computation.
+
+Since many kernels are closely related in operation and differ only in their input types, it's frequently useful to leverage c++'s powerful template system to efficiently generate kernels' methods. For example, the "add" compute function accepts all numeric types and its kernels' methods are instantiations of the same function template.


Suggested change

Since many kernels are closely related in operation and differ only in their input types, it's frequently useful to leverage c++'s powerful template system to efficiently generate kernels' methods. For example, the "add" compute function accepts all numeric types and its kernels' methods are instantiations of the same function template.

Since many kernels are closely related in operation and differ only in their input types, it's frequently useful to leverage C++'s powerful template system to efficiently generate kernel methods. For example, the "add" compute function accepts all numeric types and its kernels' methods are instantiations of the same function template.

Maybe a link to an online guide to templating in C++ as used in Arrow would be helpful.
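Short of a full guide, a small sketch may show the pattern that paragraph describes: the operation is written once as a function template, and each instantiation can serve as the exec method of one kernel. The names `AddExec`, `kAddInt32`, and `kAddDouble` are hypothetical, and plain vectors stand in for Arrow arrays.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The operation is described once. This sketch assumes both inputs have the
// same length and ignores nulls and overflow.
template <typename T>
std::vector<T> AddExec(const std::vector<T>& left, const std::vector<T>& right) {
  std::vector<T> out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) out[i] = left[i] + right[i];
  return out;
}

// Each instantiation yields an ordinary function whose address can be stored
// as the exec pointer of one kernel, one per supported input type.
using Int32Exec = std::vector<int32_t> (*)(const std::vector<int32_t>&,
                                           const std::vector<int32_t>&);
using DoubleExec = std::vector<double> (*)(const std::vector<double>&,
                                           const std::vector<double>&);

constexpr Int32Exec kAddInt32 = &AddExec<int32_t>;
constexpr DoubleExec kAddDouble = &AddExec<double>;
```

Supporting another numeric type then costs one more instantiation rather than another copy of the loop.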

bkmgit · 2021-12-29T07:17:12Z

docs/source/cpp/authoring_compute_functions.rst

+* arrow/util/int_util_internal.h - defines utility functions
+    * Function definitions suffixed with `WithOverflow` to support "safe math" for arithmetic kernels. Helper macros are included to create the definitions which invoke the corresponding operation in [`portable_snippets`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/portable-snippets/safe-math.h) library.


Suggested change

* arrow/util/int_util_internal.h - defines utility functions

* Function definitions suffixed with `WithOverflow` to support "safe math" for arithmetic kernels. Helper macros are included to create the definitions which invoke the corresponding operation in [`portable_snippets`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/portable-snippets/safe-math.h) library.

* `arrow/util/int_util_internal.h <https://github.com/apache/arrow/tree/master/cpp/src/arrow/util/int_util_internal.h>`_ - defines utility functions

* Function definitions suffixed with `WithOverflow` to support "safe math" for arithmetic kernels. Helper macros are included to create the definitions which invoke the corresponding operation in the `"portable_snippets" <https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/portable-snippets/safe-math.h>`_ library.

Links to the files are helpful.
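For readers unfamiliar with "safe math" helpers, here is a portable sketch of the idea behind a `WithOverflow`-style function. It is not the Arrow or portable-snippets implementation: `AddWithOverflowSketch` is a hypothetical name, and the convention that a `true` return value means "overflow occurred" is an assumption made for this example.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical portable sketch. Returns true if a + b would overflow; in that
// case *out is left untouched. Otherwise stores the sum and returns false.
inline bool AddWithOverflowSketch(int32_t a, int32_t b, int32_t* out) {
  constexpr int32_t kMax = std::numeric_limits<int32_t>::max();
  constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
  // The bounds are rearranged so that the check itself cannot overflow.
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) return true;
  *out = a + b;
  return false;
}
```

A checked kernel can call such a helper and turn a `true` result into an error status, while the unchecked variant performs the addition without the test.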

bkmgit · 2021-12-29T07:22:09Z

docs/source/cpp/authoring_compute_functions.rst

+* arrow/util/int_util_internal.h - defines utility functions
+    * Function definitions suffixed with `WithOverflow` to support "safe math" for arithmetic kernels. Helper macros are included to create the definitions which invoke the corresponding operation in [`portable_snippets`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/portable-snippets/safe-math.h) library.
+
+* compute/api_scalar.h - contains


Suggested change

* compute/api_scalar.h - contains

* `arrow/compute/api_scalar.h <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/api_scalar.h>`_ - contains

Consistent abbreviated path names make readability easier.

bkmgit · 2021-12-29T07:24:55Z

docs/source/cpp/authoring_compute_functions.rst

+* compute/api_scalar.h - contains
+    * Subclasses of `FunctionOptions` for specific categories of compute functions
+    * API/prototypes for all `Scalar` compute functions. Note that there is a single API version for each compute function.
+* *compute/api_scalar.cc* - defines `Scalar` compute functions as wrappers over ["CallFunction"](https://arrow.apache.org/docs/cpp/api/compute.html?highlight=one%20shot#_CPPv412CallFunctionRKNSt6stringERKNSt6vectorI5DatumEEPK15FunctionOptionsP11ExecContext) (one-shot function). Arrow provides macros to easily define compute functions based on their `arity` and invocation mode.


Suggested change

* *compute/api_scalar.cc* - defines `Scalar` compute functions as wrappers over ["CallFunction"](https://arrow.apache.org/docs/cpp/api/compute.html?highlight=one%20shot#_CPPv412CallFunctionRKNSt6stringERKNSt6vectorI5DatumEEPK15FunctionOptionsP11ExecContext) (one-shot function). Arrow provides macros to easily define compute functions based on their `arity` and invocation mode.

* `arrow/compute/api_scalar.cc <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/api_scalar.cc>`_ - defines `Scalar` compute functions as wrappers over ["CallFunction"](https://arrow.apache.org/docs/cpp/api/compute.html?highlight=one%20shot#_CPPv412CallFunctionRKNSt6stringERKNSt6vectorI5DatumEEPK15FunctionOptionsP11ExecContext) (one-shot function). Arrow provides macros to easily define compute functions based on their `arity` and invocation mode.

bkmgit · 2021-12-29T07:28:23Z

docs/source/cpp/authoring_compute_functions.rst

+    * API/prototypes for all `Scalar` compute functions. Note that there is a single API version for each compute function.
+* *compute/api_scalar.cc* - defines `Scalar` compute functions as wrappers over ["CallFunction"](https://arrow.apache.org/docs/cpp/api/compute.html?highlight=one%20shot#_CPPv412CallFunctionRKNSt6stringERKNSt6vectorI5DatumEEPK15FunctionOptionsP11ExecContext) (one-shot function). Arrow provides macros to easily define compute functions based on their `arity` and invocation mode.
+    * Macros of the form `SCALAR_EAGER_*` invoke `CallFunction` directly and only require one function name.
+    * Macros of the form `SCALAR_*` invoke `CallFunction` after checking for overflow and require two function names (default and `_checked` variant).


Suggested change

* Macros of the form `SCALAR_*` invoke `CallFunction` after checking for overflow and require two function names (default and `_checked` variant).

* Macros of the form `SCALAR_*` invoke `CallFunction` and require two function names: the default, which behaves like `SCALAR_EAGER_*`, and the `_checked` variant, which checks for overflow in the calculations.

Underflow may also be problematic.

bkmgit · 2021-12-29T07:31:29Z

docs/source/cpp/authoring_compute_functions.rst

+    * Macros of the form `SCALAR_EAGER_*` invoke `CallFunction` directly and only require one function name.
+    * Macros of the form `SCALAR_*` invoke `CallFunction` after checking for overflow and require two function names (default and `_checked` variant).
+
+* compute/kernels/scalar_arithmetic.cc - contains kernel definitions for "Scalar Arithmetic" compute functions. Kernel definitions are defined via a class with literal name of compute function and containing methods named `Call` that are parameterized for specific input types (signed/unsigned integer and floating-point).


Suggested change

* compute/kernels/scalar_arithmetic.cc - contains kernel definitions for "Scalar Arithmetic" compute functions. Kernel definitions are defined via a class with literal name of compute function and containing methods named `Call` that are parameterized for specific input types (signed/unsigned integer and floating-point).

* `arrow/compute/kernels/scalar_arithmetic.cc <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc>`_ - contains kernel definitions for "Scalar Arithmetic" compute functions. Kernel definitions are defined via a class with literal name of compute function and containing methods named `Call` that are parameterized for specific input types (signed/unsigned integer and floating-point).

bkmgit · 2021-12-29T07:37:25Z

docs/source/cpp/authoring_compute_functions.rst

+Kernel dispatcher
+-----------------
+
+* compute/exec.h


Suggested change

* compute/exec.h

* `arrow/compute/exec.h <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/exec.h>`_

bkmgit · 2021-12-29T07:43:08Z

docs/source/cpp/authoring_compute_functions.rst

+Register kernels of compute function
+------------------------------------
+
+1. Before registering the kernels, check that the available kernel generators support the `arity` and data types allowed for the new compute function. Kernel generators are not of the same form for all the kernel `kinds`. For example, in the "Scalar Arithmetic" kernels, registration functions have names of the form `MakeArithmeticFunction` and `MakeArithmeticFunctionNotNull`. If not available, you will need to define them for your particular case.


A description of NotNull and other variants may be helpful, though this could also go in the main user documentation since it has performance and correctness implications.
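One plausible reading of the "NotNull" naming, offered here as an assumption rather than a statement of Arrow's behavior, is that the operation is invoked only on slots where every input is valid. The sketch below shows why that can matter for correctness: the value stored in a null slot is arbitrary, and an operation such as division must not be run on it. `ToyColumn` and `ApplyNotNull` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column: values plus one validity flag per slot. The value
// stored in a null slot is unspecified (in the usage below it happens to be 0).
struct ToyColumn {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

// "NotNull"-style driver: `op` is only invoked where both inputs are valid,
// so it never sees the arbitrary contents of null slots. This sketch assumes
// both columns have the same length.
template <typename Op>
ToyColumn ApplyNotNull(const ToyColumn& left, const ToyColumn& right, Op op) {
  ToyColumn out;
  out.values.assign(left.values.size(), 0);
  out.valid.assign(left.values.size(), false);
  for (std::size_t i = 0; i < left.values.size(); ++i) {
    if (left.valid[i] && right.valid[i]) {
      out.values[i] = op(left.values[i], right.values[i]);
      out.valid[i] = true;
    }
  }
  return out;
}
```

Dividing `{10, 20, 30}` by a column whose middle slot is null and holds `0` produces a null in the middle position without ever evaluating `20 / 0`. A driver that ran the operation on every slot and masked nulls afterwards would be simpler and possibly faster, but it would hit that division.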

bkmgit · 2021-12-29T07:43:47Z

docs/source/cpp/authoring_compute_functions.rst

+
+1. Before registering the kernels, check that the available kernel generators support the `arity` and data types allowed for the new compute function. Kernel generators are not of the same form for all the kernel `kinds`. For example, in the "Scalar Arithmetic" kernels, registration functions have names of the form `MakeArithmeticFunction` and `MakeArithmeticFunctionNotNull`. If not available, you will need to define them for your particular case.
+
+1. Create the kernels by invoking the kernel generators.


This is vague. Add a link to an example kernel.

Maybe indicate that example code links are provided in a later section.

bkmgit · 2021-12-29T07:44:34Z

docs/source/cpp/authoring_compute_functions.rst

+
+1. Create the kernels by invoking the kernel generators.
+
+1. Register the kernels in the corresponding registry along with its `FunctionDoc`.


A link would be helpful for this as well. Perhaps choose one kernel and use it to illustrate the points that are made.

Maybe indicate that example code links are provided in a later section.

bkmgit · 2021-12-29T07:58:43Z

docs/source/cpp/authoring_compute_functions.rst

+1. Input/output types: Numerical (signed and unsigned, integral and floating-point)
+1. Input/output shapes: operate on scalars or element-wise for arrays


Suggested change

-1. Input/output types: Numerical (signed and unsigned, integral and floating-point)
-1. Input/output shapes: operate on scalars or element-wise for arrays
+1. Input types: Numerical (signed and unsigned, integral and floating-point)
+1. Input shapes: Operate on scalars or element-wise for arrays
+1. Output types: Numerical, same as input
+1. Output shapes: Same as input shape

The documentation does not combine Input and Output,
which improves clarity because dependence of output on input can be made clear.

bkmgit · 2021-12-29T08:18:30Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+struct AbsoluteValue {
+  template <typename T, typename Arg>
+  static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status*) {
+    return (arg < static_cast<T>(0)) ? -arg : arg;
+  }
+
+  template <typename T, typename Arg>
+  static constexpr enable_if_unsigned_integer<T> Call(KernelContext*, Arg arg, Status*) {
+    return arg;
+  }
+
+  template <typename T, typename Arg>
+  static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
+    return (arg < static_cast<T>(0)) ? arrow::internal::SafeSignedNegate(arg) : arg;
+  }
+};
+
+struct AbsoluteValueChecked {
+  template <typename T, typename Arg>
+  static enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
+    static_assert(std::is_same<T, Arg>::value, "");
+    if (arg < static_cast<T>(0)) {
+        T result = 0;
+        if (ARROW_PREDICT_FALSE(NegateWithOverflow(arg, &result))) {
+          *st = Status::Invalid("overflow");
+        }
+        return result;
+    }
+    return arg;
+  }
+


Suggested change

-```C++
-struct AbsoluteValue {
-  template <typename T, typename Arg>
-  static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status*) {
-    return (arg < static_cast<T>(0)) ? -arg : arg;
-  }
-
-  template <typename T, typename Arg>
-  static constexpr enable_if_unsigned_integer<T> Call(KernelContext*, Arg arg, Status*) {
-    return arg;
-  }
-
-  template <typename T, typename Arg>
-  static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
-    return (arg < static_cast<T>(0)) ? arrow::internal::SafeSignedNegate(arg) : arg;
-  }
-};
-
-struct AbsoluteValueChecked {
-  template <typename T, typename Arg>
-  static enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
-    static_assert(std::is_same<T, Arg>::value, "");
-    if (arg < static_cast<T>(0)) {
-        T result = 0;
-        if (ARROW_PREDICT_FALSE(NegateWithOverflow(arg, &result))) {
-          *st = Status::Invalid("overflow");
-        }
-        return result;
-    }
-    return arg;
-  }
+.. code-block:: cpp
+
+   struct AbsoluteValue {
+     template <typename T, typename Arg>
+     static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status*) {
+       return (arg < static_cast<T>(0)) ? -arg : arg;
+     }
+
+     template <typename T, typename Arg>
+     static constexpr enable_if_unsigned_integer<T> Call(KernelContext*, Arg arg, Status*) {
+       return arg;
+     }
+
+     template <typename T, typename Arg>
+     static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
+       return (arg < static_cast<T>(0)) ? arrow::internal::SafeSignedNegate(arg) : arg;
+     }
+   };
+
+   struct AbsoluteValueChecked {
+     template <typename T, typename Arg>
+     static enable_if_signed_integer<T> Call(KernelContext*, Arg arg, Status* st) {
+       static_assert(std::is_same<T, Arg>::value, "");
+       if (arg < static_cast<T>(0)) {
+           T result = 0;
+           if (ARROW_PREDICT_FALSE(NegateWithOverflow(arg, &result))) {
+             *st = Status::Invalid("overflow");
+           }
+           return result;
+       }
+       return arg;
+     }

Use a restructured text code block

bkmgit · 2021-12-29T08:20:13Z

docs/source/cpp/authoring_compute_functions.rst

+  template <typename T, typename Arg>
+  static enable_if_unsigned_integer<T> Call(KernelContext* ctx, Arg arg, Status* st) {
+    static_assert(std::is_same<T, Arg>::value, "");
+    return arg;
+  }
+
+  template <typename T, typename Arg>
+  static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status* st) {
+    static_assert(std::is_same<T, Arg>::value, "");
+    return (arg < static_cast<T>(0)) ? -arg : arg;
+  }
+};
+```


Suggested change

-  template <typename T, typename Arg>
-  static enable_if_unsigned_integer<T> Call(KernelContext* ctx, Arg arg, Status* st) {
-    static_assert(std::is_same<T, Arg>::value, "");
-    return arg;
-  }
-
-  template <typename T, typename Arg>
-  static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status* st) {
-    static_assert(std::is_same<T, Arg>::value, "");
-    return (arg < static_cast<T>(0)) ? -arg : arg;
-  }
-};
-```
+     template <typename T, typename Arg>
+     static enable_if_unsigned_integer<T> Call(KernelContext* ctx, Arg arg, Status* st) {
+       static_assert(std::is_same<T, Arg>::value, "");
+       return arg;
+     }
+
+     template <typename T, typename Arg>
+     static constexpr enable_if_floating_point<T> Call(KernelContext*, Arg arg, Status* st) {
+       static_assert(std::is_same<T, Arg>::value, "");
+       return (arg < static_cast<T>(0)) ? -arg : arg;
+     }
+   };

Continuing formatting to a restructured text code block.
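
For readers who want to try the overload pattern from the snippet above outside Arrow, here is a minimal sketch. It is not the Arrow implementation: `EnableIfFloating`, `EnableIfUnsigned`, `EnableIfSigned`, `NegateWrapping`, `AbsSketch`, and `AbsCheckedSketch` are hypothetical stand-ins built only on `<type_traits>`, the `KernelContext*` and `Status*` parameters are dropped, and the checked variant signals overflow through a `bool*` rather than a `Status`.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-ins for Arrow's enable_if_* helpers.
template <typename T>
using EnableIfFloating =
    typename std::enable_if<std::is_floating_point<T>::value, T>::type;
template <typename T>
using EnableIfUnsigned = typename std::enable_if<
    std::is_integral<T>::value && std::is_unsigned<T>::value, T>::type;
template <typename T>
using EnableIfSigned = typename std::enable_if<
    std::is_integral<T>::value && std::is_signed<T>::value, T>::type;

template <typename T>
using Unsigned = typename std::make_unsigned<T>::type;

// Negates through the unsigned type so the minimum value wraps instead of
// invoking undefined behavior. Converting the result back to T is modular on
// mainstream compilers (and guaranteed from C++20).
template <typename T>
constexpr T NegateWrapping(T value) {
  return static_cast<T>(
      static_cast<Unsigned<T>>(Unsigned<T>(0) - static_cast<Unsigned<T>>(value)));
}

// One Call overload per type category; SFINAE selects the matching one.
struct AbsSketch {
  template <typename T>
  static constexpr EnableIfFloating<T> Call(T arg) {
    return arg < T(0) ? -arg : arg;
  }

  template <typename T>
  static constexpr EnableIfUnsigned<T> Call(T arg) {
    return arg;
  }

  template <typename T>
  static constexpr EnableIfSigned<T> Call(T arg) {
    return arg < T(0) ? NegateWrapping(arg) : arg;
  }
};

// Checked variant (signed integers only here): reports overflow to the caller.
struct AbsCheckedSketch {
  template <typename T>
  static EnableIfSigned<T> Call(T arg, bool* overflow) {
    if (arg == std::numeric_limits<T>::min()) {
      *overflow = true;
      return T(0);
    }
    return arg < T(0) ? static_cast<T>(-arg) : arg;
  }
};
```

The unchecked variant maps the minimum `int32_t` to itself (wrap-around), while the checked variant sets the overflow flag for the same input.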

bkmgit · 2021-12-29T08:24:24Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+const FunctionDoc absolute_value_doc{"Calculate the absolute value of the argument element-wise",
+                             ("Results will wrap around on integer overflow.\n"
+                              "Use function \"absolute_value_checked\" if you want overflow\n"
+                              "to return an error."),
+                             {"x"}};
+
+const FunctionDoc absolute_value_checked_doc{
+    "Calculate the absolute value of the argument element-wise",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"absolute_value_checked\"."),
+    {"x"}};
+```


Suggested change

-```C++
-const FunctionDoc absolute_value_doc{"Calculate the absolute value of the argument element-wise",
-                             ("Results will wrap around on integer overflow.\n"
-                              "Use function \"absolute_value_checked\" if you want overflow\n"
-                              "to return an error."),
-                             {"x"}};
-
-const FunctionDoc absolute_value_checked_doc{
-    "Calculate the absolute value of the argument element-wise",
-    ("This function returns an error on overflow.  For a variant that\n"
-     "doesn't fail on overflow, use function \"absolute_value_checked\"."),
-    {"x"}};
-```
+.. code-block:: cpp
+
+   const FunctionDoc absolute_value_doc{"Calculate the absolute value of the argument element-wise",
+                                ("Results will wrap around on integer overflow.\n"
+                                 "Use function \"absolute_value_checked\" if you want overflow\n"
+                                 "to return an error."),
+                                {"x"}};
+
+   const FunctionDoc absolute_value_checked_doc{
+       "Calculate the absolute value of the argument element-wise",
+       ("This function returns an error on overflow.  For a variant that\n"
+        "doesn't fail on overflow, use function \"absolute_value_checked\"."),
+       {"x"}};
+```

Use restructured text code block

bkmgit · 2021-12-29T08:27:23Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+  auto absolute_value = MakeUnaryArithmeticFunction<AbsoluteValue>("absolute_value", &absolute_value_doc);
+  auto absolute_value_checked = MakeUnaryArithmeticFunctionNotNull<AbsoluteValueChecked>("absolute_value_checked", &absolute_value_checked_doc);
+```


Suggested change

-```C++
-  auto absolute_value = MakeUnaryArithmeticFunction<AbsoluteValue>("absolute_value", &absolute_value_doc);
-  auto absolute_value_checked = MakeUnaryArithmeticFunctionNotNull<AbsoluteValueChecked>("absolute_value_checked", &absolute_value_checked_doc);
-```
+.. code-block:: cpp
+
+   auto absolute_value = MakeUnaryArithmeticFunction<AbsoluteValue>("absolute_value", &absolute_value_doc);
+   auto absolute_value_checked = MakeUnaryArithmeticFunctionNotNull<AbsoluteValueChecked>("absolute_value_checked", &absolute_value_checked_doc);

Use restructured text code block

bkmgit · 2021-12-29T08:28:23Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+  DCHECK_OK(registry->AddFunction(std::move(absolute_value)));
+  DCHECK_OK(registry->AddFunction(std::move(absolute_value_checked)));
+```


Suggested change

-```C++
-  DCHECK_OK(registry->AddFunction(std::move(absolute_value)));
-  DCHECK_OK(registry->AddFunction(std::move(absolute_value_checked)));
-```
+.. code-block:: cpp
+
+   DCHECK_OK(registry->AddFunction(std::move(absolute_value)));
+   DCHECK_OK(registry->AddFunction(std::move(absolute_value_checked)));

Use restructured text code block
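
As a rough picture of what registration amounts to, the sketch below models a registry as a name-to-function map that rejects duplicate names. Every type here (`ToyFunctionDoc`, `ToyFunction`, `ToyRegistry`) is a hypothetical simplification, not Arrow's classes; in particular, the toy `AddFunction` returns a `bool`, whereas the snippet above wraps Arrow's call in `DCHECK_OK`, which implies a status-like result.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins; not Arrow's classes.
struct ToyFunctionDoc {
  std::string summary;
  std::string description;
};

struct ToyFunction {
  std::string name;
  const ToyFunctionDoc* doc;  // the doc must outlive the function
};

class ToyRegistry {
 public:
  // Registers a function under its name; returns false if the name is taken.
  bool AddFunction(std::shared_ptr<ToyFunction> func) {
    const std::string name = func->name;  // copy before moving the pointer
    return functions_.emplace(name, std::move(func)).second;
  }

  // Looks a function up by name; returns nullptr when it is not registered.
  std::shared_ptr<ToyFunction> GetFunction(const std::string& name) const {
    auto it = functions_.find(name);
    return it == functions_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<ToyFunction>> functions_;
};
```

Keeping the doc as a pointer mirrors the `&absolute_value_doc` argument in the generator calls: the documentation object is defined once at namespace scope and shared by reference.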

bkmgit · 2021-12-29T08:30:12Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+ARROW_EXPORT
+Result<Datum> Hypotenuse(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = NULLPTR);
+```


Suggested change

-```C++
-ARROW_EXPORT
-Result<Datum> Hypotenuse(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = NULLPTR);
-```
+.. code-block:: cpp
+
+   ARROW_EXPORT
+   Result<Datum> Hypotenuse(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(),
+                            ExecContext* ctx = NULLPTR);

Use restructured text code block

bkmgit · 2021-12-29T08:31:16Z

docs/source/cpp/authoring_compute_functions.rst

+```C++
+SCALAR_ARITHMETIC_BINARY(Hypotenuse, "hypotenuse", "hypotenuse_checked")
+```


Suggested change

-```C++
-SCALAR_ARITHMETIC_BINARY(Hypotenuse, "hypotenuse", "hypotenuse_checked")
-```
+.. code-block:: cpp
+
+   SCALAR_ARITHMETIC_BINARY(Hypotenuse, "hypotenuse", "hypotenuse_checked")

Use restructured text code block
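
To make the role of such a macro more concrete, here is a toy version that compiles without Arrow. It assumes the macro generates a wrapper that picks between the default and the `_checked` registered name based on an overflow-checking option and then forwards to a dispatcher. `ToyOptions`, `ToyCallFunction`, and `TOY_SCALAR_ARITHMETIC_BINARY` are hypothetical names, and the dispatcher merely returns the name it was asked to call.

```cpp
#include <string>

// Hypothetical stand-in for the options struct.
struct ToyOptions {
  bool check_overflow = false;
};

// Pretend dispatcher: reports which registered name would be invoked.
inline std::string ToyCallFunction(const std::string& func_name, double, double) {
  return func_name;
}

// Generates a wrapper that selects the plain or "_checked" function name
// from the options and forwards both arguments to the dispatcher.
#define TOY_SCALAR_ARITHMETIC_BINARY(NAME, REGISTRY_NAME, REGISTRY_CHECKED_NAME) \
  inline std::string NAME(double left, double right,                            \
                          ToyOptions options = ToyOptions()) {                  \
    const char* func_name =                                                     \
        options.check_overflow ? REGISTRY_CHECKED_NAME : REGISTRY_NAME;         \
    return ToyCallFunction(func_name, left, right);                             \
  }

TOY_SCALAR_ARITHMETIC_BINARY(Hypotenuse, "hypotenuse", "hypotenuse_checked")
```

Calling `Hypotenuse(3.0, 4.0)` resolves to `"hypotenuse"`, and passing options with `check_overflow = true` resolves to `"hypotenuse_checked"`.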

edponce · 2021-12-29T21:16:16Z

@bkmgit Thanks for your reviews! I will get back to this PR and resolve them.

9prady9

I think an image showing what goes where could also be a quick way to portray the hierarchy and how calls are made when a function is invoked.

9prady9 · 2022-03-01T05:49:44Z

docs/source/cpp/authoring_compute_functions.rst

+
+Terminology:
+* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of types or function behavior options.
+* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must first select a suitable kernel corresponding to the value types of the inputs is selected.


Suggested change

-* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must first select a suitable kernel corresponding to the value types of the inputs is selected.
+* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must ensure a kernel corresponding to the value types of the inputs is selected.

or

Suggested change

-* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must first select a suitable kernel corresponding to the value types of the inputs is selected.
+* A specific implementation of a function is a "kernel". Selecting a viable kernel for executing a function is referred to as "dispatching". When executing a function on inputs, we must select a kernel corresponding to the value types of the inputs.

?
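
Either wording describes the same mechanism, which can be sketched in a few lines. This is a toy model under stated assumptions: `TypeId`, `ToyKernel`, and `ToyComputeFunction` are hypothetical, a kernel is reduced to a signature string, and dispatch is an exact match on a single input type.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type tags standing in for Arrow data types.
enum class TypeId { kInt32, kDouble };

// A "kernel" is one concrete implementation; here only its signature is kept.
struct ToyKernel {
  std::string signature;
};

// A "function" is the general operation and owns one kernel per input type.
class ToyComputeFunction {
 public:
  void AddKernel(TypeId input, ToyKernel kernel) {
    kernels_[input] = std::move(kernel);
  }

  // "Dispatching": select the kernel matching the input type, or fail.
  const ToyKernel& Dispatch(TypeId input) const {
    auto it = kernels_.find(input);
    if (it == kernels_.end()) {
      throw std::invalid_argument("no kernel for input type");
    }
    return it->second;
  }

 private:
  std::map<TypeId, ToyKernel> kernels_;
};
```

A function with kernels for `kInt32` and `kDouble` returns the matching kernel for either type and throws for a type it has no kernel for.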

9prady9 · 2022-03-01T05:51:36Z

docs/source/cpp/authoring_compute_functions.rst

+[Compute functions](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h) have the following principal attributes:
+* A unique ["name"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4NK5arrow7compute8Function4nameEv) used for function invocation and language bindings
+* A ["kind"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4N5arrow7compute8Function4KindE)
+  which indicates in what context it is valid for use


Suggested change

-  which indicates in what context it is valid for use
+  indicates in what context it is valid for use

9prady9 · 2022-03-01T05:52:29Z

docs/source/cpp/authoring_compute_functions.rst

+    * Input/output [types](https://arrow.apache.org/docs/cpp/compute.html#type-categories) and [shapes](https://arrow.apache.org/docs/cpp/compute.html#input-shapes)
+    * Compute functions can also be further "categorized" based on the type of operation performed. For example, `Scalar Arithmetic` vs `Scalar String`.
+* Compute functions (see [FunctionImpl and subclasses](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h)) contain ["kernels"](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels) which are implementations for specific argument signatures.
+* An ["arity"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4N5arrow7compute5ArityE) which states the number of required arguments


Suggested change

-* An ["arity"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4N5arrow7compute5ArityE) which states the number of required arguments
+* An ["arity"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4N5arrow7compute5ArityE) states the number of required arguments

9prady9 · 2022-03-01T06:03:46Z

docs/source/cpp/authoring_compute_functions.rst

+Files and structures of the compute layer
+=========================================
+
+This section describes the general structure of files/directory and principal code structures of the compute layer.


Suggested change

-This section describes the general structure of files/directory and principal code structures of the compute layer.
+This section describes the general structure of files/directory and principal code structures of the compute layer using scalar function <pick something>.

followed by bullet points on how this function exists in code base. The raw information is present even now in the current list of points, but it feels like a flow of information is missing.

pitrou · 2022-05-04T14:56:38Z

@edponce Are you planning to work on this in the future?

edponce · 2022-05-10T18:05:01Z

@pitrou Wow, I do not know why I extended this PR so much. I will take tomorrow morning to resolve all the issues and present a full draft.

edponce · 2022-05-10T18:07:39Z

Linking as reference this general outline for adding compute functions.

edponce · 2022-06-28T19:18:10Z

Here are current instructions for authoring a compute kernel.

drin · 2022-08-20T07:12:48Z

I plan on subsuming this PR into a new one (#13933). I will try to make sure all of these reviews are accommodated.

drin · 2022-08-24T16:52:15Z

I accommodated most of these review comments in this commit: f0e3adf

@edponce I'm not sure what we should do with this PR, but I think it can be closed?

copied local markdown doc to Arrow docs (no formatting)

ab04764

edponce changed the title ~~[C++] Add documentation for authoring compute kernels~~ ARROW-12724: [C++] Add documentation for authoring compute kernels May 11, 2021

pitrou reviewed May 12, 2021

bkietz requested changes May 19, 2021

update definition of kernels

381bdd0

Co-authored-by: Benjamin Kietzman

nirandaperera reviewed Jun 3, 2021

bkmgit reviewed Dec 29, 2021

9prady9 reviewed Mar 1, 2022

drin mentioned this pull request Aug 20, 2022

ARROW-12724: [C++][Docs] Add documentation for authoring compute kernels #13933

Draft

asfimport mentioned this pull request Dec 7, 2022

[C++][Docs] Add documentation for authoring compute kernels #18663

Open

edponce closed this Apr 26, 2023


		An introduction to compute functions is provided in https://arrow.apache.org/docs/cpp/compute.html.

		The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.


		A function that computes scalar summary statistics from array input.

		### Hash aggregate

		Arrow uses Google test framework. All kernels should have tests to ensure stability of the compute layer. Tests should at least cover ordinary inputs, corner cases, extreme values, nulls, different data types, and invalid tests. Moreover, there can be kernel-specific tests. For example, for arithmetic kernels, tests should include `NaN` and `Inf` inputs. The test files are located alongside the kernel source files and suffixed with `_test`. Tests are grouped by compute function `kind` and categories.

		`TYPED_TEST(test suite name, compute function)` - wrapper to define tests for the given compute function. The `test suite name` is associated with a set of data types that are used for the test suite (`TYPED_TEST_SUITE`). Tests from multiple compute functions can be placed in the same test suite. For example, `TYPED_TEST(TestBinaryArithmeticFloating, Sub)` and `TYPED_TEST(TestBinaryArithmeticFloating, Mul)`.


		The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.

		Many functions have SQL-like semantics in that they perform element-wise or scalar operations on whole arrays at a time. Other functions are not SQL-like and compute results that may be a different length or whose results depend on the order of the values.

	* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of types or function behavior options.
	* The term compute "function" refers to a particular general operation that may have many different implementations corresponding to different combinations of input data types or function behavior options.

	The [compute submodule](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute) contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.
	The `compute submodule <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute>`_ contains analytical functions that process primarily columnar data for either scalar or Arrow-based array inputs. These are intended for use inside query engines, data frame libraries, etc.

	[Compute functions](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h) have the following principal attributes:
	`Compute functions <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h>`_ have the following principal attributes:

	* A unique ["name"](https://arrow.apache.org/docs/cpp/api/compute.html#_CPPv4NK5arrow7compute8Function4nameEv) used for function invocation and language bindings
	* A unique :doc:`name <.compute>` used for function invocation and language bindings

	* Compute functions (see [FunctionImpl and subclasses](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h)) contain ["kernels"](https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels) which are implementations for specific argument signatures.
	* Compute functions (see `FunctionImpl and subclasses <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/function.h>`_) contain `"kernels" <https://github.com/edponce/arrow/tree/master/cpp/src/arrow/compute/kernels>`_ which are implementations for specific argument signatures.

	data. Can generally process Array or Scalar values. The size of the
	data. Can generally process array or scalar values. The size of the

		Define compute function
		-----------------------

	of mixing Array and Scalar inputs) of the input.
	of mixing array and scalar inputs) of the input.


		https://arrow.apache.org/docs/cpp/compute.html#arithmetic-functions

		Categories of Scalar functions

		A function with array input and output whose behavior depends on the
		values of the entire arrays passed, rather than the value of each scalar value.

-A function with array input and output whose behavior depends on the
-values of the entire arrays passed, rather than the value of each scalar value.
+A function with array input and output whose behavior depends on combinations of
+values at different locations in the input arrays, rather than the independent computations
+on scalar values at the same location in the input arrays.


		Kernels are simple ``structs`` containing only function pointers (the "methods" of the kernel) and attribute flags. Each function kind corresponds to a class of Kernel with methods representing each stage of the function's execution. For example, :struct:`ScalarKernel` includes (optionally) :member:`ScalarKernel::init` to initialize any state necessary for execution and :member:`ScalarKernel::exec` to perform the computation.

		Since many kernels are closely related in operation and differ only in their input types, it's frequently useful to leverage c++'s powerful template system to efficiently generate kernels' methods. For example, the "add" compute function accepts all numeric types and its kernels' methods are instantiations of the same function template.

		* arrow/util/int_util_internal.h - defines utility functions
		* Function definitions suffixed with `WithOverflow` to support "safe math" for arithmetic kernels. Helper macros are included to create the definitions which invoke the corresponding operation in [`portable_snippets`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/portable-snippets/safe-math.h) library.

	* compute/api_scalar.h - contains
	* `arrow/compute/api_scalar.h <https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/api_scalar.h>`_ - contains

	* Macros of the form `SCALAR_*` invoke `CallFunction` after checking for overflow and require two function names (default and `_checked` variant).
	* Macros of the form `SCALAR_*` invoke `CallFunction` and require two function names: a default, which behaves like `SCALAR_EAGER_*`, and a `_checked` variant, which checks for overflow in the calculations.
