-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Extend the simple UDAF interface with function-level variables #12067
base: main
Are you sure you want to change the base?
feat: Extend the simple UDAF interface with function-level variables #12067
Conversation
✅ Deploy Preview for meta-velox canceled.
|
9013ee9
to
9775661
Compare
@kagamiori Some time ago, we attempted to add function-level states to the simple UDAF interface in #9167 and we have discussed some details in #8711, but @mbasmanova suggested that we implement it in a manner similar to the simple scalar functions interface. You had previously provided a specific implementation plan, and this PR is implemented according to the plan you proposed at that time. However, I did not add support for constantInput, as we currently only need to use rawInputType and resultType. Adding constantInput would require introducing new variables in the AggregateFunctionFactory, and I believe we can expand it when we actually need it. We currently need this capability to implement the decimal average aggregate function in Spark. Could you help to take a look. cc @rui-mo. |
94af18b
to
8d84155
Compare
Sure, let me take a look. This was the second prototype after getting @mbasmanova's suggestions: #9775. It looks like this PR has the same design 👍. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @liujiayi771, this looks mostly good to me except a few nits. Thank you for extending the simple UDAF interface to improve its usability!
velox/exec/SimpleAggregateAdapter.h
Outdated
TypePtr resultType) | ||
: Aggregate(std::move(resultType)), fn_{std::make_unique<FUNC>()} { | ||
if constexpr (support_initialize_) { | ||
FUNC::initialize(fn_.get(), step, argTypes, resultType_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Just fn_->initialize(step, argTypes, resultType_)
;
velox/exec/SimpleAggregateAdapter.h
Outdated
@@ -103,6 +110,7 @@ class SimpleAggregateAdapter : public Aggregate { | |||
// These functions are called on groups of both non-null and null | |||
// accumulators. These functions also return a bool indicating whether the | |||
// current group should be a NULL in the result vector. | |||
std::unique_ptr<FUNC> fn_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Move the member variable to the bottom of the class.
using IntermediateType = Row<int64_t, double>; | ||
using OutputType = double; | ||
|
||
TypePtr inputType_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a comment above to say explicitly that these are the function level variable we're going to test, and explain that they are set during the creation of the aggregation function and checked in addInput().
resultType, | ||
name, | ||
valueType); | ||
valueType, | ||
step, | ||
argTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: reorder the arguments to be name, step, argType, resultType, valueType
to be more consistent with the lambda's parameter list.
const TypePtr& inputType, | ||
const TypePtr& sumType, | ||
const TypePtr& resultType) { | ||
const TypePtr& resultType, | ||
core::AggregationNode::Step step, | ||
const std::vector<TypePtr>& argTypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Reorder the parameter list to be step, argTypes, resultType, inputType, sumType
.
TypePtr inputType; | ||
TypePtr resultType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inputType -> inputType_, resultType -> resultType_
@kagamiori Thank you for your review. All the comments have been addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for making the simple UDAF interface support function-level states! I left a few nits.
@@ -248,6 +271,9 @@ For aggregaiton functions of default-null behavior, the author defines an | |||
// Author defines data members | |||
... | |||
|
|||
// Define a pointer to the UDAF class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add in the comment that this is only needed if the aggregation function uses function-level states.
Some function-level variables needs to be declared in the simple aggregation | ||
function class. These variables are initialized once when the aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The author can optionally define function-level variables in the simple aggregation function class. These variables...
function, the author can hold them in the aggregate class variables, and | ||
initialize them in the initialize() method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the author can hold them in the aggregate class variables, and
initialize them in the initialize() method. --> these types can be defined as member variables in the aggregate class and initialized in the initialize() method.
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Hi @liujiayi771, could you please rebase the PR onto the latest main? Thanks! |
7a37b60
to
203160a
Compare
Done. |
Some aggregate functions implemented based on the simple UDAF interface need to obtain information such as step, argTypes, and resultType in the AccumulatorType struct. This PR adds the capability for user-defined variables to the simple UDAF interface, supporting the passing of step, argTypes, and resultType to AccumulatorType struct when registering the SimpleAggregateAdapter.