Replies: 3 comments 2 replies
-
The link returns a 404.
-
Here is a good paper on the high-level differences (see its background section):
-
I didn't look at your code too closely, but the actual data source itself also seems to make many allocations.

As @tustvold said in Discord:

Here is some documentation on how to do it: https://datafusion.apache.org/library-user-guide/profiling.html

Note that it is possible to reuse allocations in DataFusion's functions, though most of the built-in ones don't, because we don't normally see allocations as the bottleneck in filter evaluation.

See the example here: datafusion/datafusion-examples/examples/advanced_udf.rs, lines 203 to 246 in 4d2e06f.
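
To make "reusing the allocation" concrete, here is a minimal sketch in plain Rust with no Arrow or DataFusion types. `FilterScratch`, `select_gt` and `filter_sum_batches` are hypothetical names made up for illustration, not DataFusion APIs, and this is not the code from advanced_udf.rs. It only shows the general idea of keeping one scratch buffer alive across batches instead of allocating a new one per batch.

```rust
/// Hypothetical per-operator state that owns a reusable selection buffer.
/// (Illustrative only; not a DataFusion or Arrow type.)
struct FilterScratch {
    selected: Vec<usize>,
}

impl FilterScratch {
    fn new() -> Self {
        Self { selected: Vec::new() }
    }

    /// Collects the indices of values greater than `threshold`.
    /// `clear()` drops the old contents but keeps the capacity, so after the
    /// first few batches this usually performs no new heap allocation.
    fn select_gt(&mut self, values: &[i64], threshold: i64) -> &[usize] {
        self.selected.clear();
        self.selected.extend(
            values
                .iter()
                .enumerate()
                .filter(|(_, v)| **v > threshold)
                .map(|(i, _)| i),
        );
        &self.selected
    }
}

/// Filters every batch with the same scratch buffer and sums the survivors.
fn filter_sum_batches(batches: &[Vec<i64>], threshold: i64) -> i64 {
    let mut scratch = FilterScratch::new();
    let mut total = 0i64;
    for batch in batches {
        let idx = scratch.select_gt(batch, threshold);
        total += idx.iter().map(|&i| batch[i]).sum::<i64>();
    }
    total
}
```

Whether this helps in a real query depends on where the time actually goes, which is why profiling first is the safer order.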
-
When I implemented a simple filter-and-sum query using Arrow, performance fell short of my expectations. Compared with a row-oriented implementation, the slowdown appears to come from additional memory allocations. The row-oriented engine performs better because it can avoid deep copies when passing data between operators.
The experimental code is here.
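
For readers without access to the linked code, the difference being described can be sketched roughly as follows. This is plain Rust over slices, not Arrow, and the function names are hypothetical. It is an assumption about the shape of the problem, not the actual experiment: one version materializes the filtered column before summing, the other fuses both steps so no intermediate buffer is allocated.

```rust
/// Operator-at-a-time style: the filter materializes a new column
/// (one heap allocation plus a copy of every surviving value),
/// and the sum then reads that intermediate column.
fn filter_then_sum(values: &[i64], threshold: i64) -> i64 {
    let kept: Vec<i64> = values
        .iter()
        .copied()
        .filter(|v| *v > threshold)
        .collect();
    kept.iter().sum()
}

/// Fused style: each value flows straight from the filter into the sum,
/// so nothing is copied into an intermediate buffer.
fn fused_filter_sum(values: &[i64], threshold: i64) -> i64 {
    values
        .iter()
        .copied()
        .filter(|v| *v > threshold)
        .sum()
}
```

Both functions return the same result; they differ only in whether the filter output is materialized between the two steps.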