When compared to the row-oriented implementation, Arrow's performance fell short of expectations #14932

jayhan94 · 2025-02-28T10:49:34Z

jayhan94
Feb 28, 2025

When implementing a simple filtering and summation query using Arrow, I observed that the performance fell short of expectations. Compared to the row-oriented implementation, the performance degradation appears to be attributed to additional memory allocations. In contrast, the row-oriented engine demonstrates superior performance as it can avoid deep copying when transferring data between operators.
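
To make the comparison concrete, here is a minimal sketch of the same filter-and-sum query in both layouts, using only the standard library. The `Row` and `Columns` types, the field names, and the function names are hypothetical illustrations; this is not the code from the linked repository and it does not use the Arrow API. It only shows where a columnar pipeline can pick up extra allocations when the filter materializes its output before the sum runs.

```rust
/// Row-oriented layout: one struct per record (hypothetical schema).
#[derive(Debug, Clone)]
pub struct Row {
    pub category: u32,
    pub amount: i64,
}

/// Columnar layout: one contiguous vector per field.
#[derive(Debug, Default)]
pub struct Columns {
    pub category: Vec<u32>,
    pub amount: Vec<i64>,
}

/// Row-at-a-time: filter and sum run in one pass with no intermediate buffer.
pub fn sum_rows(rows: &[Row], wanted: u32) -> i64 {
    rows.iter()
        .filter(|r| r.category == wanted)
        .map(|r| r.amount)
        .sum()
}

/// Columnar with materialization: build a selection mask, copy the surviving
/// values into a new vector, then sum. This allocates two buffers per batch.
pub fn sum_columns_materialized(cols: &Columns, wanted: u32) -> i64 {
    let mask: Vec<bool> = cols.category.iter().map(|&c| c == wanted).collect();
    let filtered: Vec<i64> = cols
        .amount
        .iter()
        .zip(&mask)
        .filter(|pair| *pair.1)
        .map(|(v, _)| *v)
        .collect();
    filtered.iter().sum()
}

/// Columnar without materialization: evaluate the predicate and accumulate
/// in the same pass, so no intermediate buffer is allocated.
pub fn sum_columns_fused(cols: &Columns, wanted: u32) -> i64 {
    cols.category
        .iter()
        .zip(&cols.amount)
        .filter(|pair| *pair.0 == wanted)
        .map(|(_, v)| *v)
        .sum()
}
```

All three functions return the same result. The difference is only in how many temporary buffers each one creates per batch, which is the kind of cost a CPU profiler would surface.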

The experimental codebase is here.

xudong963 · 2025-02-28T10:59:21Z

xudong963
Feb 28, 2025
Collaborator

The experimental codebase is here.

The link returns a 404.

1 reply

jayhan94 Feb 28, 2025
Author

I forgot to make it public. It should be accessible now.

alamb · 2025-03-01T12:12:16Z

alamb
Mar 1, 2025
Collaborator

Here is a good paper on the high-level differences (see its background section):
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask


alamb · 2025-03-01T12:17:45Z

alamb
Mar 1, 2025
Collaborator

Compared to the row-oriented implementation, the performance degradation appears to be attributed to additional memory allocations.

I didn't look at your code too closely, but the data source itself also seems to make many allocations:
https://github.com/jayhan94/arrow-playground/blob/70dc0ee80f0507e9c4e3041addc0f36152cc4700/src/columnar.rs#L27-L50

As @tustvold said in Discord:

I'd also recommend using a CPU profiler, e.g. hotspot, to analyse where your application is spending time. From a quick glance the way you are constructing a StringArray will perform a lot of unnecessary allocations

Here is some documentation on how to do it: https://datafusion.apache.org/library-user-guide/profiling.html
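
On the StringArray point above, here is a rough sketch of why construction style matters. Arrow stores a string column as one contiguous value buffer plus an offsets buffer, so building it from individually heap-allocated strings does extra work. The `StringColumn` type and the `build_*` functions below are hypothetical stand-ins written against the standard library only; arrow-rs has its own `StringBuilder` for this, which the sketch does not use.

```rust
use std::fmt::Write;

/// Minimal stand-in for a variable-length string column: one contiguous
/// buffer plus offsets, so value `i` is `data[offsets[i]..offsets[i + 1]]`.
pub struct StringColumn {
    offsets: Vec<usize>,
    data: String,
}

impl StringColumn {
    /// Reserve space up front for `rows` values and `bytes` bytes of text.
    pub fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { offsets, data: String::with_capacity(bytes) }
    }

    /// Append one value by copying its bytes into the shared buffer.
    pub fn push(&mut self, value: &str) {
        self.data.push_str(value);
        self.offsets.push(self.data.len());
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn get(&self, i: usize) -> &str {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Allocation-heavy construction: one heap-allocated `String` per row.
pub fn build_owned(n: usize) -> Vec<String> {
    (0..n).map(|i| format!("user-{i}")).collect()
}

/// Allocation-light construction: format into a reused scratch buffer and
/// append into the column, so no per-row `String` is created.
pub fn build_column(n: usize) -> StringColumn {
    // The byte estimate is a guess; the buffer still grows if it is too small.
    let mut col = StringColumn::with_capacity(n, n * 8);
    let mut scratch = String::new();
    for i in 0..n {
        scratch.clear();
        write!(scratch, "user-{i}").expect("writing to a String cannot fail");
        col.push(&scratch);
    }
    col
}
```

Both builders produce the same values. The second keeps the number of allocations roughly constant regardless of the row count, as long as the capacity estimate is reasonable.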

Note that it is possible to reuse allocations in DataFusion's functions, though most of the built-in ones don't, because we don't normally see allocations as the bottleneck in filter evaluation.

See the example here:

datafusion/datafusion-examples/examples/advanced_udf.rs

Lines 203 to 246 in 4d2e06f

    
    fn maybe_pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {
        // Calling `unary` creates a new array for the results. Avoiding
        // allocations is a common optimization in performance critical code.
        // arrow-rs allows this optimization via the `unary_mut`
        // and `binary_mut` kernels in certain cases
        //
        // These kernels can only be used if there are no other references to
        // the arrays (exp_array has to be the last remaining reference).
        let owned_array = exp_array
            // as in the previous example, we first downcast to &Float64Array
            .as_primitive::<Float64Type>()
            // non-obviously, we call clone here to get an owned `Float64Array`.
            // Calling clone() is relatively inexpensive as it increments
            // some ref counts but doesn't clone the data)
            //
            // Once we have the owned Float64Array we can drop the original
            // exp_array (untyped) reference
            .clone();
        // We *MUST* drop the reference to `exp_array` explicitly so that
        // owned_array is the only reference remaining in this function.
        //
        // Note that depending on the query there may still be other references
        // to the underlying buffers, which would prevent reuse. The only way to
        // know for sure is the result of `compute::unary_mut`
        drop(exp_array);
        // If we have the only reference, compute the result directly into the same
        // allocation as was used for the input array
        match compute::unary_mut(owned_array, |exp| base.powf(exp)) {
            Err(_orig_array) => {
                // unary_mut will return the original array if there are other
                // references into the underling buffer (and thus reuse is
                // impossible)
                //
                // In a real implementation, this case should fall back to
                // calling `unary` and allocate a new array; In this example
                // we will return an error for demonstration purposes
                exec_err!("Could not reuse array for maybe_pow_in_place")
            }
            // a result of OK means the operation was run successfully
            Ok(res) => Ok(Arc::new(res)),
        }
    }
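
The comments in that excerpt say a real implementation should fall back to allocating a new array when the buffer cannot be reused. Here is a minimal sketch of that reuse-or-fall-back shape using only `std::sync::Arc` and `Vec<f64>`. The function name `map_reusing_buffer` and the returned flag are hypothetical; this is an analogy for the pattern, not the arrow-rs `unary_mut` API.

```rust
use std::sync::Arc;

/// Apply `f` to every element. If `values` is the only reference to the
/// buffer, mutate it in place and reuse its data allocation; otherwise
/// allocate a fresh buffer for the result. The returned flag reports
/// whether the input buffer was reused.
pub fn map_reusing_buffer<F>(values: Arc<Vec<f64>>, f: F) -> (Arc<Vec<f64>>, bool)
where
    F: Fn(f64) -> f64,
{
    match Arc::try_unwrap(values) {
        // Sole owner: overwrite the existing elements in place.
        Ok(mut owned) => {
            for v in owned.iter_mut() {
                *v = f(*v);
            }
            (Arc::new(owned), true)
        }
        // Other references exist: leave the input untouched and
        // write the results into a newly allocated vector.
        Err(shared) => {
            let fresh: Vec<f64> = shared.iter().map(|&v| f(v)).collect();
            (Arc::new(fresh), false)
        }
    }
}
```

As in the excerpt, whether reuse happens depends on who else holds a reference at call time, so the only reliable way to know is to check the outcome.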

Most

1 reply

jayhan94 Mar 1, 2025
Author

Yes, I've optimized the code, and the Arrow implementation now runs 5 times faster. I believe its performance has reached the expected level.
