A common pattern in HADES is to run a dplyr query on an Andromeda table and then call `batchApply()` on the resulting query object (e.g. by Cyclops). However, in this scenario the current Andromeda implementation always first copies the result of the query into a new Andromeda object before batching. My guess is that this is because `arrow::ScannerBuilder$create()` does not accept an `arrow_dplyr_query` object.
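For concreteness, here is a minimal sketch of that pattern; the `andr` object and the `cars` table are illustrative placeholders, not taken from an actual HADES package:

```r
library(Andromeda)
library(dplyr)

# Hypothetical Andromeda object with a single table:
andr <- andromeda(cars = cars)

# A dplyr query on the table, then batchApply() on the query object.
# Today this causes the query result to be copied into a temporary
# Andromeda object before batching:
andr$cars %>%
  filter(speed > 10) %>%
  batchApply(function(batch) {
    message("Processing batch of ", nrow(batch), " rows")
  })
```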
But I found that the `arrow::as_record_batch_reader()` function works fine with `arrow_dplyr_query` objects:
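A minimal sketch of what that could look like, reusing the illustrative `andr` object from above and assuming the arrow-based Andromeda, where a dplyr verb on a table yields an `arrow_dplyr_query`:

```r
# Build a dplyr query on the Andromeda table:
query <- andr$cars %>%
  filter(speed > 10) %>%
  select(speed, dist)

# as_record_batch_reader() accepts the query directly, so we can stream
# record batches without first materializing the result in a new
# Andromeda object:
reader <- arrow::as_record_batch_reader(query)
while (!is.null(batch <- reader$read_next_batch())) {
  df <- as.data.frame(batch)
  message("Got batch of ", nrow(df), " rows")
}

close(andr)
```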
The only downside is that you can't set the batch size, although it definitely does batching.
I would propose using this to avoid having to copy the query result into a new Andromeda object (which may consume a lot of resources, and also runs into the issue on Windows that we can't delete the temporary Andromeda object).