The process works fine but consumes a bunch of RAM. I only realised this after hitting consistent OOM errors and adding some extra logging statements.
When text files are compressed, the polars batch reader decompresses the file in memory and then reads it.
After digging through some polars issues, I found that a good workaround is to use `pyarrow.csv` to stream the CSV.
Previously we used Python's `csv` module to create `pyarrow` record batches, which was very slow. `pyarrow.csv` should be much faster, but we'd need to be careful and profile it.