Skip to content

0.2.0

Pre-release
Pre-release
Compare
Choose a tag to compare
@andygrove andygrove released this 03 Sep 13:51
· 115 commits to main since this release

DataFusion Comet 0.2.0 Changelog

This release consists of 87 commits from 14 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: dictionary decimal vector optimization #705 (kazuyukitanimura)
  • fix: Unsupported window expression should fall back to Spark #710 (viirya)
  • fix: ReusedExchangeExec can be child operator of CometBroadcastExchangeExec #713 (viirya)
  • fix: Fallback to Spark for window expression with range frame #719 (viirya)
  • fix: Remove skip.surefire.tests mvn property #739 (wForget)
  • fix: subquery execution under CometTakeOrderedAndProjectExec should not fail #748 (viirya)
  • fix: skip negative scale checks for creating decimals #723 (kazuyukitanimura)
  • fix: Fallback to Spark for unsupported partitioning #759 (viirya)
  • fix: Unsupported types for SinglePartition should fallback to Spark #765 (viirya)
  • fix: unwrap dictionaries in CreateNamedStruct #754 (andygrove)
  • fix: Fallback to Spark for unsupported input besides ordering #768 (viirya)
  • fix: Native window operator should be CometUnaryExec #774 (viirya)
  • fix: Fallback to Spark when shuffling on struct with duplicate field name #776 (viirya)
  • fix: withInfo was overwriting information in some cases #780 (andygrove)
  • fix: Improve support for nested structs #800 (eejbyfeldt)
  • fix: Sort on single struct should fallback to Spark #811 (viirya)
  • fix: Check sort order of SortExec instead of child output #821 (viirya)
  • fix: Fix panic in avg aggregate and disable stddev by default #819 (andygrove)
  • fix: Supported nested types in HashJoin #735 (eejbyfeldt)

Performance related:

  • perf: Improve performance of CASE .. WHEN expressions #703 (andygrove)
  • perf: Optimize IfExpr by delegating to CaseExpr #681 (andygrove)
  • fix: optimize isNullAt #732 (kazuyukitanimura)
  • perf: decimal decode improvements #727 (parthchandra)
  • fix: Remove castting on decimals with a small precision to decimal256 #741 (kazuyukitanimura)
  • fix: optimize some bit functions #718 (kazuyukitanimura)
  • fix: Optimize getDecimal for small precision #758 (kazuyukitanimura)
  • perf: add metrics to CopyExec and ScanExec #778 (andygrove)
  • fix: Optimize decimal creation macros #764 (kazuyukitanimura)
  • perf: Improve count aggregate performance #784 (andygrove)
  • fix: Optimize read_side_padding #772 (kazuyukitanimura)
  • perf: Remove some redundant copying of batches #816 (andygrove)
  • perf: Remove redundant copying of batches after FilterExec #835 (andygrove)
  • fix: Optimize CheckOverflow #852 (kazuyukitanimura)
  • perf: Add benchmarks for Spark Scan + Comet Exec #863 (andygrove)

Implemented enhancements:

  • feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704 (akhilss99)
  • feat: Support count AggregateUDF for window function #736 (huaxingao)
  • feat: Implement basic version of RLIKE #734 (andygrove)
  • feat: show executed native plan with metrics when in debug mode #746 (andygrove)
  • feat: Add GetStructField expression #731 (Kimahriman)
  • feat: Add config to enable native upper and lower string conversion #767 (andygrove)
  • feat: Improve native explain #795 (andygrove)
  • feat: Add support for null literal with struct type #797 (eejbyfeldt)
  • feat: Optimze CreateNamedStruct preserve dictionaries #789 (eejbyfeldt)
  • feat: CreateArray support #793 (Kimahriman)
  • feat: Add native thread configs #828 (viirya)
  • feat: Add specific configs for converting Spark Parquet and JSON data to Arrow #832 (andygrove)
  • feat: Support sum in window function #802 (huaxingao)
  • feat: Simplify configs for enabling/disabling operators #855 (andygrove)
  • feat: Enable clippy::clone_on_ref_ptr on proto and spark_exprs crates #859 (comphead)
  • feat: Enable clippy::clone_on_ref_ptr on core crate #860 (comphead)
  • feat: Use CometPlugin as main entrypoint #853 (andygrove)

Documentation updates:

  • doc: Update outdated spark.comet.columnar.shuffle.enabled configuration doc #738 (wForget)
  • docs: Add explicit configs for enabling operators #801 (andygrove)
  • doc: Document CometPlugin to start Comet in cluster mode #836 (comphead)

Other:

  • chore: Make rust clippy happy #701 (Xuanwo)
  • chore: Update version to 0.2.0 and add 0.1.0 changelog #696 (andygrove)
  • chore: Use rust-toolchain.toml for better toolchain support #699 (Xuanwo)
  • chore(native): Make sure all targets in workspace been covered by clippy #702 (Xuanwo)
  • Apache DataFusion Comet Logo #697 (aocsa)
  • chore: Add logo to rat exclude list #709 (andygrove)
  • chore: Use new logo in README and website #724 (andygrove)
  • build: Add Comet logo files into exclude list #726 (viirya)
  • chore: Remove TPC-DS benchmark results #728 (andygrove)
  • chore: make Cast's logic reusable for other projects #716 (Blizzara)
  • chore: move scalar_funcs into spark-expr #712 (Blizzara)
  • chore: Bump DataFusion to rev 35c2e7e #740 (andygrove)
  • chore: add more aggregate functions to benchmark test #706 (huaxingao)
  • chore: Add criterion benchmark for decimal_div #743 (andygrove)
  • build: Re-enable TPCDS q72 for broadcast and hash join configs #781 (viirya)
  • chore: bump DataFusion to rev f4e519f #783 (huaxingao)
  • chore: Upgrade to DataFusion rev bddb641 and disable "skip partial aggregates" feature #788 (andygrove)
  • chore: Remove legacy code for adding a cast to a coalesce #790 (andygrove)
  • chore: Use DataFusion 41.0.0-rc1 #794 (andygrove)
  • chore: rename CometRowToColumnar and fix duplication bug #785 (Kimahriman)
  • chore: Enable shuffle in micro benchmarks #806 (andygrove)
  • Minor: ScanExec code cleanup and additional documentation #804 (andygrove)
  • chore: Make it possible to run 'make benchmark-%' using jvm 17+ #823 (eejbyfeldt)
  • chore: Add more unsupported cases to supportedSortType #825 (viirya)
  • chore: Enable Comet shuffle with AQE coalesce partitions #834 (viirya)
  • chore: Add GitHub workflow to publish Docker image #847 (andygrove)
  • chore: Revert "fix: change the not exists base image apache/spark:3.4.3 to 3.4.2" #854 (haoxins)
  • chore: fix docker-publish attempt 1 #851 (andygrove)
  • minor: stop warning that AQEShuffleRead cannot run natively #842 (andygrove)
  • chore: Improve ObjectHashAggregate fallback error message #849 (andygrove)
  • chore: Fix docker image publishing (specify ghcr.io in tag) #856 (andygrove)
  • chore: Use Git tag as Comet version when publishing Docker images #857 (andygrove)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    36	Andy Grove
    16	Liang-Chi Hsieh
     9	KAZUYUKI TANIMURA
     5	Emil Ejbyfeldt
     4	Huaxin Gao
     3	Adam Binford
     3	Oleks V
     3	Xuanwo
     2	Arttu
     2	Zhen Wang
     1	Akhil S S
     1	Alexander Ocsa
     1	Parth Chandra
     1	Xin Hao

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.