Merge branch-25.02 into main #12128

Draft · wants to merge 119 commits into main from branch-25.02
Conversation

@nvauto (Collaborator) commented on Feb 13, 2025:

Change version to 25.02.0

Note: merge this PR using the "Create a merge commit" option.

nvauto and others added 30 commits November 25, 2024 06:15
Keep the RAPIDS JNI and private dependency versions at 24.12.0-SNAPSHOT until the nightly CI for branch-25.02 is complete. Track the dependency update process in #11755.

Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* remove excluded release shim and TODO

Signed-off-by: YanxuanLiu <[email protected]>

* remove shim from 2.13 properties

Signed-off-by: YanxuanLiu <[email protected]>

* Fix error: 'NoneType' object has no attribute 'split' for excluded_shims

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: timl <[email protected]>
Co-authored-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
#11788)

The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage.

The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it.

Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
…ip ci] (#11791)

* replace date with jni&private timestamp for cache key

Signed-off-by: YanxuanLiu <[email protected]>

* use date if querying the timestamp fails

Signed-off-by: YanxuanLiu <[email protected]>

* add bash script to get timestamp

Signed-off-by: YanxuanLiu <[email protected]>

* replace timestamp with sha1

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Add the 'test_type' parameter for the Databricks script

To fix: #11818

'nightly' is for the nightly CI; 'pre-commit' is for the pre-merge CI.

The pre-merge CI does not need to copy the built RAPIDS plugin tar from the Databricks cluster back to the local host; only the nightly build needs to copy spark-rapids-built.tgz back.

Signed-off-by: timl <[email protected]>

* Update copyright

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: timl <[email protected]>
…ace (#11813)

* Support some escape characters in search list when rewriting regexp_replace to string replace

Signed-off-by: Haoyang Li <[email protected]>

* add a case

Signed-off-by: Haoyang Li <[email protected]>

* address comment

Signed-off-by: Haoyang Li <[email protected]>

* update datagen

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
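As an illustration of the rewrite this change extends (a hedged spark-shell sketch; the exact set of supported escape characters is defined by the commit, and only a running Spark session is assumed):

```
// A regexp_replace whose search pattern is a literal containing an escaped
// metacharacter (here "\\." for a literal dot) is semantically just a
// string replace of "." with "-".
import org.apache.spark.sql.functions._
import spark.implicits._

Seq("a.b.c").toDF("s")
  .select(regexp_replace($"s", "\\.", "-").as("replaced"))
  .show() // prints "a-b-c"
```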
* Fix TrafficController numTasks check

Signed-off-by: Jihoon Son <[email protected]>

* rename weights properly

* simplify the loop condition

* Rename the condition variable for readability

Co-authored-by: Gera Shegalov <[email protected]>

* missing renames

* add test for when all tasks are big

---------

Signed-off-by: Jihoon Son <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
* Add support for kudo write metrics

* Refactor

Signed-off-by: liurenjie1024 <[email protected]>

* Address comments

* Resolve comments

* Fix compiler

* Fix build break

* Fix build break

* Fix build break

* Fix build break

---------

Signed-off-by: liurenjie1024 <[email protected]>
* Balance the pre-merge CI job's time between the ci_1 and ci_2 tests

To fix: #11825

The pre-merge CI job is divided into CI_1 (mvn_verify) and CI_2.

We run these two parts in parallel to speed up the pre-merge CI.

Currently, CI_1 takes about 2 hours, while CI_2 takes approximately 4 hours.

Mark some tests as CI_1 to balance the time between CI_1 and CI_2.

After remarking the tests, both the CI_1 and CI_2 jobs should finish in about 3 hours.

Signed-off-by: timl <[email protected]>

* Separate the pre-merge CI job into two parts

To balance the duration, separate the pre-merge CI job into two parts:
    premergeUT1 (2 shims' UTs + 1/3 of the integration tests)
    premergeUT2 (1 shim's UTs + 2/3 of the integration tests)

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Some minor improvements identified during benchmark

Signed-off-by: liurenjie1024 <[email protected]>

* Fix late initialization

---------

Signed-off-by: liurenjie1024 <[email protected]>
SurajAralihalli and others added 23 commits January 22, 2025 10:53
Resolves #11651

This PR updates the Parser.parse() function to include a call to
java.util.regex.Pattern.compile(pattern), ensuring that the pattern is
validated upfront. If the pattern is not valid in Java, the same error
should be raised for both CPU and GPU runs. This PR also modifies a few Scala tests to align with the updated functionality.

---------

Signed-off-by: Suraj Aralihalli <[email protected]>
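A minimal sketch of the upfront validation described above (illustrative only; the real change lives in the plugin's regexp transpiler):

```
import java.util.regex.Pattern

// Compile the pattern with java.util.regex first so an invalid pattern
// raises the same PatternSyntaxException on both CPU and GPU runs,
// before any GPU-specific parsing happens.
def parse(pattern: String): String = {
  Pattern.compile(pattern) // throws PatternSyntaxException if invalid in Java
  // ... GPU-side parsing of `pattern` would continue here ...
  pattern
}
```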
Closes #11888.

This PR implements a simple bounce buffer pool (a pool of equal-sized buffers) that is used for the chunked packer as well as for the host spill bounce buffer. It allows the number and size of the buffers to be configured, and allows multiple threads to spill, or read spilled data, via the bounce buffers concurrently.
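A minimal sketch of the pooling scheme, using plain host byte arrays in place of the plugin's real buffer types (all names here are hypothetical):

```
import java.util.concurrent.{ConcurrentLinkedQueue, Semaphore}

// A pool of equal-sized bounce buffers. acquire() blocks when all buffers
// are in use, so several spill/read threads can share the pool concurrently
// without ever allocating more than numBuffers * bufferSize bytes.
class BounceBufferPool(numBuffers: Int, bufferSize: Int) {
  private val permits = new Semaphore(numBuffers)
  private val buffers = new ConcurrentLinkedQueue[Array[Byte]]()
  (1 to numBuffers).foreach(_ => buffers.add(new Array[Byte](bufferSize)))

  def acquire(): Array[Byte] = { permits.acquire(); buffers.poll() }
  def release(buf: Array[Byte]): Unit = { buffers.add(buf); permits.release() }
}
```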

---------

Signed-off-by: Zach Puller <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
Contributes to #10661, implements #11745 for DBR 14.3

- Add switch spark.rapids.shims.spark350db143.enabled
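A hedged usage sketch, assuming the switch is read as a regular Spark conf (only the config key comes from this change):

```
// Enable the DBR 14.3 shim via the new switch when building the session.
val spark = org.apache.spark.sql.SparkSession.builder()
  .config("spark.rapids.shims.spark350db143.enabled", "true")
  .getOrCreate()
```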


---------

Signed-off-by: Gera Shegalov <[email protected]>
…11880)

Closes #11830

Moves the IO outside of the critical section in the spillable buffer handle `spill` functions, allowing threads interacting with the `SpillFramework` to manage the spill state of a handle consistently without being blocked on IO. E.g., if thread `t1` is in the middle of spilling and thread `t2` wants to check whether this handle is currently spilling, `t2` does not need to wait for the spill IO operation to complete in order to check whether the handle is spillable.
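A minimal sketch of the locking pattern, with hypothetical names standing in for the real `SpillFramework` handles:

```
// Flip the handle's state under the lock, perform the disk write with no
// lock held, then publish the result under the lock again. Threads calling
// isSpilling never wait on the IO itself.
class SpillableHandleSketch {
  @volatile private var spillingNow = false
  private var spilled: Option[java.io.File] = None

  def spill(data: Array[Byte], write: Array[Byte] => java.io.File): Unit = {
    val shouldSpill = synchronized {
      if (spillingNow || spilled.isDefined) false
      else { spillingNow = true; true }
    }
    if (shouldSpill) {
      val file = write(data) // IO happens outside the critical section
      synchronized {
        spilled = Some(file)
        spillingNow = false
      }
    }
  }

  def isSpilling: Boolean = spillingNow // non-blocking state check
}
```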

---------

Signed-off-by: Zach Puller <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
…11991)

This fixes #11989, except for the retry, for which I will file a follow-on issue.

---------

Signed-off-by: Robert (Bobby) Evans <[email protected]>
…U [databricks] (#12037)

This PR fixes the piece of code in GpuTransitionOverrides where we decide if an Expression is on
the GPU or not.

We were only checking whether an Expression returns true for `supportsColumnar` and assuming it was on the GPU, which is not always the case.

fixes #12036 


---------

Signed-off-by: Raza Jafri <[email protected]>
…#12000)

Closes #11892

In the current code, a HybridParquetScan followed by a Filter results in all conditions being pushed down to the CPU while still remaining in the Filter, so the filter conditions are evaluated twice. Usually the second evaluation is quite fast, so this isn't a big problem, but conditions that are not supported by the CPU or the GPU can cause issues.

This PR adds a rule to check each condition in FilterExec before overriding:
- If a filter condition is not supported by either the CPU or the GPU, it falls back to the CPU in FilterExec and is not pushed down to the CPU scan.
- If a filter condition is only supported by the CPU, this PR pushes it down to the scan and removes it from FilterExec.
- If a filter condition is only supported by the GPU, this PR keeps it in the Filter.
- If all conditions are pushed down to the scan, FilterExec is removed.

The `supportedByHybridFilters` list comes from
[velox-backend-support-progress](https://github.com/apache/incubator-gluten/blob/c8284e53f2eac5e02be79d5e3914a9b44a9f6f90/docs/velox-backend-support-progress.md)
in Gluten. Here is the script used to extract the CPU-supported expressions:
[gist](https://gist.github.com/thirtiseven/5d9c3da285b41b2d1c409289fcc01b87).

For example:
```
scala> val df = spark.read.parquet("parse_url_protocol")
df: org.apache.spark.sql.DataFrame = [url: string, pr: string ... 2 more fields]


scala> df.filter("startswith(pr, 'h') == False and ascii(url) >= 16").show()
25/01/23 16:32:31 WARN GpuOverrides: 
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <ProjectExec> will run on GPU
    *Expression <Alias> cast(count#2L as string) AS count#32 will run on GPU
      *Expression <Cast> cast(count#2L as string) will run on GPU
    *Exec <FilterExec> will run on GPU
      *Expression <Not> NOT StartsWith(pr#1, h) will run on GPU
        *Expression <StartsWith> StartsWith(pr#1, h) will run on GPU
      *Exec <FileSourceScanExec> will run on GPU
```
`startswith` is not supported on the CPU, so it is kept on the GPU, and `ascii` is pushed down to the CPU. The check is recursive.
![Screenshot 2025-01-23 at 16 38 42](https://github.com/user-attachments/assets/2732625e-b552-4c06-b663-00dafa0d1962)


and
```
scala> df.filter("url >= 'http' and ascii(url) >= 16").show()
25/01/23 16:37:45 WARN GpuOverrides: 
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  *Exec <ProjectExec> will run on GPU
    *Expression <Alias> cast(count#2L as string) AS count#66 will run on GPU
      *Expression <Cast> cast(count#2L as string) will run on GPU
    *Exec <FileSourceScanExec> will run on GPU
```
If all filters are supported (and thus pushed down to the scan), the FilterExec is removed.
![Screenshot 2025-01-23 at 16 38 29](https://github.com/user-attachments/assets/b132b043-b96e-42d2-87f3-1fd61bef9e91)



---------

Signed-off-by: Haoyang Li <[email protected]>
Co-authored-by: Alfred Xu <[email protected]>
…to the CPU in non-utc orc tests (#12050)

After #12037 was merged, it did the right thing by ensuring that source scans are not allowed to fall back to the CPU unless we expect them to. Some non-UTC tests were falling back to the CPU without the tests complaining, but now they do.

This PR adds `FileSourceScanExec` and `BatchScanExec` to the allowNonGpu
in `orc_test.py` and `orc_cast_test.py` when testing non-utc cases. 

fixes #12046 


Signed-off-by: Raza Jafri <[email protected]>
Stop truncating failure messages in the test summary by default

Fixes #12043


Signed-off-by: Gera Shegalov <[email protected]>
…s [databricks] (#11970)

Addresses #11541 

Table properties should be set unconditionally to accommodate diverging defaults across Databricks versions.

Standardize table creation to go through SQL.

Improves:

```bash
env TEST_PARALLEL=0 \
    TEST_MODE=DELTA_LAKE_ONLY TESTS=delta_lake_update_test.py \
    PYSP_TEST_spark_rapids_sql_detectDeltaLogQueries=false \
    PYSP_TEST_spark_rapids_sql_format_parquet_reader_type=PERFILE \
./jenkins/databricks/test.sh 
```

---------

Signed-off-by: Gera Shegalov <[email protected]>
Obtain the version of the rapids-hybrid-execution dependency JAR from
the pom.xml file instead of using the version from the rapids plugin
project.

Signed-off-by: Tim Liu <[email protected]>
…cks] (#12060)

This PR fixes #11433.

This PR makes the GPU Parquet reader more flexible when handling files
whose decimal columns have a different precision/scale than Spark’s
requested schema. Previously, the plugin’s code would fail early
(“Parquet column cannot be converted”) if the file declared, for
example, DECIMAL(20, 0) but Spark asked for DECIMAL(10, 0) or DECIMAL(5,
1). Now we defer these mismatches and resolve them with optional half-up rounding or overflow handling, aiming to match standard Spark behavior.

In this PR, we make castDecimalToDecimal a public function. In `evolveSchemaCasts`, we pass the from and to DecimalTypes to cast the decimals to the required form. castDecimalToDecimal handles both widening and narrowing of the scale/precision.

Updated the current integration tests. In the Spark UTs we disable the test, as the Apache Spark vectorized path throws an error whereas the spark-rapids implementation produces the correct results.

---------

Signed-off-by: Niranjan Artal <[email protected]>
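The kind of rescaling being deferred to, sketched with java.math.BigDecimal (illustrative only; the real cast runs on the GPU via castDecimalToDecimal):

```
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Rescale a value read from the file (e.g. DECIMAL(20,0)) to the requested
// Spark type (e.g. DECIMAL(5,1)): half-up rounding when the scale changes,
// and None (overflow) when the result no longer fits the target precision.
def castDecimal(v: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
  val rescaled = v.setScale(scale, RoundingMode.HALF_UP)
  if (rescaled.precision <= precision) Some(rescaled) else None
}
```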
This script may run outside the project root path, so we use 'mvn -f $project_root' to target the project's pom.xml.


To follow up: #12054

Signed-off-by: timl <[email protected]>
This PR stops building SPARK-4.0.0-SNAPSHOT in the pre-merge and nightly jobs. A massive change recently went into Spark 4.0 (apache/spark#49713) which breaks the builds and causes Jenkins job failures. We stop building it temporarily until we fix the issue.
---------

Signed-off-by: Niranjan Artal <[email protected]>
For some tests in CI/CD, we only need to download specific test files
and do not need to clone the entire Git repository.

As a result, we cannot retrieve the project root directory using Git
commands.

So we use $WORKSPACE (which always points to the root dir of the project) instead of the Git top-level path.

This fixes the error below:
```
 Downloading hybrid execution dependency jars...
 ++ git rev-parse --show-toplevel
 fatal: not a git repository (or any parent up to mount point /home/jenkins)
 Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
 + project_root=
 ++ mvn -f -B -q help:evaluate -Dexpression=spark-rapids-hybrid.version -DforceStdout
 POM file -B specified with the -f/--file command line argument does not exist
```

Signed-off-by: timl <[email protected]>
Fixes #12076

This PR fixes the multiple-loading problem, which is caused by duplicated URL paths for a single extra plugin.

---------

Signed-off-by: sperlingxx <[email protected]>
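A hedged sketch of the fix (names hypothetical): de-duplicate an extra plugin's URL paths before loading so the same jar is only loaded once.

```
import java.net.URL

// Keep only distinct URL paths for a single extra plugin's jars.
def dedupePluginUrls(urls: Seq[URL]): Seq[URL] = urls.distinct
```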
Closes #11985

Upgrades jucx libs from 1.16 to 1.18.

Also upgrades the shuffle example dockerfiles to CUDA 12.8 images, which support fabric memory handles and Grace Blackwell; the latter will be supported in UCX 1.18.1.

Based off of #11147

Tested building Docker images for Ubuntu and Rocky, running `ucx_perftest` in a container, and running test queries with UCX shuffle enabled (on test clusters).

---------

Signed-off-by: Zach Puller <[email protected]>
#12103)

This reverts #4217 (commit b18492e).

#4217 turned on event logging, which was already on by default, and overrode the xdist-aware setting of event log locations, causing the hang.

Fixes #12096 

Downstream steps in pipelines need to collect event logs under all eventlog_gw[0-9] paths:

```
$ find ./integration_tests/target/run_dir-20250211100159-C4VO -name \*zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw3/eventlog_v2_local-1739268134946/events_1_local-1739268134946.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw0/eventlog_v2_local-1739268134579/events_1_local-1739268134579.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw2/eventlog_v2_local-1739268134806/events_1_local-1739268134806.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw1/eventlog_v2_local-1739268134420/events_1_local-1739268134420.zstd
```

---------

Signed-off-by: Gera Shegalov <[email protected]>
Reverts #12058

The above PR was found to break ucx_perftest with -m cuda in the Ubuntu RDMA container in a CI test. This is caused specifically by the CUDA version upgrade in that PR.

---------

Signed-off-by: Zach Puller <[email protected]>
Fix #12113

This fixes the issue by not using the HybridParquetScan when the schema
is empty.

Signed-off-by: sperlingxx <[email protected]>
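A minimal sketch of the guard, assuming a simple check over the scan's read schema (names hypothetical):

```
import org.apache.spark.sql.types.StructType

// Only route a scan to the hybrid (CPU) Parquet path when the requested
// read schema actually has columns; an empty schema (e.g. a bare count)
// stays on the regular path.
def canUseHybridParquetScan(readSchema: StructType): Boolean =
  readSchema.fields.nonEmpty
```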
Closes #12091

This PR tries to fix the assertion error in the sized hash join by
explicitly concatenating multiple batches into a single one.
 
The original code expects only one batch in the input iterator for the build side of a small join, but that is not true when a split-retry happens in the previous operator (e.g. GpuHashAggregate). The GPU path in this sized join does not concatenate these small batches, and instead passes them directly down to the `getSingleBuildBatch` function, where the assertion fails.

In the failing case, the join has an exchange and an aggregate as its children.
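A hedged sketch of the fix, with `concatBatches` standing in for the plugin's real concatenation utility:

```
// Instead of asserting a single build-side batch, drain the iterator and
// concatenate whatever the upstream operator produced (a split-retry can
// yield several small batches).
def getSingleBuildBatch[T](batches: Iterator[T], concatBatches: Seq[T] => T): T = {
  val all = batches.toSeq
  require(all.nonEmpty, "build side produced no batches")
  if (all.size == 1) all.head else concatBatches(all)
}
```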



---------

Signed-off-by: Firestarman <[email protected]>
@yinqingh (Collaborator) commented:
Marking the change as a draft.

This change is for advance review; I'll merge up the following changes if needed.
