Merge branch-25.02 into main #12128
Draft
nvauto wants to merge 119 commits into main from merge-branch-25.02-to-main
Conversation
Keep the rapids JNI and private dependency version at 24.12.0-SNAPSHOT until the nightly CI for the branch-25.02 branch is complete. Track the dependency update process at: #11755 Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* remove excluded release shim and TODO Signed-off-by: YanxuanLiu <[email protected]> * remove shim from 2.13 properties Signed-off-by: YanxuanLiu <[email protected]> * Fix error: 'NoneType' object has no attribute 'split' for excluded_shims Signed-off-by: timl <[email protected]> --------- Signed-off-by: YanxuanLiu <[email protected]> Signed-off-by: timl <[email protected]> Co-authored-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
To fix: https://github.com/NVIDIA/spark-rapids/issues/11755. Wait for the pre-merge CI job to SUCCEED. Signed-off-by: nvauto <[email protected]>
Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
…#11778) Signed-off-by: Jason Lowe <[email protected]>
#11788) The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage. The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it. Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
…ip ci] (#11791) * replace date with jni&private timestamp for cache key Signed-off-by: YanxuanLiu <[email protected]> * use date if querying timestamp failed Signed-off-by: YanxuanLiu <[email protected]> * add bash script to get timestamp Signed-off-by: YanxuanLiu <[email protected]> * replace timestamp with sha1 Signed-off-by: YanxuanLiu <[email protected]> --------- Signed-off-by: YanxuanLiu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: YanxuanLiu <[email protected]>
* Add the 'test_type' parameter for the Databricks script For fixing: #11818 'nightly' is for the nightly CI; 'pre-commit' is for the pre-merge CI. The pre-merge CI does not need to copy the Rapids plugin built tar from the Databricks cluster back to the local host; only the nightly build needs to copy spark-rapids-built.tgz back. Signed-off-by: timl <[email protected]> * Update copyright Signed-off-by: timl <[email protected]> --------- Signed-off-by: timl <[email protected]>
…ace (#11813) * Support some escape characters in search list when rewriting regexp_replace to string replace Signed-off-by: Haoyang Li <[email protected]> * add a case Signed-off-by: Haoyang Li <[email protected]> * address comment Signed-off-by: Haoyang Li <[email protected]> * update datagen Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
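The rewrite above only fires when the search pattern is effectively a literal string once simple escapes are resolved. A minimal sketch of that check, with a hypothetical helper name (the plugin's real logic lives in its regexp transpiler):
```scala
// Sketch: decide whether a regexp_replace search pattern can be rewritten
// as a plain string replace. Only a literal pattern qualifies: simple
// escapes like \. \$ \( resolve to plain characters, while any real regex
// metacharacter rules the rewrite out. Hypothetical helper, not plugin code.
def literalSearchString(pattern: String): Option[String] = {
  val meta = ".$^[]()|*+?{}"
  val sb = new StringBuilder
  var i = 0
  while (i < pattern.length) {
    val c = pattern.charAt(i)
    if (c == '\\' && i + 1 < pattern.length && meta.indexOf(pattern.charAt(i + 1)) >= 0) {
      sb += pattern.charAt(i + 1) // an escaped metacharacter is a plain literal
      i += 2
    } else if (c == '\\' || meta.indexOf(c) >= 0) {
      return None // real regex syntax: cannot rewrite to string replace
    } else {
      sb += c
      i += 1
    }
  }
  Some(sb.toString) // safe to use string replace with this literal
}
```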
* Fix TrafficController numTasks check Signed-off-by: Jihoon Son <[email protected]> * rename weights properly * simplify the loop condition * Rename the condition variable for readability Co-authored-by: Gera Shegalov <[email protected]> * missing renames * add test for when all tasks are big --------- Signed-off-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]>
* Add support for kudo write metrics * Refactor Signed-off-by: liurenjie1024 <[email protected]> * Address comments * Resolve comments * Fix compiler * Fix build break * Fix build break * Fix build break * Fix build break --------- Signed-off-by: liurenjie1024 <[email protected]>
* Balance the pre-merge CI job's time for the ci_1 and ci_2 tests To fix: #11825 The pre-merge CI job is divided into CI_1 (mvn_verify) and CI_2. We run these two parts in parallel to speed up the pre-merge CI. Currently, CI_1 takes about 2 hours, while CI_2 takes approximately 4 hours. Mark some tests as CI_1 to balance the time between CI_1 and CI_2. After remarking tests, both CI_1 and CI_2 jobs should finish in 3 hours or so. Signed-off-by: timl <[email protected]> * Separate the pre-merge CI job into two parts To balance the duration, separate the pre-merge CI job into two parts: premergeUT1 (2 shims' UT + 1/3 of the integration tests) and premergeUT2 (1 shim's UT + 2/3 of the integration tests). Signed-off-by: timl <[email protected]> --------- Signed-off-by: timl <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Some minor improvements identified during benchmark Signed-off-by: liurenjie1024 <[email protected]> * Fix late initialization --------- Signed-off-by: liurenjie1024 <[email protected]>
Resolves #11651 This PR updates the Parser.parse() function to include a call to java.util.regex.Pattern.compile(pattern), ensuring that the pattern is validated upfront. If the pattern is not valid in Java, the same error should be raised for both CPU and GPU runs. This PR also modifies a few Scala tests to align with the updated functionality. --------- Signed-off-by: Suraj Aralihalli <[email protected]>
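A hedged sketch of the validation described above; `transpileForGpu` is a stand-in name, not the plugin's actual parser API:
```scala
import java.util.regex.Pattern

object RegexUpfrontValidation {
  // Placeholder for the plugin's CPU-to-GPU regex transpiler (assumed name).
  def transpileForGpu(pattern: String): String = pattern

  // Compile with Java's regex engine first, so an invalid pattern raises the
  // same java.util.regex.PatternSyntaxException on both CPU and GPU runs
  // before any GPU-specific parsing happens.
  def parseValidated(pattern: String): String = {
    Pattern.compile(pattern) // throws PatternSyntaxException if invalid
    transpileForGpu(pattern)
  }
}
```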
Closes #11888. This PR implements a simple bounce buffer pool (a pool of buffers of equal size) which is used for the chunked packer as well as the host spill bounce buffer. It allows the number and size of the buffers to be configured, and allows multiple threads to spill or read spilled data via the bounce buffers concurrently. --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>
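A minimal sketch of such a fixed-size pool, using plain byte arrays in place of the real host/device buffers:
```scala
import java.util.concurrent.ArrayBlockingQueue

// Minimal sketch of a pool of equally sized bounce buffers. Threads block
// in acquire() until a buffer is free, which is what lets several spill or
// read threads share a bounded set of buffers concurrently. Plain byte
// arrays stand in for the real host/device buffers.
class BounceBufferPool(numBuffers: Int, bufferSize: Int) {
  private val pool = new ArrayBlockingQueue[Array[Byte]](numBuffers)
  (0 until numBuffers).foreach(_ => pool.put(new Array[Byte](bufferSize)))

  def acquire(): Array[Byte] = pool.take()           // blocks until one is free
  def release(buf: Array[Byte]): Unit = pool.put(buf) // return buffer to the pool
}
```
A blocking queue gives the concurrency property essentially for free: the pool size bounds total bounce-buffer memory while letting any number of threads queue up for a buffer.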
Contributes to #10661, implements #11745 for DBR 14.3 - Add switch spark.rapids.shims.spark350db143.enabled --------- Signed-off-by: Gera Shegalov <[email protected]>
…11880) Closes #11830 Moves the IO outside of the critical section in the spillable buffer handle `spill` functions, allowing threads interacting with the `SpillFramework` to manage the spill state of a handle consistently without being blocked on IO. E.g. if thread `t1` is in the middle of spilling and thread `t2` wants to check whether this handle is currently spilling, `t2` does not need to wait for the spill IO operation to complete in order to check whether the handle is spillable. --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>
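A sketch of the locking pattern the description implies, with illustrative names rather than the framework's API:
```scala
// Sketch: take the lock only to flip the handle's state, run the actual
// spill IO with no lock held, then take the lock again to publish the
// result. Other threads can observe `spilling` without waiting on IO.
class SpillableHandleSketch(private var data: Array[Byte]) {
  private val lock = new Object
  @volatile private var spilling = false
  private var spilled: Option[Array[Byte]] = None

  def spill(writeToDisk: Array[Byte] => Array[Byte]): Unit = {
    val toWrite = lock.synchronized {
      if (spilling || spilled.isDefined) return // already spilling or spilled
      spilling = true
      data
    }
    val onDisk = writeToDisk(toWrite) // IO happens outside the critical section
    lock.synchronized {
      spilled = Some(onDisk)
      spilling = false
    }
  }

  def isSpilling: Boolean = spilling // never blocks on in-flight IO
}
```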
…11991) This fixes #11989 except for the retry, which I will file a follow-on issue for. --------- Signed-off-by: Robert (Bobby) Evans <[email protected]>
…U [databricks] (#12037) This PR fixes the piece of code in GpuTransitionOverrides where we decide whether an Expression is on the GPU. We were only checking if an Expression returns true for `supportsColumnar` and assuming it was on the GPU, which is not always the case. fixes #12036 --------- Signed-off-by: Raza Jafri <[email protected]>
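A hedged sketch of the tightened check: `supportsColumnar == true` only says a node can produce columnar batches, so the fix's idea is to test for the plugin's own types, with `GpuExec` standing in for the plugin's marker trait:
```scala
// Sketch: a third-party columnar node also reports supportsColumnar == true,
// so "is columnar" must not be conflated with "is on the GPU". Treat a node
// as GPU-resident only if it is one of the plugin's own types.
trait GpuExec { def supportsColumnar: Boolean = true }

def isOnGpu(node: AnyRef): Boolean = node match {
  case _: GpuExec => true  // genuinely one of the plugin's GPU nodes
  case _          => false // possibly columnar, but not necessarily GPU
}
```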
…#12000) Closes #11892 In the current code, a HybridParquetScan followed by a Filter results in all conditions being pushed down to the CPU while still remaining in the Filter, so the filter conditions are evaluated twice. Usually the second evaluation is quite fast, so this isn't a big problem, but conditions that are not supported by the CPU or the GPU can cause problems. This PR adds a rule to check each condition in FilterExec before overriding (see the sketch after this list):
- If a filter condition is not supported by either the CPU or the GPU, it falls back to the CPU in FilterExec and is not pushed down to the CPU.
- If a filter condition is only supported by the CPU, this PR pushes it down to the scan and removes it from FilterExec.
- If a filter condition is only supported by the GPU, this PR keeps it in the filter.
- If all conditions are pushed down to the scan, FilterExec is removed.
The `supportedByHybridFilters` list is from [velox-backend-support-progress](https://github.com/apache/incubator-gluten/blob/c8284e53f2eac5e02be79d5e3914a9b44a9f6f90/docs/velox-backend-support-progress.md) in Gluten. Here is the script to extract the CPU-supported exprs: [gist](https://gist.github.com/thirtiseven/5d9c3da285b41b2d1c409289fcc01b87). For example:
```
scala> val df = spark.read.parquet("parse_url_protocol")
df: org.apache.spark.sql.DataFrame = [url: string, pr: string ... 2 more fields]

scala> df.filter("startswith(pr, 'h') == False and ascii(url) >= 16").show()
25/01/23 16:32:31 WARN GpuOverrides: !Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
@Partitioning <SinglePartition$> could run on GPU
*Exec <ProjectExec> will run on GPU
*Expression <Alias> cast(count#2L as string) AS count#32 will run on GPU
*Expression <Cast> cast(count#2L as string) will run on GPU
*Exec <FilterExec> will run on GPU
*Expression <Not> NOT StartsWith(pr#1, h) will run on GPU
*Expression <StartsWith> StartsWith(pr#1, h) will run on GPU
*Exec <FileSourceScanExec> will run on GPU
```
`startswith` is not supported on the CPU, so it is kept on the GPU, and `ascii` is pushed down to the CPU. The check is recursive. ![Screenshot 2025-01-23 at 16 38 42](https://github.com/user-attachments/assets/2732625e-b552-4c06-b663-00dafa0d1962) and
```
scala> df.filter("url >= 'http' and ascii(url) >= 16").show()
25/01/23 16:37:45 WARN GpuOverrides: !Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
@Partitioning <SinglePartition$> could run on GPU
*Exec <ProjectExec> will run on GPU
*Expression <Alias> cast(count#2L as string) AS count#66 will run on GPU
*Expression <Cast> cast(count#2L as string) will run on GPU
*Exec <FileSourceScanExec> will run on GPU
```
If all filters are supported, the FilterExec is removed. ![Screenshot 2025-01-23 at 16 38 29](https://github.com/user-attachments/assets/b132b043-b96e-42d2-87f3-1fd61bef9e91) --------- Signed-off-by: Haoyang Li <[email protected]> Co-authored-by: Alfred Xu <[email protected]>
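A small sketch of the three-way routing described in the list above, with `Cond` and its two flags standing in for the plugin's real expression-support checks:
```scala
// Sketch: route each filter condition by what supports it. Both or
// CPU-only -> push down to the CPU scan and drop from the Filter;
// GPU-only -> keep in the Filter; supported by neither -> fall back to
// the CPU and push nothing down. Illustrative types, not the plugin's.
case class Cond(name: String, cpuOk: Boolean, gpuOk: Boolean)

def splitConditions(conds: Seq[Cond]): (Seq[Cond], Seq[Cond], Boolean) = {
  val fallback   = conds.exists(c => !c.cpuOk && !c.gpuOk) // unsupported anywhere
  val pushedDown = if (fallback) Nil else conds.filter(_.cpuOk)
  val keptOnGpu  = if (fallback) Nil else conds.filterNot(_.cpuOk)
  // if keptOnGpu is empty and there is no fallback, FilterExec can be removed
  (pushedDown, keptOnGpu, fallback)
}
```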
…to the CPU in non-utc orc tests (#12050) After #12037 was merged, it did the right thing by ensuring that source scans are not allowed to fall back to the CPU unless we expect them to. Some non-UTC tests were falling back to the CPU without the test complaining, but now they are. This PR adds `FileSourceScanExec` and `BatchScanExec` to the allowNonGpu in `orc_test.py` and `orc_cast_test.py` when testing non-utc cases. fixes #12046 Signed-off-by: Raza Jafri <[email protected]>
Stop truncating failure messages in the test summary by default Fixes #12043 Signed-off-by: Gera Shegalov <[email protected]>
…s [databricks] (#11970) Addresses #11541 Table properties should be set unconditionally to accommodate diverging defaults in different Databricks versions. Standardize table creation to be via SQL. Improves
```bash
env TEST_PARALLEL=0 \
  TEST_MODE=DELTA_LAKE_ONLY TESTS=delta_lake_update_test.py \
  PYSP_TEST_spark_rapids_sql_detectDeltaLogQueries=false \
  PYSP_TEST_spark_rapids_sql_format_parquet_reader_type=PERFILE \
  ./jenkins/databricks/test.sh
```
--------- Signed-off-by: Gera Shegalov <[email protected]>
Obtain the version of the rapids-hybrid-execution dependency JAR from the pom.xml file instead of using the version from the rapids plugin project. Signed-off-by: Tim Liu <[email protected]>
…cks] (#12060) This PR fixes #11433. This PR makes the GPU Parquet reader more flexible when handling files whose decimal columns have a different precision/scale than Spark's requested schema. Previously, the plugin's code would fail early (“Parquet column cannot be converted”) if the file declared, for example, DECIMAL(20, 0) but Spark asked for DECIMAL(10, 0) or DECIMAL(5, 1). Now we defer these mismatches to be resolved with optional half-up rounding or overflow handling, trying to match standard Spark behavior. In this PR, we make castDecimalToDecimal a public function. In `evolveSchemaCasts`, we pass the from and to DecimalTypes to cast the decimals to the required form; castDecimalToDecimal handles both widening and narrowing of the scale/precision. Updated the current integration tests. In Spark-UT we are disabling the test, as the Apache Spark vectorized path throws an error whereas the spark-rapids implementation produces the correct results. --------- Signed-off-by: Niranjan Artal <[email protected]>
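A sketch of the deferred cast on plain Java BigDecimal, assuming half-up rounding and treating precision overflow as a null-like result (Spark's actual overflow handling depends on ANSI mode):
```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Sketch: rescale the file's value to the requested scale with HALF_UP
// rounding, then treat values that exceed the requested precision as
// overflow (None here, in place of Spark's null-or-exception handling).
def rescaleDecimal(v: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
  val rescaled = v.setScale(scale, RoundingMode.HALF_UP)
  if (rescaled.precision > precision) None else Some(rescaled)
}
```
For example, reading DECIMAL(20, 0) data under a requested DECIMAL(10, 0) schema succeeds for values that fit in 10 digits and overflows only for those that don't.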
This script may run outside the project root path, so we use 'mvn -f $project_root' to target the project's pom.xml. To follow up: #12054 Signed-off-by: timl <[email protected]>
This PR stops building SPARK-4.0.0-SNAPSHOT in pre-merge and nightlies. A massive change recently went into Spark 4.0 (apache/spark#49713) which breaks the builds and causes Jenkins job failures. We stop building it temporarily until we fix the issue. --------- Signed-off-by: Niranjan Artal <[email protected]>
For some tests in CI/CD, we only need to download specific test files and do not need to clone the entire Git repository. As a result, we cannot retrieve the project root directory using Git commands, so we use $WORKSPACE (which always points to the root dir of the project) instead of the Git top-level path. To fix the error below:
```
Downloading hybrid execution dependency jars...
++ git rev-parse --show-toplevel
fatal: not a git repository (or any parent up to mount point /home/jenkins)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ project_root=
++ mvn -f -B -q help:evaluate -Dexpression=spark-rapids-hybrid.version -DforceStdout
POM file -B specified with the -f/--file command line argument does not exist
```
Signed-off-by: timl <[email protected]>
Fixes #12076 This PR fixes the multiple-loading problem caused by duplicated URL paths for a single extra plugin. --------- Signed-off-by: sperlingxx <[email protected]>
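A sketch of the dedup idea: drop repeated URL paths before they reach a class loader. The helper name is illustrative; comparing the string form sidesteps `URL.equals`, which can trigger DNS lookups:
```scala
import java.net.URL
import scala.collection.mutable

// Sketch: deduplicate the URL list for an extra plugin so the same jar
// path listed twice is only loaded once. filter + HashSet.add keeps the
// first occurrence of each path and preserves the original order.
def dedupePluginUrls(urls: Seq[URL]): Seq[URL] = {
  val seen = mutable.HashSet.empty[String]
  urls.filter(u => seen.add(u.toString))
}
```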
Closes #11985 Upgrades jucx libs from 1.16 to 1.18. Also upgrades the shuffle example dockerfiles to CUDA 12.8 images, which support fabric memory handles and have support for Grace-Blackwell, which will be supported in UCX 1.18.1. Based off of #11147 Tested building docker images for ubuntu and rocky, running `ucx_perftest` in the container, and running test queries with ucx shuffle enabled (on test clusters). --------- Signed-off-by: Zach Puller <[email protected]>
#12103) This reverts #4217 commit b18492e. #4217 was turning on event logging, which was already on by default. #4217 overrode the xdist-aware setting of event log locations, and causes the hang. Fixes #12096 Downstream steps in pipelines need to collect event logs under all eventlog_gw[0-9] paths:
```
$ find ./integration_tests/target/run_dir-20250211100159-C4VO -name \*zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw3/eventlog_v2_local-1739268134946/events_1_local-1739268134946.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw0/eventlog_v2_local-1739268134579/events_1_local-1739268134579.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw2/eventlog_v2_local-1739268134806/events_1_local-1739268134806.zstd
./integration_tests/target/run_dir-20250211100159-C4VO/eventlog_gw1/eventlog_v2_local-1739268134420/events_1_local-1739268134420.zstd
```
--------- Signed-off-by: Gera Shegalov <[email protected]>
Reverts #12058 The above PR was found to break ucx_perftest with -m cuda in the ubuntu rdma container in a CI test. This is caused specifically by the CUDA version upgrade in the PR. --------- Signed-off-by: Zach Puller <[email protected]>
Fix #12113 This fixes the issue by not using the HybridParquetScan when the schema is empty Signed-off-by: sperlingxx <[email protected]>
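A sketch of the guard, assuming the check is simply on the number of requested fields (the helper and its parameters are illustrative):
```scala
// Sketch: only route a scan through the hybrid CPU path when the read
// schema actually has fields; an empty projection falls back to the
// regular GPU scan. Illustrative stand-in for the plugin's planning check.
def useHybridScan(hybridEnabled: Boolean, readSchemaFieldCount: Int): Boolean =
  hybridEnabled && readSchemaFieldCount > 0
```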
close #12091 This PR fixes the assertion error in the sized hash join by explicitly concatenating multiple batches into a single one. The original code expects only one batch in the input iterator for the build side of a small join, but that does not hold when a split-retry happens in the previous operator (e.g. GpuHashAggregate): the GPU path in this sized join does not concatenate these small batches, instead passing them directly down to the `getSingleBuildBatch` function, where the assertion fails. One more thing: in the failing case, the join has an exchange and an aggregate as its children. --------- Signed-off-by: Firestarman <[email protected]>
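A sketch of the fix's shape, with `Seq[Int]` rows standing in for columnar batches and `flatten` for the real GPU concatenation:
```scala
// Sketch: instead of asserting that the build-side iterator holds exactly
// one batch, drain it and concatenate whatever is there into a single
// batch first, so upstream split-retry producing several small batches
// no longer trips the single-batch assumption.
def getSingleBuildBatch(batches: Iterator[Seq[Int]]): Seq[Int] = {
  val all = batches.toSeq
  if (all.size == 1) all.head // common case: already a single batch
  else all.flatten            // split-retry upstream: concatenate into one
}
```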
Signed-off-by: nvauto <[email protected]>
Marked the change as draft. This change is for review in advance; I'll merge up the following changes if needed.
Change version to 25.02.0
Note: merge this PR with “Create a merge commit”.