forked from apache/arrow
-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
apacheGH-41813: [C++] Fix avx2 gather offset larger than 2GB in `Comp…
…areColumnsToRows` (apache#42188) ### Rationale for this change AVX2 intrinsics `_mm256_i32gather_epi32`/`_mm256_i32gather_epi64` are used in `CompareColumnsToRows` API, and treat the `vindex` as signed integer. In our row table implementation, we use `uint32_t` to represent the offset within the row table. When a offset is larger than (`0x80000000`, or `2GB`), the aforementioned intrinsics will treat it as negative offset and gather the data from undesired address. More details please see apache#41813 (comment). Considering there is no unsigned-32bit-offset or 64bit-offset counterparts of those intrinsics in AVX2, this issue can be simply mitigated by translating the base address and the offset: ``` new_base = base + 0x80000000; new_offset = offset - 0x80000000; ``` ### What changes are included in this PR? Fix and UT that reproduces the issue. ### Are these changes tested? UT included. ### Are there any user-facing changes? None. * GitHub Issue: apache#41813 Authored-by: Ruoxi Sun <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
- Loading branch information
1 parent
3e7ae53
commit e635cc2
Showing
2 changed files
with
179 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters