-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36760: [Go] Add Avro OCF reader #37115
GH-36760: [Go] Add Avro OCF reader #37115
Conversation
9eb84ad
to
cda1e8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass, bunch of comments
060b375
to
d8aeb42
Compare
I've resynced my branch and removed the memfile caching in the example avro2parquet program to avoid adding a non-essential package import. |
@loicalleyne looks like you need to rebase and update the go.mod (run |
…#38083) ### Rationale for this change A follow on to apache#38052 which introduced a small bug ### What changes are included in this PR? Don't ignore the stdout which is the mechanism to tell if under emulation. ### Are these changes tested? I tested these locally: On an M2 with emulation: ``` > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1") [1] FALSE > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE) character(0) > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE) [1] "1" > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1") [1] TRUE ``` On an M2 without emulation: ``` > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1") [1] FALSE > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE) character(0) > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE) [1] "0" > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1") [1] FALSE ``` On an x86 (note the warning message suppression is what apache#38052 resolved): ``` > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1") [1] FALSE > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE) character(0) attr(,"status") [1] 1 Warning message: In system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, : running command 'sysctl -n sysctl.proc_translated >/dev/null 2>/dev/null' had status 1 > system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE) character(0) attr(,"status") [1] 1 Warning message: In system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE) : running command 'sysctl -n sysctl.proc_translated 2>/dev/null' had status 1 > identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1") [1] FALSE ``` ### Are there any user-facing changes? No / the functionality is restored Authored-by: Jonathan Keane <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
…ctionalPart is too large (apache#37731) ### Rationale for this change In BigQuery, the BIGNUMERIC supports very large precision and scale values, which causes an error in the DecimalUtility due to the `fractionalPart` of the value being larger than .NET's maximum decimal value. ### What changes are included in this PR? 1. Throws an OverflowException with the entire original value 2. Changes the original OverflowException to also include the original value (which can be parsed as a string by downstream drivers). ### Are these changes tested? Yes, tests are added for this scenario. ### Are there any user-facing changes? * Closes: apache#37730 Lead-authored-by: David Coe <[email protected]> Co-authored-by: davidhcoe <[email protected]> Signed-off-by: David Li <[email protected]>
…le removal (apache#38085) ### Rationale for this change We have failing CI on Windows because removing files that have memory mapped sections is not allowed. ### What changes are included in this PR? Pass `mmap = FALSE` when we explicitly check for removal. I think the other cases are OK because `unlink()` fails silently (which is maybe why we haven't seen mass CI failures because of this issue before). ### Are these changes tested? The changes are covered by existing tests. ### Are there any user-facing changes? No. * Closes: apache#38084 Authored-by: Dewey Dunnington <[email protected]> Signed-off-by: Dewey Dunnington <[email protected]>
### Rationale for this change Spark released 3.5.0 and dropped support for Java 8 and 11 on main. ### What changes are included in this PR? * Upgrade Spark integration tests from v3.4.1 -> 3.5.0 * Use JDK 17 for Spark latest ### Are these changes tested? Will test in CI ### Are there any user-facing changes? No * Closes: apache#38060 Authored-by: Dane Pitkin <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
schema : added field nullability loader: moved close chan to defer reader_types: a plethora of deduplications
new buildArrowField() to reduce code duplication in schema conversion removed unnecessary type assertions and unused methods in reader_types.go simplified getValue() to remove switch statement
I did a branch sync and rebase, noticed there's been a version bump and so updated to v15 - then ran go mod tidy. |
Static check was failing because error variable name did not conform to ErrFoo. |
Now the builds are failing because of the bug in SchemaEqual(). Can I remove that test until we figure out what's wrong with that function? The test that compares the schema string representation is passing. |
@loicalleyne I'd say sure you can remove that test until we figure out the issue there as I'd wager it's not related to this change at all and should be addressed separately. Could you try creating a reproducer of that failure separate from these changes and file a separate issue? |
removed TestSchemaEqual() - suspected bug in arrow.Schema.Equal()
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5ab60ea. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
### Rationale for this change ### What changes are included in this PR? Avro reader ### Are these changes tested? Local integration tests, no unit tests yet. ### Are there any user-facing changes? New Avro reader API * Closes: apache#36760 Lead-authored-by: Loïc Alleyne <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Raúl Cumplido <[email protected]> Co-authored-by: sgilmore10 <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Alenka Frim <[email protected]> Co-authored-by: mwish <[email protected]> Co-authored-by: James Duong <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: Felipe Oliveira Carvalho <[email protected]> Co-authored-by: Dane Pitkin <[email protected]> Co-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Matt Topol <[email protected]> Co-authored-by: Curt Hagenlocher <[email protected]> Co-authored-by: Jacob Wujciak-Jens <[email protected]> Co-authored-by: Hyunseok Seo <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]> Co-authored-by: James Duong <[email protected]> Co-authored-by: Nic Crane <[email protected]> Co-authored-by: Ivan Chesnov <[email protected]> Co-authored-by: Diego Fernández Giraldo <[email protected]> Co-authored-by: Bryce Mecum <[email protected]> Co-authored-by: Jonathan Keane <[email protected]> Co-authored-by: Peter Andreas Entschev <[email protected]> Co-authored-by: abandy <[email protected]> Co-authored-by: Lei Hou <[email protected]> Co-authored-by: Yue <[email protected]> Co-authored-by: davidhcoe <[email protected]> Co-authored-by: Tsutomu Katsube <[email protected]> Co-authored-by: Divyansh200102 <[email protected]> Co-authored-by: Thomas Newton <[email protected]> Co-authored-by: Jeremy Aguilon <[email protected]> Co-authored-by: prmoore77 <[email protected]> Co-authored-by: Jiaxing Liang <[email protected]> Co-authored-by: orgadish <[email protected]> Co-authored-by: Maximilian Muecke <[email protected]> Co-authored-by: Gavin Murrison <[email protected]> Co-authored-by: William Ayd <[email protected]> Co-authored-by: Laurent Goujon <[email protected]> Co-authored-by: Jin Shang <[email protected]> Co-authored-by: Alexander Grueneberg <[email protected]> Co-authored-by: Paul Spangler <[email protected]> Co-authored-by: Gang Wu <[email protected]> Co-authored-by: Michael Lui <[email protected]> Co-authored-by: patrick <[email protected]> Co-authored-by: Dan Homola <[email protected]> Co-authored-by: Ben Harkins <[email protected]> Co-authored-by: Nick Hughes <[email protected]> Co-authored-by: Fernando Mayer <[email protected]> Co-authored-by: Rok Mihevc <[email protected]> Co-authored-by: Francis <[email protected]> Co-authored-by: Donald Tolley <[email protected]> Co-authored-by: Judah Rand <[email protected]> Co-authored-by: Eero Lihavainen <[email protected]> Co-authored-by: Benjamin Schmidt <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Pierre Moulon <[email protected]> Co-authored-by: Benjamin Kietzman <[email protected]> Co-authored-by: Fokko Driesprong <[email protected]> Co-authored-by: Bryan Cutler <[email protected]> Co-authored-by: Diogo Teles Sant'Anna <[email protected]> Co-authored-by: Fatemah Panahi <[email protected]> Co-authored-by: Junming Chen <[email protected]> Co-authored-by: Chris Larsen <[email protected]> Co-authored-by: Dan Stone <[email protected]> Co-authored-by: Tim Schaub <[email protected]> Co-authored-by: Will Jones <[email protected]> Co-authored-by: Kevin Gurney <[email protected]> Co-authored-by: Sarah Gilmore <[email protected]> Co-authored-by: jeremy <[email protected]> Co-authored-by: Dan Homola <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: anjakefala <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: Jeremy Aguilon <[email protected]> Co-authored-by: Weston Pace <[email protected]> Co-authored-by: scoder <[email protected]> Co-authored-by: voidstar69 <[email protected]> Co-authored-by: Dewey Dunnington <[email protected]> Co-authored-by: Ivan Chesnov <[email protected]> Co-authored-by: mwish <[email protected]> Co-authored-by: David Li <[email protected]> Signed-off-by: Matt Topol <[email protected]>
Rationale for this change
What changes are included in this PR?
Avro reader
Are these changes tested?
Local integration tests, no unit tests yet.
Are there any user-facing changes?
New Avro reader API