Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36760: [Go] Add Avro OCF reader #37115

Merged

Conversation

loicalleyne
Copy link
Contributor

@loicalleyne loicalleyne commented Aug 10, 2023

Rationale for this change

What changes are included in this PR?

Avro reader

Are these changes tested?

Local integration tests, no unit tests yet.

Are there any user-facing changes?

New Avro reader API

@kou kou changed the title GH-36760:Go-adding avro ocf reader- reader component GH-36760: [Go] Add Avro OCF reader Aug 10, 2023
@loicalleyne loicalleyne force-pushed the GH-36760-Go]-Adding-avro-ocf-reader---reader branch from 9eb84ad to cda1e8f Compare September 19, 2023 20:27
@loicalleyne loicalleyne reopened this Sep 19, 2023
Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, bunch of comments

go/arrow/avro/schema.go Outdated Show resolved Hide resolved
go/arrow/avro/schema.go Outdated Show resolved Hide resolved
go/arrow/avro/schema.go Outdated Show resolved Hide resolved
go/arrow/avro/schema.go Outdated Show resolved Hide resolved
go/arrow/avro/schema.go Show resolved Hide resolved
go/arrow/avro/reader_types.go Outdated Show resolved Hide resolved
go/arrow/avro/reader_types.go Outdated Show resolved Hide resolved
go/arrow/avro/reader_types.go Outdated Show resolved Hide resolved
go/arrow/avro/reader_types.go Outdated Show resolved Hide resolved
go/arrow/avro/reader_types.go Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Oct 6, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Oct 9, 2023
@loicalleyne loicalleyne closed this Nov 2, 2023
@loicalleyne loicalleyne force-pushed the GH-36760-Go]-Adding-avro-ocf-reader---reader branch from 060b375 to d8aeb42 Compare November 2, 2023 15:10
@loicalleyne loicalleyne reopened this Nov 2, 2023
@loicalleyne
Copy link
Contributor Author

I've resynced my branch and removed the memfile caching in the example avro2parquet program to avoid adding a non-essential package import.
Are there any other issues that need to be addressed?

@zeroshade
Copy link
Member

@loicalleyne looks like you need to rebase and update the go.mod (run go mod tidy?)

jonkeane and others added 6 commits November 13, 2023 17:14
…#38083)

### Rationale for this change

A follow on to apache#38052 which introduced a small bug

### What changes are included in this PR?

Don't ignore the stdout which is the mechanism to tell if under emulation.

### Are these changes tested?

I tested these locally:

On an M2 with emulation:

```
> identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1")
[1] FALSE
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)
character(0)
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)
[1] "1"
> identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1")
[1] TRUE
```

On an M2 without emulation:

```
> identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1")
[1] FALSE
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)
character(0)
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)
[1] "0"
> identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1")
[1] FALSE
```

On an x86 (note the warning message suppression is what apache#38052 resolved):

```
>  identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)), "1")
[1] FALSE
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE, ignore.stdout = TRUE)
character(0)
attr(,"status")
[1] 1
Warning message:
In system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE,  :
  running command 'sysctl -n sysctl.proc_translated >/dev/null 2>/dev/null' had status 1
> system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)
character(0)
attr(,"status")
[1] 1
Warning message:
In system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE) :
  running command 'sysctl -n sysctl.proc_translated 2>/dev/null' had status 1
> identical(suppressWarnings(system("sysctl -n sysctl.proc_translated", intern = TRUE, ignore.stderr = TRUE)), "1")
[1] FALSE
```

### Are there any user-facing changes?

No / the functionality is restored

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
…ctionalPart is too large (apache#37731)

### Rationale for this change

In BigQuery, the BIGNUMERIC supports very large precision and scale values, which causes an error in the DecimalUtility due to the `fractionalPart` of the value being larger than .NET's maximum decimal value. 

### What changes are included in this PR?

1. Throws an OverflowException with the entire original value
2. Changes the original OverflowException to also include the original value (which can be parsed as a string by downstream drivers). 

### Are these changes tested?

Yes, tests are added for this scenario.

### Are there any user-facing changes?

* Closes: apache#37730

Lead-authored-by: David Coe <[email protected]>
Co-authored-by: davidhcoe <[email protected]>
Signed-off-by: David Li <[email protected]>
…le removal (apache#38085)

### Rationale for this change

We have failing CI on Windows because removing files that have memory mapped sections is not allowed.

### What changes are included in this PR?

Pass `mmap = FALSE` when we explicitly check for removal. I think the other cases are OK because `unlink()` fails silently (which is maybe why we haven't seen mass CI failures because of this issue before).

### Are these changes tested?

The changes are covered by existing tests.

### Are there any user-facing changes?

No.

* Closes: apache#38084

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
### Rationale for this change

Spark released 3.5.0 and dropped support for Java 8 and 11 on main.

### What changes are included in this PR?

* Upgrade Spark integration tests from v3.4.1 -> 3.5.0
* Use JDK 17 for Spark latest

### Are these changes tested?

Will test in CI

### Are there any user-facing changes?

No
* Closes: apache#38060

Authored-by: Dane Pitkin <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
schema : added field nullability
loader: moved close chan to defer
reader_types: a plethora of deduplications
new buildArrowField() to reduce code duplication in schema conversion

removed unnecessary type assertions and unused methods in reader_types.go

simplified getValue() to remove switch statement
@loicalleyne
Copy link
Contributor Author

I did a branch sync and rebase, noticed there's been a version bump and so updated to v15 - then ran go mod tidy.
I think something about the sync made github actions think that the PR should be tagged for all the languages (sorry!).

@trxcllnt trxcllnt removed their request for review November 14, 2023 06:10
@loicalleyne
Copy link
Contributor Author

Static check was failing because error variable name did not conform to ErrFoo.
Renamed NullStructData to ErrNullStructData.

@loicalleyne
Copy link
Contributor Author

Now the builds are failing because of the bug in SchemaEqual(). Can I remove that test until we figure out what's wrong with that function? The test that compares the schema string representation is passing.

@zeroshade
Copy link
Member

@loicalleyne I'd say sure you can remove that test until we figure out the issue there as I'd wager it's not related to this change at all and should be addressed separately. Could you try creating a reproducer of that failure separate from these changes and file a separate issue?

removed TestSchemaEqual() - suspected bug in arrow.Schema.Equal()
@zeroshade
Copy link
Member

Gotta fix for 32-bit builds:
image

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Nov 21, 2023
@wgtmac wgtmac removed their request for review November 22, 2023 01:34
@zeroshade zeroshade merged commit 5ab60ea into apache:main Nov 27, 2023
24 checks passed
@zeroshade zeroshade removed the awaiting merge Awaiting merge label Nov 27, 2023
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 5ab60ea.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
### Rationale for this change

### What changes are included in this PR?
Avro reader 

### Are these changes tested?
Local integration tests, no unit tests yet.

### Are there any user-facing changes?
New Avro reader API

* Closes: apache#36760

Lead-authored-by: Loïc Alleyne <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: sgilmore10 <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: James Duong <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Co-authored-by: Curt Hagenlocher <[email protected]>
Co-authored-by: Jacob Wujciak-Jens <[email protected]>
Co-authored-by: Hyunseok Seo <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: James Duong <[email protected]>
Co-authored-by: Nic Crane <[email protected]>
Co-authored-by: Ivan Chesnov <[email protected]>
Co-authored-by: Diego Fernández Giraldo <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Co-authored-by: Peter Andreas Entschev <[email protected]>
Co-authored-by: abandy <[email protected]>
Co-authored-by: Lei Hou <[email protected]>
Co-authored-by: Yue <[email protected]>
Co-authored-by: davidhcoe <[email protected]>
Co-authored-by: Tsutomu Katsube <[email protected]>
Co-authored-by: Divyansh200102 <[email protected]>
Co-authored-by: Thomas Newton <[email protected]>
Co-authored-by: Jeremy Aguilon <[email protected]>
Co-authored-by: prmoore77 <[email protected]>
Co-authored-by: Jiaxing Liang <[email protected]>
Co-authored-by: orgadish <[email protected]>
Co-authored-by: Maximilian Muecke <[email protected]>
Co-authored-by: Gavin Murrison <[email protected]>
Co-authored-by: William Ayd <[email protected]>
Co-authored-by: Laurent Goujon <[email protected]>
Co-authored-by: Jin Shang <[email protected]>
Co-authored-by: Alexander Grueneberg <[email protected]>
Co-authored-by: Paul Spangler <[email protected]>
Co-authored-by: Gang Wu <[email protected]>
Co-authored-by: Michael Lui <[email protected]>
Co-authored-by: patrick <[email protected]>
Co-authored-by: Dan Homola <[email protected]>
Co-authored-by: Ben Harkins <[email protected]>
Co-authored-by: Nick Hughes <[email protected]>
Co-authored-by: Fernando Mayer <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Co-authored-by: Francis <[email protected]>
Co-authored-by: Donald Tolley <[email protected]>
Co-authored-by: Judah Rand <[email protected]>
Co-authored-by: Eero Lihavainen <[email protected]>
Co-authored-by: Benjamin Schmidt <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Pierre Moulon <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Fokko Driesprong <[email protected]>
Co-authored-by: Bryan Cutler <[email protected]>
Co-authored-by: Diogo Teles Sant'Anna <[email protected]>
Co-authored-by: Fatemah Panahi <[email protected]>
Co-authored-by: Junming Chen <[email protected]>
Co-authored-by: Chris Larsen <[email protected]>
Co-authored-by: Dan Stone <[email protected]>
Co-authored-by: Tim Schaub <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: jeremy <[email protected]>
Co-authored-by: Dan Homola <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: anjakefala <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Jeremy Aguilon <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: scoder <[email protected]>
Co-authored-by: voidstar69 <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Ivan Chesnov <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: David Li <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Go] Adding Avro OCF reader