[Bug]: Avro GenericRecord to Row conversion does not support some logical types and any custom conversions #34009

Open
1 of 17 tasks
wollowizard opened this issue Feb 18, 2025 · 1 comment

What happened?

To convert an Avro GenericRecord to a Row, you can use the org.apache.beam.sdk.extensions.avro.schemas.utils.AvroUtils#toBeamRowStrict(org.apache.avro.generic.GenericRecord, org.apache.beam.sdk.schemas.Schema) method. This delegates to convertAvroFieldStrict, which hardcodes the resolution of a few Avro logical types (LogicalTypes.Decimal, LogicalTypes.TimestampMillis, LogicalTypes.Date) and supports only a few hardcoded source types.
However, this does not work in many cases where the generic record contains fields of other types. For example, the Avro classes generated with the avro-maven-plugin use the specific types (i.e. a LogicalTypes.Decimal field is stored in memory directly as a BigDecimal, not as a ByteBuffer).
Avro has GenericData as a way to carry logical type conversions, so the idea here is that we can:

  1. support all logical types defined by Avro
  2. support generic records containing any custom type, as long as an appropriate conversion to a primitive type is also supplied (via GenericData). This in turn also guarantees that SpecificRecords are supported (SpecificRecordBase implements GenericRecord)
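
To illustrate the idea, here is a minimal, self-contained sketch of a conversion lookup that accepts a decimal field in either representation. It deliberately uses no Avro or Beam classes: ConversionRegistrySketch, register, normalize and decode are hypothetical names made up for this example, and the real change would presumably go through the conversions carried by GenericData instead.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a conversion lookup; NOT Avro or Beam API.
public class ConversionRegistrySketch {

    // Maps an in-memory value class to a function producing the primitive (bytes) form.
    private final Map<Class<?>, Function<Object, ByteBuffer>> toBytes = new HashMap<>();

    // Registers a conversion from a custom in-memory type to bytes.
    public <T> void register(Class<T> type, Function<T, ByteBuffer> fn) {
        toBytes.put(type, value -> fn.apply(type.cast(value)));
    }

    // Returns the primitive form of a field value, whichever representation it arrived in.
    public ByteBuffer normalize(Object value) {
        if (value instanceof ByteBuffer) {
            return (ByteBuffer) value; // already in primitive form
        }
        Function<Object, ByteBuffer> fn = toBytes.get(value.getClass());
        if (fn == null) {
            throw new IllegalArgumentException(
                "No conversion registered for " + value.getClass().getName());
        }
        return fn.apply(value);
    }

    // Decodes a big-endian two's-complement unscaled value with the given scale.
    public static BigDecimal decode(ByteBuffer bytes, int scale) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.duplicate().get(raw); // duplicate() leaves the caller's position untouched
        return new BigDecimal(new BigInteger(raw), scale);
    }

    public static void main(String[] args) {
        ConversionRegistrySketch registry = new ConversionRegistrySketch();
        registry.register(BigDecimal.class,
            d -> ByteBuffer.wrap(d.unscaledValue().toByteArray()));

        BigDecimal price = new BigDecimal("12.34");
        ByteBuffer asBytes = ByteBuffer.wrap(price.unscaledValue().toByteArray());

        // Both the bytes form and the BigDecimal form resolve to the same value.
        System.out.println(decode(registry.normalize(asBytes), 2));
        System.out.println(decode(registry.normalize(price), 2));
    }
}
```

A value type with no registered conversion fails with an explicit error rather than being silently mishandled, which is the behaviour one would want from a strict conversion.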

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@wollowizard
Author

.take-issue

wollowizard pushed a commit to wollowizard/beam that referenced this issue Feb 19, 2025
wollowizard added a commit to wollowizard/beam that referenced this issue Feb 19, 2025