-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support of References for AVRO Schemas #926
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup commits to single/coherent commits. Add tests for better coverage.
@jjaakola-aiven Could you suggest the tests that should be added? |
f0f9450
to
0167a75
Compare
@jjaakola-aiven I'm still waiting for your suggestions regarding additional tests for the new functionality. Please keep in mind that we already have integration tests covering various aspects, including:
Additionally, the new code has successfully passed all existing tests related to Avro functionality. I also added a few unit tests for the AvroMerge class today. |
@libretto shall we rebase this work so we get upstream changes and then let's discuss around the comments if you can tell me when you'd like to, we can spare sometime and discuss in one thread here. |
9c26f46
to
0167a75
Compare
@libretto The unit test for |
9cac0ba
to
f4346ea
Compare
f4346ea
to
5ab6ba0
Compare
rebased |
@libretto Hi, sorry for the delay, this took some time to properly review. I think that the implementation using Avro unions is risky and might not work in all cases (hard to tell). I have opened a draft PR implementing the merging using Avro lib here: Please have a look and consider that approach instead. |
@davide-armand Hi, As I understand, the union approach concept has been confirmed by our teams and has passed all existing Karapace tests, including the references tests I provided. This demonstrates that the concept works well. I will try using Your approach with the Avro library to eliminate the need for the union method. I expect the results to be consistent. Regarding tests, I noticed that You added a few tests for recursion. However, the direct recursion test You included is on the Avro-level parser and should be tested within the Avro tests, outside of the Karapace project. Indirect recursion (where Schema A references Schema B, which in turn references Schema A) is not possible in Karapace's design, as all schemas are added simultaneously. A schema cannot be added if any of its references are unresolved, which effectively prevents indirect recursion between schemas. I will include the Your code for the test_avro_reference() test in the my branch tests. |
@davide-armand I utilized the Avro library functionality as you suggested. References now work in the native Avro way. |
#987 implemented there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@libretto Hi, I left a few comments, please have a look
src/karapace/schema_models.py
Outdated
self.schema_str = json_encode(json_decode(schema_str), compact=True, sort_keys=True) | ||
self.dependencies = dependencies | ||
self.unique_id = 0 | ||
self.regex = re.compile(r"^\s*\[") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex
does not seem to be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/karapace/schema_models.py
Outdated
|
||
return merge | ||
|
||
def resolve(self) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolve()
and build()
can be merged in a single method, since builder()
is only called by resolve()
.
Maybe the whole AvroResolver
class can be just a static method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davide-armand After this change, we have a pylint issue but all tests are passed. Please check.
merged_schema = make_avsc_object(json.loads(schema), names) | ||
merged_schema_str = str(merged_schema) | ||
else: | ||
merged_schema_str = schema_str | ||
parsed_schema = parse_avro_schema_definition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have realized now that it is possible to pass a list of schema strings to avro.schema.parse()
:
https://github.com/aiven/avro/blob/5a82d57f2a650fd87c819a30e433f1abb2c76ca2/lang/py/avro/schema.py#L1192
We currently use parse()
to parse a single schema (see parse_avro_schema_definition()
).
When called with a list of schemas it returns a UnionSchema
(through make_avsc_object()
), which itself contains a list of parsed schemas (UnionSchema.schemas
).
The constructor of UnionSchema
seems to be doing what we want (loop on the schemas, call make_avsc_object()
and remember names
).
I think passing to parse()
the reference schemas + the main schema (the main schema should be passed as last in the list) and then getting the last schema from UnionSchema.schemas
should be equivalent to what's the PR now.
If it works then we can delegate that logic to the Avro library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
This change adds support for references for AVRO schemas
We aim to add support for references for AVRO schemas to Karapace using a straightforward approach by leveraging the same API used in Protobuf references. Although Avro does not natively support references, we can implement this functionality by utilizing the property of Avro schemas that allows combining types. We utilize Names from avro.name and make_avsc_object to create an Avro schema that includes referenced schemas within as explained in #987.