
Rules don't work for protobuf schemas added with confluent client #5790

Open
wolfchimneyrock opened this issue Jan 6, 2025 · 1 comment

@wolfchimneyrock (Contributor)

Description

Registry Version: 3.0.6
Persistence type: postgresql

Environment

client library: confluent-kafka-python v2.4.0
protobuf code generated from schema:

syntax = "proto3";

package protobuf.examples;

enum Action {
    UNSPECIFIED = 0;
    BUY = 1;
    SELL = 2;
}

message OrderEvent {
    string ticker = 1;
    string company = 2;
    Action action = 3;
    double price = 4;
    uint64 quantity = 5;
}

The confluent-kafka-python schema registry client, when using auto-registration, sends the contents of a protobuf schema as a base64-encoded protobuf message generated from this 'descriptor' schema:
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto
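To illustrate the wire form, here is a sketch (not the client's actual code path) that hand-builds a `FileDescriptorProto` for the `OrderEvent` schema above using the standard `google.protobuf` runtime, then serializes and base64-encodes it the way the payload arrives at the registry; the file name `order_event.proto` is illustrative:

```python
import base64
from google.protobuf import descriptor_pb2

# Build a FileDescriptorProto equivalent to the OrderEvent .proto file.
fdp = descriptor_pb2.FileDescriptorProto()
fdp.name = "order_event.proto"  # illustrative file name
fdp.package = "protobuf.examples"
fdp.syntax = "proto3"

# enum Action { UNSPECIFIED = 0; BUY = 1; SELL = 2; }
enum = fdp.enum_type.add()
enum.name = "Action"
for number, label in enumerate(["UNSPECIFIED", "BUY", "SELL"]):
    value = enum.value.add()
    value.name = label
    value.number = number

# message OrderEvent { ... }
msg = fdp.message_type.add()
msg.name = "OrderEvent"
for name, number, ftype in [
    ("ticker", 1, descriptor_pb2.FieldDescriptorProto.TYPE_STRING),
    ("company", 2, descriptor_pb2.FieldDescriptorProto.TYPE_STRING),
    ("price", 4, descriptor_pb2.FieldDescriptorProto.TYPE_DOUBLE),
    ("quantity", 5, descriptor_pb2.FieldDescriptorProto.TYPE_UINT64),
]:
    field = msg.field.add()
    field.name = name
    field.number = number
    field.type = ftype

# The enum-typed field references the enum by fully-qualified name.
action = msg.field.add()
action.name = "action"
action.number = 3
action.type = descriptor_pb2.FieldDescriptorProto.TYPE_ENUM
action.type_name = ".protobuf.examples.Action"

# This is (roughly) the payload shape the registry receives.
payload = base64.b64encode(fdp.SerializeToString())
```

In real client code the descriptor does not need to be built by hand: protoc-generated modules already carry the serialized file descriptor.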

When no validity rules are defined, the schema registers successfully.

However, when a global or group rule is enabled, registration fails with a 422 error:
confluent_kafka.schema_registry.error.SchemaRegistryError: io.apicurio.registry.rules.RuleViolationException: Syntax violation for Protobuf artifact. (HTTP status code 422, SR code 42201)

I think the issue is that Apicurio expects these protobuf schemas in the text .proto format, so the rules fail to parse the binary FileDescriptorProto payload provided by the Python confluent client.

It seems to me that the binary file descriptor is the better content to use for protobuf schemas:

  1. protoc-generated classes include the serialized file descriptor, so it is easy for client code to transmit
  2. it strips out comments and orders fields, so it is more "canonical" by default
  3. the syntax is guaranteed valid if the protobuf payload parses
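Point 3 suggests that a server-side syntax rule could validate the binary payload directly instead of parsing .proto text. A minimal sketch of such a check (the function name is hypothetical, not an Apicurio API):

```python
import base64
from google.protobuf import descriptor_pb2
from google.protobuf.message import DecodeError

def is_valid_descriptor(b64_payload: bytes) -> bool:
    """Return True if the payload is valid base64 that decodes to a
    parseable FileDescriptorProto (hypothetical rule check)."""
    try:
        raw = base64.b64decode(b64_payload, validate=True)
        descriptor_pb2.FileDescriptorProto().ParseFromString(raw)
        return True
    except (ValueError, DecodeError):
        # binascii.Error (bad base64) is a ValueError subclass;
        # DecodeError covers malformed protobuf bytes.
        return False

# A well-formed descriptor passes; arbitrary text does not.
good = base64.b64encode(
    descriptor_pb2.FileDescriptorProto(name="x.proto").SerializeToString()
)
bad = b"!!! not base64 !!!"
```

Unlike text parsing, this check never has to reason about .proto grammar; a successful parse is the validity proof.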

I understand wanting to keep the Apicurio API as-is, i.e. accepting text .proto files. However, the Confluent-compatible API should accept the binary file descriptor payload, and rules should evaluate the payload whether it came in as text .proto or as a binary file descriptor. The Confluent-compatible API should also return the binary encoded format even if the schema was stored as a .proto text file through the Apicurio API (e.g. when the producer uses the Apicurio API and the consumer uses the Confluent API).

I notice that utilities to convert between these representations are already present in the Apicurio codebase, so it should be technically feasible to work this out.

@wolfchimneyrock (Contributor, Author)

I see that in the packages:

apicurio-registry-schema-util-provider

and

apicurio-registry-serde-common-protobuf (https://github.com/Apicurio/apicurio-registry/blob/main/serdes/generic/serde-common-protobuf/src/main/java/io/apicurio/registry/serde/protobuf/ProtobufSchemaParser.java#L69)

there are duplicate implementations of this same proposed functionality. Maybe it makes sense to combine them into one implementation that schema-util-protobuf could also reference for the rules evaluators?
