Handling of schemas whose fields are not in accordance with the rules for Python variables #628
Comments
Hi @anneum. Yes, we should definitely generate proper class attribute names in …
Following your example, you could do the following (if the field is generated properly):

```python
from dataclasses_avroschema.pydantic import AvroBaseModel
from dataclasses_avroschema import case

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str

print(SchemaAvroBenchmark.avro_schema(case_type=case.SPINALCASE))
# {"type": "record", "name": "SchemaAvroBenchmark", "fields": [{"name": "scope-case", "type": "string"}]}
```
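For reference, the kind of transformation `case.SPINALCASE` applies can be sketched with a small stdlib-only helper. The `to_spinal` function below is hypothetical, not the library's implementation, and the library's exact boundary rules may differ:

```python
import re

def to_spinal(name: str) -> str:
    """Convert a snake_case or camelCase field name to spinal-case."""
    # camelCase boundary: insert a hyphen before an inner capital letter
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name)
    # letter-to-digit boundary, so e.g. scope_case2 -> scope_case-2
    name = re.sub(r"(?<=[a-z])(\d)", r"-\1", name)
    # underscores become hyphens, everything is lower-cased
    return name.replace("_", "-").lower()

print(to_spinal("scope_case"))   # scope-case
print(to_spinal("scope_case2"))  # scope-case-2
```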
@anneum, if you update to the latest version (…)
That's great, thanks for that. I see a little problem when we specify the case: it also overwrites field names that do not match the case.

Original schema:

```json
{
    "type": "record",
    "name": "SchemaAvroBenchmark",
    "fields": [
        {
            "name": "scope-case",
            "type": "string"
        },
        {
            "name": "scope_case2",
            "type": "string"
        }
    ]
}
```

After the …:

```python
from dataclasses_avroschema import case
from dataclasses_avroschema.pydantic import AvroBaseModel

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str
    scope_case2: str

print(SchemaAvroBenchmark.avro_schema(case_type=case.SPINALCASE))
```

After the conversion into an Avro schema:

```json
{
    "type": "record",
    "name": "SchemaAvroBenchmark",
    "fields": [
        {
            "name": "scope-case",
            "type": "string"
        },
        {
            "name": "scope-case-2",
            "type": "string"
        }
    ]
}
```

Therefore, I suggest that we use the pydantic field with the …. The …:

```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(..., serialization_alias='scope-case')
    scope_case2: str

print(SchemaAvroBenchmark.avro_schema())
```
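The behaviour being asked for here, renaming only fields that carry an explicit alias and leaving every other field untouched, can be sketched independently of the library. The `apply_aliases` helper below is hypothetical, purely to illustrate the expected outcome:

```python
def apply_aliases(schema: dict, aliases: dict) -> dict:
    """Rename only the fields listed in `aliases`; all other fields keep their name."""
    fields = [
        {**f, "name": aliases.get(f["name"], f["name"])}
        for f in schema["fields"]
    ]
    return {**schema, "fields": fields}

schema = {
    "type": "record",
    "name": "SchemaAvroBenchmark",
    "fields": [
        {"name": "scope_case", "type": "string"},
        {"name": "scope_case2", "type": "string"},
    ],
}

# only scope_case has an alias, so scope_case2 keeps its original name
print(apply_aliases(schema, {"scope_case": "scope-case"}))
```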
As an addition, I think the …
Ok, we can add the …:

```json
{
    "type": "record",
    "name": "SchemaAvroBenchmark",
    "fields": [
        {
            "aliases": [
                "my-scope-case",
                "renamed-scope-case"
            ],
            "name": "scope-case",
            "type": "string"
        }
    ]
}
```

Maybe we should use the …:

```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(metadata={"aliases": ["scope-case", ...]})
```

Then when generating the schema from the model, we will have the extra alias "scope-case". Does it work for you? If it does, then it is quite easy to implement.
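For context, this is roughly what Avro's `aliases` are for during schema resolution: a writer's field matches a reader's field either by name or by one of the reader's aliases. A minimal stdlib sketch (the `match_field` helper is hypothetical, not the fastavro/avro API):

```python
def match_field(reader_field: dict, writer_name: str) -> bool:
    """True if the writer's field name matches the reader field's name or one of its aliases."""
    if writer_name == reader_field["name"]:
        return True
    return writer_name in reader_field.get("aliases", [])

reader_field = {
    "name": "scope-case",
    "aliases": ["my-scope-case", "renamed-scope-case"],
    "type": "string",
}

print(match_field(reader_field, "scope-case"))          # True
print(match_field(reader_field, "renamed-scope-case"))  # True
print(match_field(reader_field, "other-field"))         # False
```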
The metadata field sounds like a good option, but I would use it as a string instead of a list, because there can only ever be one alias here: the original field name that could not be used as a Python attribute. So if the field name in the schema is not valid (according to Python rules), we add the original field name as an alias in the metadata and change the attribute name in the pydantic model to a valid name. The reason I would have liked to use the …:

```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(..., serialization_alias='scope-case')
    scope_case2: str

SchemaAvroBenchmark(scope_case='foo', scope_case2='bar').model_dump(by_alias=True)
# {'scope-case': 'foo', 'scope_case2': 'bar'}
```
Question: are you using …?

```python
from pydantic import Field
from dataclasses_avroschema.pydantic import AvroBaseModel

class SchemaAvroBenchmark(AvroBaseModel):
    scope_case: str = Field(serialization_alias='scope-case')

# serialize to avro-json to send to kafka just to see the fields
# (the same will happen with avro-binary)
benchmark = SchemaAvroBenchmark.fake()
print(benchmark, "\n\n")
# >>> scope_case='FbeDbPMeawuTwxUbhSaY'

# This can be sent to kafka (it is bytes)
ser = benchmark.serialize(serialization_type="avro-json")

# It will produce an event with the field `scope_case` and not with the alias.
print(ser)
# >>> b'{"scope_case": "FbeDbPMeawuTwxUbhSaY"}'

# This is with alias, but it is not bytes. Do not send to kafka.
print(benchmark.model_dump(by_alias=True))
# >>> {'scope-case': 'FbeDbPMeawuTwxUbhSaY'}
```

I will work in a PR to add the …
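A stdlib-only sketch of the point above: to get a Kafka-ready payload that carries the aliased names, the alias mapping has to be applied before encoding to bytes. Plain `json` stands in here for the avro-json serializer, and all names are illustrative:

```python
import json

data = {"scope_case": "FbeDbPMeawuTwxUbhSaY"}
aliases = {"scope_case": "scope-case"}

# without aliases: the Python attribute name ends up in the payload
raw = json.dumps(data).encode()
print(raw)  # b'{"scope_case": "FbeDbPMeawuTwxUbhSaY"}'

# apply the alias mapping first, then encode to bytes for Kafka
aliased = {aliases.get(k, k): v for k, v in data.items()}
payload = json.dumps(aliased).encode()
print(payload)  # b'{"scope-case": "FbeDbPMeawuTwxUbhSaY"}'
```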
Is your feature request related to a problem? Please describe.
When I convert an Avro schema whose fields do not match the rules for Python variables into a Pydantic (or Avrodantic) model, I get an invalid model. For example, converting the following schema produces an invalid model: the value is invalid because of the `-` in the variable name.

In my opinion, the `serialization_alias` from `Field` should be used for this. When converting the valid model back to an Avro schema, the `serialization_alias` should be used as the field name.

To summarize: converting from a schema to a model and then back to a schema should result in the same schema.

I am already working on the implementation and getting the right result for such a simple schema (except for importing fields), but would appreciate support. In particular, I am having a lot of trouble with union cases.
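The round-trip requirement described above can be sketched as two pure functions: one sanitizes an invalid Avro field name into a valid Python identifier while recording the original as an alias, and the inverse restores the original name. All names below are hypothetical, not part of the library:

```python
def to_python_name(avro_name: str):
    """Return a valid Python identifier plus the original name as alias if it changed."""
    python_name = avro_name.replace("-", "_")
    alias = avro_name if python_name != avro_name else None
    return python_name, alias

def to_avro_name(python_name: str, alias) -> str:
    """Restore the original Avro field name when an alias was recorded."""
    return alias if alias is not None else python_name

name, alias = to_python_name("scope-case")
print(name, alias)                # scope_case scope-case
print(to_avro_name(name, alias))  # scope-case

# a field name that was already valid round-trips unchanged
n2, a2 = to_python_name("scope_case")
print(to_avro_name(n2, a2))       # scope_case
```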