Internal Parser Exception when serializing empty schema/ model with avro-json #464

Open
ciarandorney opened this issue Nov 7, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@ciarandorney

Describe the bug
Serializing empty schema/model using avro-json results in a fastavro Internal Parser Exception

To Reproduce

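from uuid import UUID
from dataclasses_avroschema.pydantic import main as pydantic


# Model with no fields: its Avro schema is a record with "fields": []
class BaseMessage(pydantic.AvroBaseModel):
    pass


# Model with one field, for comparison
class Message(BaseMessage):
    message_uuid: UUID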

Expected behavior
Both models should serialize without error.

Actual behavior
avro serialization:

>>> BaseMessage.fake().serialize()
b''
>>> Message.fake().serialize()
b'H4253ac66-2d8b-4470-ba92-25b0b97540d5'
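
For readers puzzling over the bytes above: in Avro's binary encoding a record is just the concatenation of its fields' encodings, so a record with no fields encodes to zero bytes, and a string is written as a zigzag varint length followed by its UTF-8 bytes. The sketch below is a stdlib-only illustration of those two rules, not fastavro's code; `zigzag_varint`, `encode_string`, and `encode_record` are hypothetical helper names.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way Avro encodes a long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string(value: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = value.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_record(encoded_fields: list) -> bytes:
    """Avro record: the already-encoded fields concatenated, with no framing."""
    return b"".join(encoded_fields)


print(encode_record([]))
print(encode_record([encode_string("4253ac66-2d8b-4470-ba92-25b0b97540d5")]))
```

A UUID string is 36 characters long, and 36 zigzag-encodes to 72, the ASCII code for `H`, which is where the leading `H` comes from. The empty record produces `b''` simply because there is nothing to concatenate.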

avro-json serialization:

>>> Message.fake().serialize(serialization_type="avro-json")
b'{"message_uuid": "4b82449d-4b8f-45e0-a991-e70d8643d20d"}'
>>> BaseMessage.fake().serialize(serialization_type="avro-json")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/dataclasses_avroschema/pydantic/main.py", line 69, in serialize
    return serialization.serialize(
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/dataclasses_avroschema/serialization.py", line 28, in serialize
    fastavro.json_writer(file_like_output, schema, [payload])
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/fastavro/json_write.py", line 79, in json_writer
    return writer(
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/fastavro/_write_py.py", line 739, in writer
    output.flush()
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/fastavro/_write_py.py", line 590, in flush
    self.encoder.flush()
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/fastavro/io/json_encoder.py", line 96, in flush
    self._parser.flush()
  File "/Users/ciarandorney/Library/Caches/pypoetry/virtualenvs/py3.10/lib/python3.10/site-packages/fastavro/io/parser.py", line 167, in flush
    raise Exception(f"Internal Parser Exception: {top}")
Exception: Internal Parser Exception: <fastavro.io.symbols.Sequence object at 0x1076a9c00>

@marcosschroh
Owner

Yes. The problem is that BaseMessage does not have any fields, which is expected:

print(BaseMessage.avro_schema())

# {"type": "record", "name": "BaseMessage", "fields": []}

So it is not possible to serialize an empty payload with an Avro schema that does not have any fields. I think the error makes sense.

@ddevlin
Contributor

ddevlin commented Nov 10, 2023

Hi @marcosschroh, we weren't sure if this was the expected behaviour as it's inconsistent between serialisation types, i.e. it's an error when serialising with the avro-json serialisation type, but with avro serialisation there's no error.

@marcosschroh
Owner

marcosschroh commented Nov 14, 2023

Hi,

Even though it does not make much sense to have an Avro schema without fields, I agree that the behaviour is inconsistent. The output should be b'{}' for avro-json when the schema has no fields.
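
To make the expected output concrete: in Avro's JSON encoding a record becomes a JSON object keyed by field name, so a record with no fields should come out as an empty object. The following is a deliberately simplified, stdlib-only sketch of that rule. It is not how fastavro or dataclasses-avroschema implement it, `to_avro_json` and `serialize_avro_json` are hypothetical names, and it assumes only records and JSON-native primitives (no unions, logical types, bytes, arrays, or maps).

```python
import json


def to_avro_json(schema, datum):
    """Map a datum to its JSON form. Simplified: records and JSON-native primitives only."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        # A record is a JSON object with one key per declared field.
        return {
            field["name"]: to_avro_json(field["type"], datum[field["name"]])
            for field in schema["fields"]
        }
    # Assumption: anything else is already a JSON-compatible value.
    return datum


def serialize_avro_json(schema, datum) -> bytes:
    return json.dumps(to_avro_json(schema, datum)).encode("utf-8")


empty = {"type": "record", "name": "BaseMessage", "fields": []}
print(serialize_avro_json(empty, {}))
```

Because the record case is just a comprehension over `fields`, the empty schema needs no special handling here, and a nested record with no fields comes out as a nested `{}`.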

@ciarandorney
Author

ciarandorney commented Nov 15, 2023

Even though it does not make much sense to have an Avro schema without fields, I agree that the behaviour is inconsistent. The output should be b'{}' for avro-json when the schema has no fields.

Hi @marcosschroh, does it make sense to raise this issue over at fastavro? It could be patched here but that would involve checking for nested schemas with no fields on serialization.

Another example that produces the same internal parser error:

Avro schema:
{'type': 'record', 'name': 'NestedMessage', 'fields': [{'name': 'uuid', 'type': {'type': 'string', 'logicalType': 'uuid'}, 'default': '13673bf0-37ef-45bf-bf96-1d633ff90275'}, {'name': 'timestamp', 'type': {'type': 'long', 'logicalType': 'timestamp-millis'}, 'default': 1700052483304}, {'name': 'nested', 'type': {'type': 'record', 'name': 'Nested', 'fields': []}, 'default': {}}]} 

Payload:
{'uuid': UUID('16aca70b-37ff-46e5-b055-20c8bbc3ea08'), 'timestamp': datetime.datetime(2012, 8, 20, 19, 19, 19, tzinfo=datetime.timezone.utc), 'nested': {}}
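
The check mentioned above, looking for records with no fields anywhere in a schema, could be sketched roughly as follows. This is only an illustration of what a local patch would have to detect, not code from dataclasses-avroschema or fastavro. `find_empty_records` is a hypothetical helper, and it assumes the schema is a plain dict/list structure in which named types are defined inline (string references to previously defined names are not resolved).

```python
def find_empty_records(schema) -> list:
    """Return the names of all record schemas that declare no fields."""
    found = []
    if isinstance(schema, list):
        # Union: inspect every branch.
        for branch in schema:
            found.extend(find_empty_records(branch))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "error"):
            fields = schema.get("fields", [])
            if not fields:
                found.append(schema.get("name"))
            for field in fields:
                found.extend(find_empty_records(field["type"]))
        elif kind == "array":
            found.extend(find_empty_records(schema["items"]))
        elif kind == "map":
            found.extend(find_empty_records(schema["values"]))
        elif isinstance(kind, (dict, list)):
            # Wrapped form such as {"type": {...}}.
            found.extend(find_empty_records(kind))
    # Plain strings (primitives or unresolved name references) contribute nothing.
    return found


nested_message = {
    "type": "record",
    "name": "NestedMessage",
    "fields": [
        {"name": "uuid", "type": {"type": "string", "logicalType": "uuid"}},
        {"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "nested", "type": {"type": "record", "name": "Nested", "fields": []}, "default": {}},
    ],
}
print(find_empty_records(nested_message))
```

For the schema in this comment the helper reports the inner `Nested` record, which is the part that has no fields.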

@marcosschroh
Owner

I think we should create an issue in fastavro, and in the meantime we can patch it here.

@marcosschroh
Owner

Would either of you like to create the issue in fastavro? I can do it as well, no problem.

@ciarandorney
Author

Would either of you like to create the issue in fastavro? I can do it as well, no problem.

It's on my to-do 😅 If you want to do it go for it 😄

@ciarandorney
Author

fastavro/fastavro#732

marcosschroh added the bug label on Apr 19, 2024