Failed writing a dataframe to '.avro' file #58

Anna050689 · 2024-09-12T09:31:48Z

Prerequisites:

Python 3.10
pandavro==1.8.0
fastavro==1.9.7

Steps to reproduce the issue:

Create a DataFrame from the following data and cast all of its columns to the object dtype:

import pandas as pd

data = {
    'id': [545, 539, 643, 615, 502, 599, 542, 587, 537, 518],
    'first_name': ['caallai', 'Xzaaen', 'olrie', 'Iaairl', 'hfreiio', 'yieri', 'hcninn', 'irannir', 'Cmrnnan', 'Mnaeail'],
    'last_name': ['kroaoe', 'trrot', 'haill', 'kolide', 'errhnd', 'aoaoet', 'yBorrd', 'evbceyd', 'Wcnoee', 'eMloen'],
    'created_date': ['12/22/1992', '06/02/1992', '09/23/1998', '01/01/1997', '03/26/1990', '06/01/1996', '08/08/1992', '01/14/1995', '06/16/1992', '06/24/1991'],
    'Active': [False, False, False, False, False, True, False, False, False, True]
}
df = pd.DataFrame(data=data).astype('object')

Save the DataFrame to an '.avro' file, letting pandavro infer the schema (schema=None):

import pandavro as pdx

path = 'output.avro'
pdx.to_avro(path, df, schema=None)

Expected behavior:

The dataframe should be saved to an '.avro' file without any errors.

Actual behavior:

The following error is raised:

  File "fastavro/_write.pyx", line 779, in fastavro._write.writer
  File "fastavro/_write.pyx", line 687, in fastavro._write.Writer.__init__
  File "fastavro/_schema.pyx", line 173, in fastavro._schema.parse_schema
  File "fastavro/_schema.pyx", line 407, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 475, in fastavro._schema.parse_field
  File "fastavro/_schema.pyx", line 233, in fastavro._schema._parse_schema
  File "fastavro/_schema.pyx", line 263, in fastavro._schema._parse_schema
TypeError: argument of type 'NoneType' is not iterable

The inferred schema is:

{
    'fields': [
        {'name': 'id', 'type': ['null', None]},
        {'name': 'first_name', 'type': ['null', 'string']},
        {'name': 'last_name', 'type': ['null', 'string']},
        {'name': 'created_date', 'type': ['null', 'string']},
        {'name': 'Active', 'type': ['null', 'boolean']}
    ],
    'name': 'Root',
    'type': 'record'
}
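The schema has no type for 'id', yet the column still holds integers; only its dtype changed. A minimal check using a subset of the reproduction data (pandas only, no pandavro involved):

```python
import pandas as pd

# Same setup as the reproduction: an integer column converted to object dtype.
df = pd.DataFrame({"id": [545, 539, 643], "Active": [False, True, False]}).astype("object")

print(df.dtypes)               # both columns report the object dtype
print(type(df["id"].iloc[0]))  # but the elements are still plain Python ints

# Casting back, as in the workaround, restores an integer dtype.
fixed = df["id"].astype("int")
print(fixed.dtype)
```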

Additional Information:

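The issue occurs because the type of the "id" column is inferred as ['null', None] instead of ['null', 'int'] when the column has the object dtype. The None inside the union is presumably what fastavro's parse_schema rejects with "argument of type 'NoneType' is not iterable". The other object columns are unaffected: the string and boolean columns are inferred correctly.
When the "id" column has an integer dtype, the '.avro' file is saved successfully.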
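Since the elements of an object column still carry their Python types, a type can be chosen per column by looking at the first non-null value. The sketch below illustrates that idea; build_avro_schema and avro_type_for are hypothetical helpers, not part of pandavro, and this is not how pandavro implements its inference. It maps Python ints to Avro "long" (64-bit) rather than "int" (32-bit) so that larger IDs also fit.

```python
import pandas as pd

# Hypothetical mapping from Python scalar types to Avro primitive types.
# bool must be checked before int, because bool is a subclass of int in Python.
PY_TO_AVRO = [
    (bool, "boolean"),
    (int, "long"),  # Avro "long" is 64-bit; "int" is 32-bit
    (float, "double"),
    (str, "string"),
]


def avro_type_for(series: pd.Series) -> str:
    """Pick an Avro primitive type from the first non-null value in the column."""
    non_null = series.dropna()
    if non_null.empty:
        raise ValueError(f"cannot infer a type for all-null column {series.name!r}")
    value = non_null.iloc[0]
    # NumPy scalars (from non-object columns) expose .item(), which returns
    # the equivalent Python scalar, e.g. numpy.int64 -> int.
    if hasattr(value, "item"):
        value = value.item()
    for py_type, avro_type in PY_TO_AVRO:
        if isinstance(value, py_type):
            return avro_type
    raise TypeError(f"no Avro mapping for {type(value).__name__} in column {series.name!r}")


def build_avro_schema(df: pd.DataFrame, name: str = "Root") -> dict:
    """Build a record schema with a ["null", <type>] union for every column."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": str(col), "type": ["null", avro_type_for(df[col])]}
            for col in df.columns
        ],
    }
```

The resulting dict could then be passed explicitly, e.g. pdx.to_avro(path, df, schema=build_avro_schema(df)), which should bypass pandavro's own inference for the object columns.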

Workaround:

As a temporary workaround, explicitly cast the "id" column to an integer dtype before saving the DataFrame to an '.avro' file:

df['id'] = df['id'].astype('int')
pdx.to_avro(path, df, schema=None)
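If several columns may have been cast to object, pandas' DataFrame.infer_objects() generalizes the single-column cast: it re-infers a better dtype for object columns that hold a single scalar type. A minimal sketch on part of the reproduction data, checked with pandas only; with "id" back on an integer dtype, pandavro's inference should then take the same path as in the successful integer case described above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [545, 539, 643],
    "first_name": ["caallai", "Xzaaen", "olrie"],
    "Active": [False, True, False],
}).astype("object")

# infer_objects() soft-converts object columns: Python ints become an integer
# dtype and Python bools become bool. String columns are not made numeric.
converted = df.infer_objects()
print(converted.dtypes)

# converted could then be written with: pdx.to_avro(path, converted, schema=None)
```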
