Error reading ods file with read_ods #14053

archqt · 2024-01-28T15:11:01Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

data = pl.read_ods(
    source="test.ods",
    schema_overrides={"dt": pl.String},
    raise_if_empty=False,
)

⬇️ test.ods

Log output

Traceback (most recent call last):
  File "/home/moi/Cours/Planning/planning.py", line 15, in <module>
    data=pl.read_ods(source="test.ods",
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 387, in read_ods
    return _read_spreadsheet(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 428, in _read_spreadsheet
    parsed_sheets = {
                    ^
  File "/usr/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 429, in <dictcomp>
    name: reader_fn(
          ^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 682, in _read_spreadsheet_ods
    df = pl.DataFrame(
         ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/dataframe/frame.py", line 377, in __init__
    self._df = sequence_to_pydf(
               ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/utils/_construction.py", line 989, in sequence_to_pydf
    return _sequence_to_pydf_dispatcher(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/polars/utils/_construction.py", line 1133, in _sequence_of_sequence_to_pydf
    pydf = PyDataFrame.read_rows(
           ^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: could not append value: "Z" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`

it might also be that a value overflows the data-type's capacity

Issue description

Of course, I removed a lot from the file to isolate this bug. But even if I remove the "Z" cell, I still get a "duplicate name" error. For now I will keep using pandas, and I will test Polars again with the next version.
Thanks for everything.

Expected behavior

No error if I remove the "Z" from the cell.

Installed versions

--------Version info---------
Polars:               0.20.6
Index type:           UInt32
Platform:             Linux-6.7.1-arch1-1-x86_64-with-glibc2.38
Python:               3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               2023.9.2
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.3
openpyxl:             <not installed>
pandas:               1.5.3
pyarrow:              <not installed>
pydantic:             2.5.3
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             0.8.1
xlsxwriter:           <not installed>


alexander-beedie · 2024-08-07T09:01:03Z

FYI: we updated the default engine for both Excel and ODS files to "calamine" somewhat recently, and this uses fastexcel to load the data instead. However, there is still a (slightly different) error.

I've taken your file above and created an even-more minimal test case for them, to demonstrate the issue (see: ToucanToco/fastexcel#275). @lukapeschke & @PrettyWood, if you can take a look that would be much appreciated! 😎

@archqt: In case the reformulating of your original file is partially responsible for uncovering the calamine error, can you try loading it again? (I'm also going to expose a "has_header" param for both read_ods and read_excel shortly, which may be useful for you as your original file doesn't appear to have table headers). Note that your "schema_overrides" parameter won't do anything as there doesn't seem to be a column called "dt".
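A minimal sketch of how one might catch the "schema_overrides" mismatch mentioned above before relying on it. The helper name `unmatched_overrides` is hypothetical (it is not part of Polars), and the column names in the usage lines are placeholders rather than the real contents of test.ods. In practice the column list would come from the `columns` attribute of a frame loaded without overrides.

```python
from typing import Dict, Iterable, List


def unmatched_overrides(columns: Iterable[str], overrides: Dict[str, object]) -> List[str]:
    """Return the override keys that do not match any column name.

    Hypothetical helper: `columns` is whatever the loaded sheet reports
    (for example `df.columns`), and `overrides` is the dict that would be
    passed as `schema_overrides`.
    """
    known = set(columns)
    return [name for name in overrides if name not in known]


# Placeholder column names, assumed for illustration only.
sheet_columns = ["column_1", "column_2"]
missing = unmatched_overrides(sheet_columns, {"dt": "String"})
if missing:
    print(f"schema_overrides keys with no matching column: {missing}")
```

An empty result means every override key names a real column; a non-empty one lists the keys that do not match any column in the sheet.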

PrettyWood · 2024-08-07T09:21:33Z

Hey! I'll look into it over the next few days 👍

archqt · 2024-08-10T17:44:23Z

I updated to polars 1.4.1-1. It works with the file sucess.ods, but it fails with the file failure.ods:

Traceback (most recent call last):

  File /usr/lib/python3.12/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File ~/Cours/Planning/planning.py:18
    data=pl.read_ods(source=lf[0])

  File /usr/lib/python3.12/site-packages/polars/io/spreadsheet/functions.py:439 in read_ods
    return _read_spreadsheet(

  File /usr/lib/python3.12/site-packages/polars/io/spreadsheet/functions.py:552 in _read_spreadsheet
    name: reader_fn(

  File /usr/lib/python3.12/site-packages/polars/io/spreadsheet/functions.py:890 in _read_spreadsheet_calamine
    ws_arrow = parser.load_sheet_eager(sheet_name, **read_options)

  File /usr/lib/python3.12/site-packages/fastexcel/__init__.py:203 in load_sheet_eager
    return self._reader.load_sheet(

CannotRetrieveCellDataError: cannot retrieve cell data at (8, 0)
Context:
    0: could not determine dtype for column StringCol

PrettyWood · 2024-08-10T19:55:11Z

Yes I made a fix on calamine side this morning

alexander-beedie · 2024-08-14T06:33:03Z

> Yes I made a fix on calamine side this morning

Many thanks!

archqt · 2024-09-14T08:08:00Z

It still doesn't work; I now have polars 1.7.1.

PrettyWood · 2024-09-14T09:35:26Z

We still need calamine to merge my fix, and then we need to bump fastexcel. We are thinking about forking calamine to be faster.

PrettyWood · 2024-10-14T17:40:35Z

It's now fixed in fastexcel 0.12.0 (released today).
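Since the fix ships in fastexcel rather than in Polars itself, upgrading Polars alone may not be enough. Here is a rough sketch, using only the standard library, of checking whether the installed fastexcel is at least 0.12.0. The helper names are hypothetical, and the version parsing is deliberately simplistic: it stops at the first non-numeric component, so a pre-release such as "0.12.0rc1" is treated as older than 0.12.0.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.12.0' into (0, 12, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str, minimum: str = "0.12.0") -> bool:
    """True if `installed` is at least `minimum` under the simple parsing above."""
    return parse_version(installed) >= parse_version(minimum)


def installed_fastexcel_has_fix() -> Optional[bool]:
    """Check the fastexcel in this environment; None if it is not installed."""
    try:
        return has_fix(version("fastexcel"))
    except PackageNotFoundError:
        return None


print("fastexcel has the fix:", installed_fastexcel_has_fix())
```

If this prints False or None, upgrading or installing fastexcel (for example with pip) would be the next thing to try before re-running read_ods.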

archqt added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels · Jan 28, 2024

stinodego added the A-io-spreadsheet (Area: reading/writing Excel/ODS files) label · Jan 29, 2024

alexander-beedie removed the needs triage (Awaiting prioritization by a maintainer) label · Aug 7, 2024
