Cache + Repeated conversion gives NoneType #1881

Open
1 task
GabrieleLabanca opened this issue Dec 20, 2024 · 0 comments
Labels
bug Something isn't working


Describe the bug
When using type hints and converting to pa.typing.DataFrame[schema] multiple times, the code fails non-deterministically: the failure appears only after a certain number of calls to the function, and that number varies. The error is TypeError: 'NoneType' object is not iterable.

  • [ X ] I have checked that this issue has not already been reported.
  • [ X ] I have confirmed this bug exists on the latest version of pandera. (0.21.1)
  • (optional) I have confirmed this bug exists on the main branch of pandera.


Code Sample, a copy-pastable example

Required installs: pytest, pandas, pandera. You may need to set N_ITER higher to get it to fail; in my case the first failure occurred somewhere between iterations 8 and 16.

from functools import cache

import pandas as pd
from pandera.typing import DataFrame
from pandera import DataFrameModel

import pytest

N_ITER = 15

class SchemaRegistryDevice(DataFrameModel):
    id: str
    categ: int


data = {
    "id": ["1", "2", "3"],
    "categ": [10, 20, 30],
}


@cache
def get_df_sensors_registry() -> DataFrame[SchemaRegistryDevice]:

    df_registry = pd.DataFrame(data).pipe(DataFrame[SchemaRegistryDevice])

    df_casted = DataFrame[SchemaRegistryDevice](df_registry)
    assert df_casted is not None
    return df_casted


def get_reg_test() -> DataFrame[SchemaRegistryDevice]:
    df_reg = get_df_sensors_registry()

    return DataFrame[SchemaRegistryDevice](df_reg)


@pytest.mark.parametrize("n", range(N_ITER))
def test_bikes_registry(n):
    reg = get_reg_test()

Expected behavior

In this example there are of course more validations/casts than necessary, but it reproduces (as minimally as I could) a pipeline of several functions. I would expect, however, that validating/casting to a pa.typing.DataFrame multiple times is idempotent. Even more surprisingly, the failure occurs only after a certain number of calls to the same function (but not the same object), so I suppose it has to do with some hidden memoization inside the DataFrameModel.

Some thoughts:

  • I could not reproduce it without caching, but I am not sure whether caching is the root cause or only makes the failure more likely (I need the cache for my use case anyway);
  • the iteration at which it first fails varies even on the same machine: in the VSCode terminal it is 14, in another terminal emulator it is 9.
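
One thing that may be worth keeping in mind about the caching: functools.cache hands back the very same stored object on every call, so each DataFrame[SchemaRegistryDevice](df_reg) in get_reg_test wraps one shared cached object (the traceback is consistent with this, since every failure shows the same BlockManager address). Below is a minimal sketch of that sharing using only the standard library. The names get_registry and get_registry_copy are hypothetical stand-ins, a plain dict replaces the DataFrame, and whether returning a copy actually avoids the pandera failure is an untested assumption.

```python
import copy
from functools import cache


@cache
def get_registry() -> dict:
    # Hypothetical stand-in for the cached DataFrame in the report.
    return {"id": ["1", "2", "3"], "categ": [10, 20, 30]}


def get_registry_copy() -> dict:
    # Hypothetical helper: give each caller its own copy of the cached value.
    return copy.deepcopy(get_registry())


first = get_registry()
second = get_registry()
# functools.cache returns the stored object itself, not a fresh one.
same_object = first is second

# An in-place change made through one reference is visible through the other.
first["categ"].append(40)
shared_len = len(second["categ"])

# A copy is a distinct object, so mutating it leaves the cached value untouched.
isolated = get_registry_copy()
isolated["categ"].append(50)
cached_len_after_copy = len(get_registry()["categ"])
```

If anything downstream modifies or invalidates the internals of the cached object, every later caller sees that state. Returning a copy from a thin wrapper (for a DataFrame, something like df.copy()) would be one way to test whether the shared object is the trigger.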

The command I run: python3 -m pytest main.py.
The full error:

============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.13.0, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/gabriele/Downloads/panderabug
plugins: typeguard-4.4.1
collected 15 items                                                                                                                                                                                                                           

main.py .........FFFFFF                                                                                                                                                                                                                [100%]

================================================================================================================== FAILURES ==================================================================================================================
___________________________________________________________________________________________________________ test_bikes_registry[9] ___________________________________________________________________________________________________________

n = 9

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
__________________________________________________________________________________________________________ test_bikes_registry[10] ___________________________________________________________________________________________________________

n = 10

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
__________________________________________________________________________________________________________ test_bikes_registry[11] ___________________________________________________________________________________________________________

n = 11

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
__________________________________________________________________________________________________________ test_bikes_registry[12] ___________________________________________________________________________________________________________

n = 12

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
__________________________________________________________________________________________________________ test_bikes_registry[13] ___________________________________________________________________________________________________________

n = 13

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
__________________________________________________________________________________________________________ test_bikes_registry[14] ___________________________________________________________________________________________________________

n = 14

    @pytest.mark.parametrize("n", range(N_ITER))
    def test_bikes_registry(n):
>       reg = get_reg_test()

main.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
main.py:35: in get_reg_test
    return DataFrame[SchemaRegistryDevice](df_reg)
.venv/lib64/python3.13/site-packages/pandera/typing/common.py:127: in __patched_generic_alias_call
    result = self.__origin__(*args, **kwargs)
.venv/lib64/python3.13/site-packages/pandas/core/frame.py:712: in __init__
    data = data.copy(deep=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[TypeError("'NoneType' object is not iterable") raised in repr()] BlockManager object at 0x7fa850ab9910>, deep = False

    def copy(self, deep: bool | None | Literal["all"] = True) -> Self:
        """
        Make deep or shallow copy of BlockManager
    
        Parameters
        ----------
        deep : bool, string or None, default True
            If False or None, return a shallow copy (do not copy data)
            If 'all', copy data and a deep copy of the index
    
        Returns
        -------
        BlockManager
        """
        if deep is None:
            if using_copy_on_write():
                # use shallow copy
                deep = False
            else:
                # preserve deep copy for BlockManager with copy=None
                deep = True
    
        # this preserves the notion of view copying of axes
        if deep:
            # hit in e.g. tests.io.json.test_pandas
    
            def copy_func(ax):
                return ax.copy(deep=True) if deep == "all" else ax.view()
    
            new_axes = [copy_func(ax) for ax in self.axes]
        else:
            if using_copy_on_write():
                new_axes = [ax.view() for ax in self.axes]
            else:
>               new_axes = list(self.axes)
E               TypeError: 'NoneType' object is not iterable

.venv/lib64/python3.13/site-packages/pandas/core/internals/managers.py:591: TypeError
============================================================================================================== warnings summary ==============================================================================================================
main.py::test_bikes_registry[0]
main.py::test_bikes_registry[0]
main.py::test_bikes_registry[0]
main.py::test_bikes_registry[0]
main.py::test_bikes_registry[0]
main.py::test_bikes_registry[0]
  /home/gabriele/Downloads/panderabug/.venv/lib64/python3.13/site-packages/multimethod/__init__.py:453: DeprecationWarning: use `parametric(<base>, <func>)` as a type instead
    warnings.warn("use `parametric(<base>, <func>)` as a type instead", DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================================== short test summary info ===========================================================================================================
FAILED main.py::test_bikes_registry[9] - TypeError: 'NoneType' object is not iterable
FAILED main.py::test_bikes_registry[10] - TypeError: 'NoneType' object is not iterable
FAILED main.py::test_bikes_registry[11] - TypeError: 'NoneType' object is not iterable
FAILED main.py::test_bikes_registry[12] - TypeError: 'NoneType' object is not iterable
FAILED main.py::test_bikes_registry[13] - TypeError: 'NoneType' object is not iterable
FAILED main.py::test_bikes_registry[14] - TypeError: 'NoneType' object is not iterable
================================================================================================== 6 failed, 9 passed, 6 warnings in 1.20s ===================================================================================================

Desktop (please complete the following information):

  • OS: Fedora Linux 41
  • Version: Pandera (0.21.1)
@GabrieleLabanca GabrieleLabanca added the bug Something isn't working label Dec 20, 2024