Mypy - SchemaModel.validate does not return a DataFrame #763

Open
adrien-turiot-maxa opened this issue Feb 18, 2022 · 6 comments · May be fixed by #1450
Labels
bug Something isn't working

Comments

adrien-turiot-maxa commented Feb 18, 2022

The SchemaModel.validate function returns a DataFrameBase[T], which does not extend pd.DataFrame.

This makes type validations fail whenever a pd.DataFrame is expected. For example:

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    col1: Series[float]
    col2: Series[float]


existing_df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 2, 3]})
result = Schema.validate(existing_df)

result.to_csv("test")        # mypy error: "DataFrameBase[Schema]" has no attribute "to_csv"
pd.concat([result, result])  # mypy error: List item has incompatible type "DataFrameBase[Schema]"

Why does Schema.validate return a DataFrameBase[T] instead of a DataFrame[T]?

The same applies to the SchemaModel.example function.

(pandera version 0.9.0)

lorenzo-w commented Nov 10, 2022

Facing the same issue right now. I would like to validate my dataframes right after loading them from csv and then have the proper type annotation from there. Currently I am using a small custom function which calls SchemaModel.validate and then casts to DataFrame[T], but I would actually expect pandera to already return that....
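A minimal sketch of the kind of wrapper described above, for readers who need the workaround. The name `validate_as` is hypothetical and not part of pandera; it assumes a pandera version where `pa.SchemaModel` exists (0.9.0 in this issue) and where `validate` accepts a `lazy` keyword. The pandas and pandera imports sit under `TYPE_CHECKING`, so they are only needed by the type checker:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, TypeVar, cast

if TYPE_CHECKING:
    # Only needed for static type checking, not at runtime.
    import pandas as pd
    import pandera as pa
    from pandera.typing import DataFrame

# T is the concrete schema class, e.g. the `Schema` model from the report.
T = TypeVar("T", bound="pa.SchemaModel")


def validate_as(schema: type[T], df: pd.DataFrame, *, lazy: bool = False) -> DataFrame[T]:
    """Hypothetical helper: validate `df` against `schema`, then narrow the type.

    `schema.validate` does the real work at runtime. `cast` changes nothing at
    runtime; it only tells the type checker to treat the result as DataFrame[T].
    """
    return cast("DataFrame[T]", schema.validate(df, lazy=lazy))
```

With a helper like this, `validate_as(Schema, existing_df).to_csv("test")` should type-check, since `pandera.typing.DataFrame` subclasses `pd.DataFrame`. The `lazy` argument is simply forwarded to `validate`.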

@cosmicBboy
Collaborator

Looking into this... basically need to do the following:

Probably for another PR, but will probably also need to overload the DataFrameSchema.validate method: https://github.com/unionai-oss/pandera/blob/main/pandera/schemas.py#L441-L450

@lorenzo-w would you be open to making a contribution here?

@lorenzo-w

@cosmicBboy Wow thanks! That was the swiftest response I've ever had to a public issue. How could I say no then? 🙃
So yes, I'll take a shot at it this weekend and make a PR if it works.

@cosmicBboy
Collaborator

Great @lorenzo-w! The issue's been around for a while, so I didn't want it to fall through the cracks again. Let me know if you need any help, and check out the contribution guide to get your dev environment set up.

adzcai commented Jul 17, 2024

Also running into this issue and I'm happy to help. Just noting that for now you could also call DataFrame[Schema](existing_df) for validation and type-checking

@IanContrerasM

> Also running into this issue and I'm happy to help. Just noting that for now you could also call DataFrame[Schema](existing_df) for validation and type-checking

Yeah, but this method does not apply for lazy validation.

Also, I have been seeing different behavior when calling DataFrame[Schema](existing_df) compared to Schema.validate(existing_df). It's as if the validation is not executed when the DataFrame[Schema] class is instantiated.
