Add narwhals engine as dataframe compatibility layer to support multiple engines at once #1894

Open
dkapitan opened this issue Jan 8, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@dkapitan

dkapitan commented Jan 8, 2025

Current status: feature set differs per engine

Pandera supports different engines, including pandas, polars, modin, Dask and pyspark. Feature completeness differs between these engines, most notably support for pydantic v2 (see #1874 and beda-software/FHIRPathMappingLanguage#18).

Narwhals aims to address this issue by providing a lightweight and extensible compatibility layer between dataframe libraries. It has gained traction: Python projects such as Altair have recently removed their dependency on pandas and integrated Narwhals instead.

Desired situation: more complete feature set across engines

While this issue was triggered by #1874, I see a larger benefit in implementing Narwhals to support multiple dataframe libraries at once, provided that the Narwhals API offers the functionality required by pandera. (I have done a quick check and haven't spotted a blocking issue yet; no guarantees though, since I am quite new to pandera.)

Alternative situation: implement PydanticModel for polars

Port pandas_engine.PydanticModel into polars_engine in case Narwhals doesn't support the functionality needed by pandera.

@cosmicBboy
Collaborator

cosmicBboy commented Jan 8, 2025

Thanks for creating this, @dkapitan!

I see several steps for this to become mainstream:

  • Implement narwhals backend, equivalent to the polars backend
  • Register this backend against the pandas and polars pandera API objects
  • Incrementally reach feature parity with pandera's pandas backend implementation
  • (eventually) swap out the default backends for pandas, polars (and beyond) with the narwhals backend

Narwhals still needs the corresponding libraries to be installed to work, right? I.e., if I want to use pandas with narwhals, I still need pandas installed.

@dkapitan
Author

dkapitan commented Jan 8, 2025

@cosmicBboy

Makes sense to implement feature parity with polars through Narwhals first and then take it from there.

Indeed, you need to install the engine if you want to work with it, but the pandera library itself can remove the dependency on polars.
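
To make the "engine stays optional" point concrete, here is a small sketch of how a library could detect which dataframe engines the user has installed without importing any of them. It uses only the standard library; `CANDIDATE_ENGINES` and `installed_engines` are hypothetical names for this example, not something pandera or Narwhals provides.

```python
from importlib.util import find_spec

# Top-level import names of the engines mentioned in this issue.
CANDIDATE_ENGINES = ("pandas", "polars", "modin", "dask", "pyspark")


def installed_engines(candidates=CANDIDATE_ENGINES):
    """Return the candidate packages that can be imported in this environment."""
    # find_spec returns None for a missing top-level package and does not import it.
    return [name for name in candidates if find_spec(name) is not None]


print(installed_engines())
```

The output depends on the environment, which is the point: the list of usable engines is decided by what the user installed, not by the validation library's own requirements.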
