Releases: unionai-oss/pandera
v0.22.0: Improve validation runtime performance by 4x
⭐️ Highlight
In this release, the dependencies on multimethod and wrapt were removed, and optimizations were made that speed up validation by up to 4x, depending on the validation rules. For simple cases the speedup is ~4x.
What's Changed
- change order of Engine datatype resolution by @cosmicBboy in #1869
- Add missing Polars data types to docs by @ksolarski in #1872
- Fixing bug related to using coerce on a non-existent non-required column by @matt035343 in #1871
- handle non-bool dtypes in empty check outputs by @cosmicBboy in #1878
- Optimize validation runtime performance by @cosmicBboy in #1882
- bugfix/1846: remove wrapt as dependency from decorators.py by @cosmicBboy in #1883
New Contributors
- @ksolarski made their first contribution in #1872
- @matt035343 made their first contribution in #1871
Full Changelog: v0.21.1...v0.22.0
Release v0.21.1: Type bugfixes and regression fixes
What's Changed
- fix: remove Category inheritance from ArrowDictionary by @darenliang in #1848
- fix: preserve Check options in schema statistics roundtrip by @alexismanuel in #1844
- fix validation and coercion on init bug by @cosmicBboy in #1868
New Contributors
- @darenliang made their first contribution in #1848
- @alexismanuel made their first contribution in #1844
Full Changelog: v0.21.0...v0.21.1
Release v0.21.0: Reduce import and schema creation runtime, add docsearch search bar
⭐️ Highlights
This release optimizes the import and schema creation runtime so that importing pandera and creating a schema (without doing any validation) takes ~5 ms (previously >800 ms). It also updates the docs to use docsearch for a better search experience.
- Defer backend registration to validation time by @cosmicBboy in #1818
- Reduce import overhead to improve runtime by @cosmicBboy in #1821
- Add docsearch by @cosmicBboy in #1814
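The import-time figure above can be checked locally with a small timing helper. The following is a minimal sketch, not part of pandera: the function name import_seconds is hypothetical, it measures only the import step (not schema creation), and it runs the import in a fresh interpreter so that already-cached modules do not hide the cost. Absolute numbers will vary by machine and environment.

```python
import subprocess
import sys


def import_seconds(module_name: str) -> float:
    """Time `import module_name` in a fresh interpreter, in seconds."""
    # A new process starts with an empty module cache for this module,
    # so the measurement reflects a cold import.
    snippet = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    # The timing is the last line printed, even if the import itself prints.
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Replace "json" with "pandera" in an environment where it is installed.
    print(f"{import_seconds('json') * 1000:.1f} ms")
```

Running the helper against pandera before and after upgrading would give a rough comparison; averaging several runs reduces noise from disk caching.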
What's Changed
- Upgrade multimethod to 1.12 by @jskrzypek in #1803
- bugfix: validation_enabled configuration correctly disables polars validation by @cosmicBboy in #1813
- Support Enum by @gab23r in #1798
- Revert/1803 by @cosmicBboy in #1815
- Add docsearch by @cosmicBboy in #1816
- Correct spelling in index.md by @galenseilis in #1819
- Bugfix/1760 Bad type hint for argument unique for DataFrameSchema by @Jarek-Rolski in #1817
- accept expr in default value by @gab23r in #1820
- fix: 🐛 add coerce value to pyarrow dtypes by @aaravind100 in #1850
- feat: add overload. by @yassun7010 in #1823
New Contributors
- @jskrzypek made their first contribution in #1803
- @galenseilis made their first contribution in #1819
- @Jarek-Rolski made their first contribution in #1817
- @yassun7010 made their first contribution in #1823
Full Changelog: v0.20.4...v0.21.0
Release v0.20.4: Bugfixes to polars & pyspark backends and more
What's Changed
- Bugfix/1732: Fix misleading error when columns are missing and lazy=True by @benlee1284 in #1752
- Bugfix/1644: refactor geopandas and pyarrow dtypes to avoid top-level import by @cosmicBboy in #1753
- regex column errors should report the correct column name by @cosmicBboy in #1754
- bugfix/1657: use rename instead of select in polars check backend by @cosmicBboy in #1757
- make sure registered checks supports error kwarg by @cosmicBboy in #1756
- make sure optional generic types are supported by @cosmicBboy in #1758
- fix: SQLModel table model not validated by @AlpAribal in #1696
- Restore accidentally-deleted use of "breakpoint()" by @deepyaman in #1763
- Swap types-pkg_resources with types-setuptools by @deepyaman in #1779
- Add support for Spark Connect dataframes by @filipeo2-mck in #1775
- feat: select_columns reorders columns by default by @ldacey in #1783
- Update Polars dtype test to generate more examples by @deepyaman in #1770
- bugfix/1784 polars DataFrameModel.to_json_schema() fails on DateTime column by @AlpAribal in #1789
- fix pd.ArrowDtype use in pandera engine for old pd versions by @cosmicBboy in #1792
- Reexport polars function to match pyright expectation by @gab23r in #1797
New Contributors
- @benlee1284 made their first contribution in #1752
- @ldacey made their first contribution in #1783
- @gab23r made their first contribution in #1797
Full Changelog: v0.20.3...v0.20.4
Release v0.20.3: polars integration cleanup, docs updates, bugfixes
What's Changed
- update dtype api reference docs by @cosmicBboy in #1745
- handle deprecated methods/arguments in polars v1 by @cosmicBboy in #1746
- handle case when pandera is run with optimized python mode by @cosmicBboy in #1749
Full Changelog: v0.20.2...v0.20.3
Release v0.20.2: Complete pyarrow coverage, support polars v1
⭐️ Highlights:
- feat: add remaining pyarrow types by @aaravind100 in #1720
- Bugfix/1724: Add support for polars v1 by @cosmicBboy in #1725
What's Changed
- Depend on OpenJDK>8.0.0 for PySpark support by @billyvinning in #1701
- Update polars checks.py to avoid calling the check function multiple times by @jcadam14 in #1719
- str checks use plain string instead of re.Pattern by @cosmicBboy in #1729
- Document Field instance reuse workaround by @lundybernard in #1730
- add pyarrow docs by @cosmicBboy in #1739
- fix typing docs by @cosmicBboy in #1740
- Update docs: setup deps for algolia, modify pandera banner, fix API ref by @cosmicBboy in #1741
New Contributors
Full Changelog: v0.20.1...v0.20.2
Release v0.20.1: Bugfix for pyarrow dependency error
What's Changed
- fix: raising type error when pyarrow is not installed by @aaravind100 in #1717
- feat: add pyarrow list and struct to pandas engine by @aaravind100 in #1699
Full Changelog: v0.20.0...v0.20.1
Release v0.20.0: Pyarrow dtype support
⭐️ Highlights
- Pandera now supports pyarrow datatypes in the pandera validation engine! Big shoutout to @aaravind100 for the heavy lifting here.
- Added compatibility for numpy v2.
- Added compatibility for polars v1.
- pandera.SchemaModel is now deprecated; use pandera.DataFrameModel instead.
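As an illustration of the SchemaModel deprecation noted above, here is a minimal sketch of a model written against pandera.DataFrameModel. The Orders schema, its column names, and the validate_orders helper are hypothetical and chosen only for the example; for a simple model like this one, migrating is expected to amount to changing the base class name, but more complex models may need further checks.

```python
def validate_orders():
    """Validate a small example frame with a DataFrameModel and return it."""
    # Imported inside the function so the sketch loads even where pandas
    # or pandera is not installed.
    import pandas as pd
    import pandera as pa
    from pandera.typing import Series

    # Hypothetical schema: older code would have subclassed pa.SchemaModel
    # here instead of pa.DataFrameModel.
    class Orders(pa.DataFrameModel):
        order_id: Series[int] = pa.Field(ge=0)
        amount: Series[float] = pa.Field(gt=0)

    df = pd.DataFrame({"order_id": [1, 2], "amount": [9.5, 20.0]})
    # validate() returns the validated DataFrame or raises a SchemaError.
    return Orders.validate(df)
```

Calling validate_orders() in an environment with pandas and pandera installed returns the validated DataFrame; a frame with a negative order_id or a non-positive amount would raise a schema error instead.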
What's Changed
- Bugfix/1631: Series[Annotated[...]] DataFrameModel types should correctly create a DataFrameSchema by @cosmicBboy in #1633
- Add missing pandas import line. by @kyleweise in #1635
- add pandas pyarrow backend support by @aaravind100 in #1628
- bugfix: timezone-agnostic datetime in polars works in DataFrameModel by @cosmicBboy in #1638
- fix pandas pyarrow string validation by @aaravind100 in #1636
- Bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #1619
- Updating Old pandas-stubs Link in Documentation by @bustosalex1 in #1648
- Bugfix: add missing reason_code for pyspark backend by @melvinkokxw in #1646
- change pandas engine to be numpy>2 compat by @cosmicBboy in #1690
- Minor documentation fix by @poulter7 in #1643
- perf: dataframe-level checks, fix polars tests by @cosmicBboy in #1702
- Docs: fix missing import in data conversion code cell by @billyvinning in #1700
- fix: DataFrameSchema repr formatting by @AlpAribal in #1694
- Fix coercion errors for polars=1.0.0 by @MariusMerkleQC in #1706
- Solve deprecation warning on with_context by @MariusMerkleQC in #1705
- fix: default values set before coercion by @sanzoghenzo in #1708
- remove deprecated SchemaModel by @cosmicBboy in #1711
- Fix mismatched quotes, standardize CONTRIBUTING.md by @deepyaman in #1712
- Run CI on PRs to ibis-dev; stop for polars-dev by @deepyaman in #1713
- enable black for py311 by @lundybernard in #1697
- Updates to improve TryPandera documentation by @hendera2 in #1668
New Contributors
- @kyleweise made their first contribution in #1635
- @aaravind100 made their first contribution in #1628
- @bustosalex1 made their first contribution in #1648
- @melvinkokxw made their first contribution in #1646
- @poulter7 made their first contribution in #1643
- @billyvinning made their first contribution in #1700
- @AlpAribal made their first contribution in #1694
- @MariusMerkleQC made their first contribution in #1706
- @sanzoghenzo made their first contribution in #1708
- @lundybernard made their first contribution in #1697
- @hendera2 made their first contribution in #1668
Full Changelog: v0.19.2...v0.20.0
Release 0.19.3: Polars dtype bugfixes
What's Changed
- bugfix: timezone-agnostic datetime in polars works in DataFrameModel by @cosmicBboy in #1638
- Bugfix/1631: Series[Annotated[...]] DataFrameModel types should correctly create a DataFrameSchema by @cosmicBboy in #1633
- Add missing pandas import line. by @kyleweise in #1635
New Contributors
- @kyleweise made their first contribution in #1635
Full Changelog: v0.19.2...v0.19.3
Release v0.19.2: Bugfix on correctly checking nullable Floats
What's Changed
- bugfix: nullable check float dtype handles nan and null by @cosmicBboy in #1627
Full Changelog: v0.19.1...v0.19.2