Commit

docs(python): Add more examples to cover polars filter function behaviour
atigbadr committed Jul 22, 2024
1 parent 1df3b0b commit 51665a5
Showing 2 changed files with 138 additions and 25 deletions.
84 changes: 71 additions & 13 deletions py-polars/polars/dataframe/frame.py
@@ -4381,28 +4381,39 @@ def filter(
Each constraint will behave the same as `pl.col(name).eq(value)`, and
will be implicitly joined with the other filter conditions using `&`.
+ Notes
+ -----
+ If you are transitioning from pandas and filtering on a comparison between
+ two or more columns, note that in Polars any comparison involving null
+ values always results in null. Because `filter` keeps only rows where the
+ predicate evaluates to true, those rows are filtered out. Handle null
+ values explicitly to avoid unintended filtering (see the examples below).
Examples
--------
  >>> df = pl.DataFrame(
  ... {
- ... "foo": [1, 2, 3],
- ... "bar": [6, 7, 8],
- ... "ham": ["a", "b", "c"],
+ ... "foo": [1, 2, 3, None, 4, None, 0],
+ ... "bar": [6, 7, 8, None, None, 9, 0],
+ ... "ham": ["a", "b", "c", None, "d", "e", "f"],
  ... }
  ... )
  Filter on one condition:
  >>> df.filter(pl.col("foo") > 1)
- shape: (2, 3)
- ┌─────┬─────┬─────┐
- │ foo ┆ bar ┆ ham │
- │ --- ┆ --- ┆ --- │
- │ i64 ┆ i64 ┆ str │
- ╞═════╪═════╪═════╡
- │ 2   ┆ 7   ┆ b   │
- │ 3   ┆ 8   ┆ c   │
- └─────┴─────┴─────┘
+ shape: (3, 3)
+ ┌─────┬──────┬─────┐
+ │ foo ┆ bar  ┆ ham │
+ │ --- ┆ ---  ┆ --- │
+ │ i64 ┆ i64  ┆ str │
+ ╞═════╪══════╪═════╡
+ │ 2   ┆ 7    ┆ b   │
+ │ 3   ┆ 8    ┆ c   │
+ │ 4   ┆ null ┆ d   │
+ └─────┴──────┴─────┘
Filter on multiple conditions, combined with and/or operators:
@@ -4433,13 +4444,14 @@ def filter(
... pl.col("foo") <= 2,
... ~pl.col("ham").is_in(["b", "c"]),
... )
- shape: (1, 3)
+ shape: (2, 3)
  ┌─────┬─────┬─────┐
  │ foo ┆ bar ┆ ham │
  │ --- ┆ --- ┆ --- │
  │ i64 ┆ i64 ┆ str │
  ╞═════╪═════╪═════╡
  │ 1   ┆ 6   ┆ a   │
+ │ 0   ┆ 0   ┆ f   │
  └─────┴─────┴─────┘
Provide multiple filters using `**kwargs` syntax:
@@ -4453,6 +4465,52 @@ def filter(
  ╞═════╪═════╪═════╡
  │ 2   ┆ 7   ┆ b   │
  └─────┴─────┴─────┘
+ Filter by comparing two columns against each other:
+ >>> df.filter(pl.col("foo") == pl.col("bar"))
+ shape: (1, 3)
+ ┌─────┬─────┬─────┐
+ │ foo ┆ bar ┆ ham │
+ │ --- ┆ --- ┆ --- │
+ │ i64 ┆ i64 ┆ str │
+ ╞═════╪═════╪═════╡
+ │ 0   ┆ 0   ┆ f   │
+ └─────┴─────┴─────┘
+ >>> df.filter(pl.col("foo") != pl.col("bar"))
+ shape: (3, 3)
+ ┌─────┬─────┬─────┐
+ │ foo ┆ bar ┆ ham │
+ │ --- ┆ --- ┆ --- │
+ │ i64 ┆ i64 ┆ str │
+ ╞═════╪═════╪═════╡
+ │ 1   ┆ 6   ┆ a   │
+ │ 2   ┆ 7   ┆ b   │
+ │ 3   ┆ 8   ┆ c   │
+ └─────┴─────┴─────┘
+ Notice how the rows containing `None` values are filtered out. To keep the
+ same behavior as pandas, use:
+ >>> df.filter(
+ ... (pl.col("foo") != pl.col("bar"))
+ ... | (pl.any_horizontal(pl.col("foo", "bar").is_null()))
+ ... )
+ shape: (6, 3)
+ ┌──────┬──────┬──────┐
+ │ foo  ┆ bar  ┆ ham  │
+ │ ---  ┆ ---  ┆ ---  │
+ │ i64  ┆ i64  ┆ str  │
+ ╞══════╪══════╪══════╡
+ │ 1    ┆ 6    ┆ a    │
+ │ 2    ┆ 7    ┆ b    │
+ │ 3    ┆ 8    ┆ c    │
+ │ null ┆ null ┆ null │
+ │ 4    ┆ null ┆ d    │
+ │ null ┆ 9    ┆ e    │
+ └──────┴──────┴──────┘
"""
return self.lazy().filter(*predicates, **constraints).collect(_eager=True)
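The null handling described in the Notes can be modeled in plain Python. The sketch below is not Polars code and makes no claim about how Polars implements filtering; `ne`, `kleene_or`, and `keep` are hypothetical helpers, and `None` stands in for null. It only assumes what the docstring states: a comparison with null yields null, and `filter` keeps a row only when the predicate is true.

```python
from typing import Optional

# Same data as the docstring example; None plays the role of null.
foo = [1, 2, 3, None, 4, None, 0]
bar = [6, 7, 8, None, None, 9, 0]
ham = ["a", "b", "c", None, "d", "e", "f"]
rows = list(zip(foo, bar, ham))


def ne(a, b) -> Optional[bool]:
    """Null-propagating `!=`: the result is unknown (None) if either side is missing."""
    if a is None or b is None:
        return None
    return a != b


def kleene_or(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """Three-valued OR: True wins over unknown, unknown wins over False."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def keep(rows, predicate):
    """Keep a row only when the predicate is exactly True (None and False are dropped)."""
    return [row for row in rows if predicate(row) is True]


# Plain comparison: rows with a missing value evaluate to None and are dropped.
strict = keep(rows, lambda r: ne(r[0], r[1]))

# OR-ing with an explicit null check turns the unknown result into True.
pandas_like = keep(
    rows,
    lambda r: kleene_or(ne(r[0], r[1]), r[0] is None or r[1] is None),
)
```

Under this model, `strict` holds the three fully populated rows and `pandas_like` holds six rows, matching the shapes shown in the docstring examples.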

79 changes: 67 additions & 12 deletions py-polars/polars/lazyframe/frame.py
@@ -2983,28 +2983,38 @@ def filter(
Each constraint will behave the same as `pl.col(name).eq(value)`, and
will be implicitly joined with the other filter conditions using `&`.
+ Notes
+ -----
+ If you are transitioning from pandas and filtering on a comparison between
+ two or more columns, note that in Polars any comparison involving null
+ values always results in null. Because `filter` keeps only rows where the
+ predicate evaluates to true, those rows are filtered out. Handle null
+ values explicitly to avoid unintended filtering (see the examples below).
Examples
--------
  >>> lf = pl.LazyFrame(
  ... {
- ... "foo": [1, 2, 3],
- ... "bar": [6, 7, 8],
- ... "ham": ["a", "b", "c"],
+ ... "foo": [1, 2, 3, None, 4, None, 0],
+ ... "bar": [6, 7, 8, None, None, 9, 0],
+ ... "ham": ["a", "b", "c", None, "d", "e", "f"],
  ... }
  ... )
  Filter on one condition:
  >>> lf.filter(pl.col("foo") > 1).collect()
- shape: (2, 3)
- ┌─────┬─────┬─────┐
- │ foo ┆ bar ┆ ham │
- │ --- ┆ --- ┆ --- │
- │ i64 ┆ i64 ┆ str │
- ╞═════╪═════╪═════╡
- │ 2   ┆ 7   ┆ b   │
- │ 3   ┆ 8   ┆ c   │
- └─────┴─────┴─────┘
+ shape: (3, 3)
+ ┌─────┬──────┬─────┐
+ │ foo ┆ bar  ┆ ham │
+ │ --- ┆ ---  ┆ --- │
+ │ i64 ┆ i64  ┆ str │
+ ╞═════╪══════╪═════╡
+ │ 2   ┆ 7    ┆ b   │
+ │ 3   ┆ 8    ┆ c   │
+ │ 4   ┆ null ┆ d   │
+ └─────┴──────┴─────┘
Filter on multiple conditions:
@@ -3057,6 +3067,51 @@ def filter(
  │ 1   ┆ 6   ┆ a   │
  │ 3   ┆ 8   ┆ c   │
  └─────┴─────┴─────┘
+ Filter by comparing two columns against each other:
+ >>> lf.filter(pl.col("foo") == pl.col("bar")).collect()
+ shape: (1, 3)
+ ┌─────┬─────┬─────┐
+ │ foo ┆ bar ┆ ham │
+ │ --- ┆ --- ┆ --- │
+ │ i64 ┆ i64 ┆ str │
+ ╞═════╪═════╪═════╡
+ │ 0   ┆ 0   ┆ f   │
+ └─────┴─────┴─────┘
+ >>> lf.filter(pl.col("foo") != pl.col("bar")).collect()
+ shape: (3, 3)
+ ┌─────┬─────┬─────┐
+ │ foo ┆ bar ┆ ham │
+ │ --- ┆ --- ┆ --- │
+ │ i64 ┆ i64 ┆ str │
+ ╞═════╪═════╪═════╡
+ │ 1   ┆ 6   ┆ a   │
+ │ 2   ┆ 7   ┆ b   │
+ │ 3   ┆ 8   ┆ c   │
+ └─────┴─────┴─────┘
+ Notice how the rows containing `None` values are filtered out. To keep the
+ same behavior as pandas, use:
+ >>> lf.filter(
+ ... (pl.col("foo") != pl.col("bar"))
+ ... | (pl.any_horizontal(pl.col("foo", "bar").is_null()))
+ ... ).collect()
+ shape: (6, 3)
+ ┌──────┬──────┬──────┐
+ │ foo  ┆ bar  ┆ ham  │
+ │ ---  ┆ ---  ┆ ---  │
+ │ i64  ┆ i64  ┆ str  │
+ ╞══════╪══════╪══════╡
+ │ 1    ┆ 6    ┆ a    │
+ │ 2    ┆ 7    ┆ b    │
+ │ 3    ┆ 8    ┆ c    │
+ │ null ┆ null ┆ null │
+ │ 4    ┆ null ┆ d    │
+ │ null ┆ 9    ┆ e    │
+ └──────┴──────┴──────┘
"""
all_predicates: list[pl.Expr] = []
boolean_masks = []