Error when filtering a Series using a condition from a DataFrame #2199
Comments
Unfortunately, the exact use case above is not supported. However, there is a workaround involving the original Series (created here with `name='A'`). So in the above example:
>>> X = ks.DataFrame({
...     'A': [1,2,3,4,5]
... })
>>> y = ks.Series([1,0,1,0,1], name='A')
>>> y[X['A']>3]
3    0
4    1
Name: A, dtype: int64
We will improve that in pandas API on Spark, under https://issues.apache.org/jira/browse/SPARK-36394. Thanks for letting us know!
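For anyone reading along, the result expected from `y[X['A']>3]` can be sketched in plain Python. This only illustrates the semantics (a boolean mask matched to the Series by index label); it is not how Koalas or pandas API on Spark implements the operation. The helper `filter_by_mask` and the dict-based stand-ins for the Series and the DataFrame column are hypothetical names used for illustration.

```python
def filter_by_mask(values, mask):
    """Keep the entries of `values` whose index label maps to True in `mask`.

    Both arguments are plain dicts keyed by index label, standing in
    (as an assumption) for a Series and a boolean Series.
    Labels missing from `mask` are treated as False.
    """
    return {label: v for label, v in values.items() if mask.get(label, False)}


# Stand-ins for X['A'] and y from the example, keyed by the default index 0..4.
x_a = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
y = {0: 1, 1: 0, 2: 1, 3: 0, 4: 1}

# Equivalent of the condition X['A'] > 3.
mask = {label: value > 3 for label, value in x_a.items()}

result = filter_by_mask(y, mask)
print(result)  # {3: 0, 4: 1}
```

The surviving labels 3 and 4 match the rows shown in the workaround output above.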
ueshin pushed a commit to apache/spark that referenced this issue Sep 22, 2021

### What changes were proposed in this pull request?
Fix filtering a Series (without a name) by a boolean Series.

### Why are the changes needed?
A bugfix. The issue is raised as databricks/koalas#2199.

### Does this PR introduce _any_ user-facing change?
Yes.

#### From
```py
>>> psser = ps.Series([0, 1, 2, 3, 4])
>>> ps.set_option('compute.ops_on_diff_frames', True)
>>> psser.loc[ps.Series([True, True, True, False, False])]
Traceback (most recent call last):
...
KeyError: 'none key'
```

#### To
```py
>>> psser = ps.Series([0, 1, 2, 3, 4])
>>> ps.set_option('compute.ops_on_diff_frames', True)
>>> psser.loc[ps.Series([True, True, True, False, False])]
0    0
1    1
2    2
dtype: int64
```

### How was this patch tested?
Unit test.

Closes #34061 from xinrong-databricks/filter_series.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Takuya UESHIN <[email protected]>
(cherry picked from commit 6a5ee02)
Signed-off-by: Takuya UESHIN <[email protected]>
ueshin pushed a commit to apache/spark that referenced this issue Sep 22, 2021

### What changes were proposed in this pull request?
Fix filtering a Series (without a name) by a boolean Series.

### Why are the changes needed?
A bugfix. The issue is raised as databricks/koalas#2199.

### Does this PR introduce _any_ user-facing change?
Yes.

#### From
```py
>>> psser = ps.Series([0, 1, 2, 3, 4])
>>> ps.set_option('compute.ops_on_diff_frames', True)
>>> psser.loc[ps.Series([True, True, True, False, False])]
Traceback (most recent call last):
...
KeyError: 'none key'
```

#### To
```py
>>> psser = ps.Series([0, 1, 2, 3, 4])
>>> ps.set_option('compute.ops_on_diff_frames', True)
>>> psser.loc[ps.Series([True, True, True, False, False])]
0    0
1    1
2    2
dtype: int64
```

### How was this patch tested?
Unit test.

Closes #34061 from xinrong-databricks/filter_series.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Takuya UESHIN <[email protected]>
I want to filter a Koalas Series based on a condition from a related Koalas DataFrame.
However, running the last line gives the following error:
This syntax works as expected with pandas Series/DataFrames:
Gives:
Note that I have the following option set:
Is this a bug, or does the filtering need to be carried out in a different way?