Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: collect dataframe as stream in __repr__ #1015

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

konjac
Copy link

@konjac konjac commented Feb 5, 2025

Which issue does this PR close?

Closes #1014.

Rationale for this change

The root cause for #1014 is _repr_ appends a limit clause and causes EXPLAIN not the top plan.

What changes are included in this PR?

This change refactors _repr_ to collect the dataframe as a stream.

Before the change,

>>> from datafusion import SessionContext
>>> ctx = SessionContext()
>>> ctx.sql("explain select 1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kuhuan/.local/lib/python3.10/site-packages/datafusion/dataframe.py", line 71, in __repr__
    return self.df.__repr__()
Exception: Internal error: Unsupported logical plan: Explain must be root of the plan.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

After the change

>>> from datafusion import SessionContext
>>> ctx = SessionContext()
>>> ctx.sql("explain select 1")
DataFrame()
+---------------+--------------------------------------+
| plan_type     | plan                                 |
+---------------+--------------------------------------+
| logical_plan  | Projection: Int64(1)                 |
|               |   EmptyRelation                      |
| physical_plan | ProjectionExec: expr=[1 as Int64(1)] |
|               |   PlaceholderRowExec                 |
|               |                                      |
+---------------+--------------------------------------+

Are there any user-facing changes?

No. Bugfix only.

@konjac
Copy link
Author

konjac commented Feb 8, 2025

Hi @timsaucer , could you help to trigger CI and also review the proposed change? Thank you!

…ious stream of batches. Also refactor _repr_html_.
Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. It took me a while to parse through the logic of the get_batches. I think it's worth adding some documentation within the file to explain why we are doing this so that future maintainers can understand the necessity of doing it this way.

Thank you for the PR!

@konjac
Copy link
Author

konjac commented Feb 23, 2025

This looks good. It took me a while to parse through the logic of the get_batches. I think it's worth adding some documentation within the file to explain why we are doing this so that future maintainers can understand the necessity of doing it this way.

Thank you for the PR!

Thank you for the reviewing! A new iteration is pushed to add code comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to print EXPLAIN
2 participants