better html repr for DataFrame #250

Open
oscar6echo opened this issue Jul 29, 2024 · 11 comments
Labels
good first issue Good for newcomers

Comments

@oscar6echo

oscar6echo commented Jul 29, 2024

This is less a feature request than a question:

  • I cannot reproduce the dataframe display shown in the README (and below), which is the same as in python-polars. Is it still possible? If not, why not?

image

  • the display of a dataframe in a deno-kernel jupyter notebook is fine and similar to that of a python-kernel notebook, except that, by default, the python version shows:
    • the first and last rows (for a large df)
    • the shape on top

image

But for nodejs-polars only the first 50 rows are shown, with no indication of the df shape.
Is this on purpose or a shortcut?

Suggestion: It would help users if the py and js displays matched, both in print/console.log and in jupyter.

@universalmind303
Collaborator

It does look like repr_html on the python side has quite a bit more logic than the JS side.

It should be pretty easy to copy over the python html logic.

@universalmind303 universalmind303 changed the title from "Dataframe display" to "better html repr for DataFrame" Jul 29, 2024
@universalmind303 universalmind303 added the "good first issue" (Good for newcomers) label Jul 29, 2024
@Bidek56
Collaborator

Bidek56 commented Aug 3, 2024

I cannot reproduce the dataframe display shown in the README (and below), which is the same as in python-polars. Is it still possible? If not, why not?

It works fine for me using nodejs-polars v0.14.0. Can you please show your code and the output?

The deno-kernel jupyter notebook shows only the first 50 rows so that large output does not crash the browser, but this is configurable using process.env.POLARS_FMT_MAX_ROWS. This was discussed during the PR review.
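
For reference, a minimal sketch of setting that variable from JavaScript. The thread does not say whether nodejs-polars reads it at import time or at display time, so setting it before the library is loaded is an assumption made to be on the safe side, and the value 100 is arbitrary.

```javascript
// Assumption: set the limit before nodejs-polars is loaded, in case it is read early.
// Node stores environment values as strings, so a string is assigned here.
process.env.POLARS_FMT_MAX_ROWS = "100";

// Read it back as a number to confirm what a consumer of the variable would see.
const maxRows = Number.parseInt(process.env.POLARS_FMT_MAX_ROWS, 10);
console.log(`POLARS_FMT_MAX_ROWS = ${maxRows}`);
```

The same variable can also be exported in the shell that launches the runtime or the notebook kernel.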

@oscar6echo
Author

oscar6echo commented Aug 6, 2024

I cannot reproduce the dataframe display shown in the README (and below), which is the same as in python-polars. Is it still possible? If not, why not?

It works fine for me using nodejs-polars v0.14.0. Can you please show your code and the output?

Here is the output from notebook, jupyter console and terminal.

1/ notebook
image

2/ jupyter console
image

3/ terminal
image

In none of these cases can I reproduce the nice display shown in the README, which happens to be similar to that of print(df) in the python version.
What should I do to get it?

@Bidek56
Collaborator

Bidek56 commented Aug 6, 2024

I am not using Deno, but with the bun command line it works fine.

@oscar6echo
Author

Ok, here is what I get with bun repl:

image

So the output contains what is shown in the README's nice display, but it is not quite the same.
I find it a bit disconcerting that such basic use is not reproducible in either the deno or the bun repl.

@oscar6echo
Author

The deno-kernel jupyter notebook shows only the first 50 rows so that large output does not crash the browser, but this is configurable using process.env.POLARS_FMT_MAX_ROWS. This was discussed during the PR review.

It does look like repr_html on the python side has quite a bit more logic than the JS side.

Indeed. Please compare with the python version (arguably the reference, and certainly more informative):

You get the shape and the first/last rows/cols shown (controlled by POLARS_FMT_MAX_(ROW|COL)S).
image

While with nodejs-polars you get only the first rows (controlled by POLARS_FMT_MAX_ROWS), without any indication of shape.
image
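
To make the difference concrete, here is a minimal sketch of the head/tail selection described above. It is not the polars implementation: previewRows and shapeLine are hypothetical helpers, the even split between head and tail is an assumption, and maxRows is assumed to be a non-negative integer.

```javascript
const ELLIPSIS = "…";

// Hypothetical helper: keep at most `maxRows` rows, split between the
// start and the end, with an ellipsis marking the rows that were skipped.
function previewRows(rows, maxRows) {
  if (rows.length <= maxRows) {
    return [...rows]; // small enough, show everything
  }
  const head = Math.ceil(maxRows / 2); // the head gets the extra row when maxRows is odd
  const tail = maxRows - head;
  return [
    ...rows.slice(0, head),
    ELLIPSIS,
    ...rows.slice(rows.length - tail),
  ];
}

// Hypothetical helper: the "shape: (rows, cols)" line shown above the table.
function shapeLine(height, width) {
  return `shape: (${height}, ${width})`;
}

const rows = Array.from({ length: 10 }, (_, i) => i + 1);
console.log(shapeLine(rows.length, 1));
console.log(previewRows(rows, 4));
```

With ten rows and a limit of four, this keeps rows 1, 2, 9 and 10 around the ellipsis, whereas a display that only takes the first rows would keep 1 to 4 and give no hint that more exist.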

It should be pretty easy to copy over the python html logic.

Perhaps for somebody who knows the inner workings of (1) polars-py, (2) polars-nodejs, and (3) the specifics of the target runtimes (nodejs, deno, bun), which, as the comment above shows, add their own layer before display.

For example, I could not find where the selection of rows and cols (first and last, based on env variables) is performed below polars/polars/dataframe/frame.py | repr_html

@Bidek56
Collaborator

Bidek56 commented Aug 6, 2024

Can you please use console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with the bun implementation of [Symbol.for("nodejs.util.inspect.custom")]().

> console.log(df);
shape: (5, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ f64 ┆ str    ┆ f64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1.0 ┆ banana ┆ 5.0 ┆ beetle │
│ 2.0 ┆ banana ┆ 4.0 ┆ audi   │
│ 3.0 ┆ apple  ┆ 3.0 ┆ beetle │
│ 4.0 ┆ apple  ┆ 2.0 ┆ beetle │
│ 5.0 ┆ banana ┆ 1.0 ┆ beetle │
└─────┴────────┴─────┴────────┘

@oscar6echo
Author

Can you please use console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with the bun implementation of Symbol.for("nodejs.util.inspect.custom").

I get the same output!

It would be good for nodejs-polars to be explicit about what runtimes should implement to output the proper display (as in the README). Maybe this is already the case? If so, where?

@universalmind303
Collaborator

universalmind303 commented Aug 9, 2024

Can you please use console.log(df); in the bun repl? It works fine for me. I do recall there being an issue with the bun implementation of Symbol.for("nodejs.util.inspect.custom").

I get the same output!

It would be good for nodejs-polars to be explicit about what runtimes should implement to output the proper display (as in the README). Maybe this is already the case? If so, where?

@oscar6echo The formatting discrepancy arises because, unlike python and rust, JavaScript has no native way to overload operators such as indexing, so we need to use a Proxy object to support some syntaxes such as bracket notation: df['column']

console.log should always print the correct output as most runtimes have standardized on using Symbol.for("nodejs.util.inspect.custom"), but unfortunately, there is no way to forward the inspect symbol to the dataframe class when wrapping it in a proxy. So it's either drop support for the functionality that the proxy provides, or use console.log.

Edit:

df.toString() should also work the same as console.log(df)
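
As an illustration of the two mechanisms mentioned here, below is a toy sketch, not the nodejs-polars code. ToyFrame and withBracketAccess are hypothetical names, and the one-line "table" is a stand-in for the real formatter. It shows a class exposing Symbol.for("nodejs.util.inspect.custom") next to toString(), and a Proxy whose get trap adds bracket access to columns. What a given runtime or REPL prints when it inspects the wrapped object is deliberately not asserted, since that is the part that differs between environments in this thread.

```javascript
const util = require("node:util");

const customInspect = Symbol.for("nodejs.util.inspect.custom");

// Toy stand-in for a dataframe: a map from column name to array of values.
class ToyFrame {
  constructor(columns) {
    this.columns = columns;
  }

  // Explicit string form, independent of any runtime's inspection hook.
  toString() {
    const names = Object.keys(this.columns);
    const height = names.length > 0 ? this.columns[names[0]].length : 0;
    return `shape: (${height}, ${names.length})`;
  }

  // Hook that Node's util.inspect and console.log call on a plain instance.
  [customInspect]() {
    return this.toString();
  }
}

// Wrap a frame so that frame["name"] returns a column when one exists;
// every other property, including symbols, is forwarded to the target.
function withBracketAccess(frame) {
  return new Proxy(frame, {
    get(target, prop, receiver) {
      if (typeof prop === "string" && Object.hasOwn(target.columns, prop)) {
        return target.columns[prop];
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}

const plain = new ToyFrame({ A: [1, 2, 3], fruits: ["banana", "banana", "apple"] });
const wrapped = withBracketAccess(plain);

console.log(util.inspect(plain)); // uses the custom inspect hook in Node
console.log(wrapped["fruits"]);   // column access through the Proxy
console.log(wrapped.toString());  // explicit string form, same text in any runtime
```

Calling toString() explicitly avoids depending on how a runtime inspects Proxy objects, which is why console.log(df.toString()) is the portable form.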

@oscar6echo
Author

@universalmind303 thx for the insight.

So the working syntax with deno is console.log(df.toString()).
Maybe a bit verbose, but the output is identical to the python version. This is useful info!

Examples:

1/ small df

image

2/ larger df

image


so we need to use a Proxy object to support some syntaxes such as bracket notation: df['column']

Ok, this is your decision - who am I to debate it - but the .select() syntax achieves the same, is central in polars-py, and is more IDE friendly with completion etc. The df['mycol'] syntax seems mostly a contrived way to mimic the pandas legacy API - I was a heavy pandas user and am now an intensive polars-py one. One may argue this legacy API may not be worth keeping, in particular if it hinders basic user experience. 🤔

But this is only a side remark.
The main point is: Congrats and thank you for putting together and maintaining nodejs-polars 👍

@universalmind303
Collaborator

Ok, this is your decision - who am I to debate it - but the .select() syntax achieves the same, is central in polars-py, and is more IDE friendly with completion etc. The df['mycol'] syntax seems mostly a contrived way to mimic the pandas legacy API - I was a heavy pandas user and am now an intensive polars-py one. One may argue this legacy API may not be worth keeping, in particular if it hinders basic user experience. 🤔

I have thought about deprecating the syntax, as I too find the Proxy stuff a bit annoying. I know py-polars discourages its usage anyway.
