107: timestamps for Polars & Narwhals #267

Merged 1 commit on Nov 24, 2024
34 changes: 31 additions & 3 deletions 2024/107-marco-polars.md
@@ -14,17 +14,45 @@ https://www.meetup.com/data-umbrella
- Polars documentation: https://docs.pola.rs/
- Narwhals: https://narwhals-dev.github.io/narwhals/
- Slides: https://github.com/data-umbrella/event-transcripts/blob/main/resources/polars-narwhals.pdf
- Video: Polars for Data Analysis in Python: https://youtu.be/5V_MvnwTVwc

## About the Event
When it comes to dataframes, pandas is the go-to library for many people. Yet Polars is taking the world by storm, and so many data practitioners are curious about trying it out. There is a learning curve though, as Polars introduces some concepts which pandas users might not be familiar with. This talk will be a deep dive into one of those concepts (expressions) and will focus on how you can understand them from a pandas perspective.

The lessons learned will be useful beyond Polars, as they will also enable you to use Narwhals. Narwhals is a lightweight and extensible compatibility layer between dataframe libraries which is gaining traction (Altair, Marimo, scikit-lego, and more currently use it). Like Polars, its API is based on expressions. By learning this concept, you will not only be able to use Polars efficiently, but you'll also know how to build dataframe-agnostic tools.

Polars: https://docs.pola.rs/
Narwhals: https://narwhals-dev.github.io/narwhals/
```
## Timestamps
00:00 Data Umbrella introduction
05:07 Marco begins presentation
06:15 Timeline / Agenda of presentation
07:15 Why care (about Polars)?
08:05 Polars crash course
08:10 -- DataFrame
09:02 -- Series
09:38 -- Expressions
10:28 Expressions: a light introduction / selection
11:18 Functions: a detour
13:10 scikit-learn uses Polars in some of their documentation
13:53 Expressions: multiple inputs (and outputs)
16:35 Expressions: summary
17:15 What about group-by aggregations?
18:20 pl.col('weight').sum() (data type is preserved)
18:47 Expressions in group-by
20:05 pandas syntax comparison
21:05 Expressions: can we use them in pandas?
23:20 Should pandas adopt expressions? Let us know.
23:30 Narwhals
25:05 Conclusion
26:49 Q: Do Polars expressions support Python type annotations?
27:35 Q: If I start a new project, should I use Polars and not pandas?
28:58 Q: What size datasets can I use with Polars? Streaming in Polars
30:03 Q: Is Narwhals a way to provide an array API equivalent for dataframes?
31:20 Q: How is it able to achieve multi-threading?
33:00 Q: Could we use Polars as input to scikit-learn?
33:15 Q: When you make a copy of a dataframe, is it still a shallow copy?
33:48 Q: Any disadvantage to using Polars?
34:25 Q: Should I learn Polars instead of pandas?
```

https://github.com/data-umbrella/event-transcripts/issues/92