Commit

Merge pull request #267 from reshamas/rs-polars-marco
107: timestamps for Polars & Narwhals
reshamas authored Nov 24, 2024
2 parents 3ec57dd + 9b525fd commit 321c52c
Showing 1 changed file with 31 additions and 3 deletions.
34 changes: 31 additions & 3 deletions 2024/107-marco-polars.md
@@ -14,17 +14,45 @@ https://www.meetup.com/data-umbrella
- Polars documentation: https://docs.pola.rs/
- Narwhals: https://narwhals-dev.github.io/narwhals/
- Slides: https://github.com/data-umbrella/event-transcripts/blob/main/resources/polars-narwhals.pdf
- Video: Polars for Data Analysis in Python: https://youtu.be/5V_MvnwTVwc

## About the Event
When it comes to dataframes, pandas is the go-to library for many people. Yet Polars is taking the world by storm, and many data practitioners are curious about trying it out. There is a learning curve, though, because Polars introduces some concepts that pandas users might not be familiar with. This talk will be a deep dive into one of those concepts (expressions) and will focus on how you can understand them from a pandas perspective.

The lessons learned will be useful beyond Polars, as they will also enable you to use Narwhals. Narwhals is a lightweight and extensible compatibility layer between dataframe libraries which is gaining traction (Altair, Marimo, scikit-lego, and more currently use it). Like Polars, its API is based on expressions. By learning this concept, you will not only be able to use Polars efficiently, but you'll also know how to build dataframe-agnostic tools.

Polars: https://docs.pola.rs/
Narwhals: https://narwhals-dev.github.io/narwhals/
```
## Timestamps
00:00 Help us add timestamps
00:00 Data Umbrella introduction
05:07 Marco begins presentation
06:15 Timeline / Agenda of presentation
07:15 Why care (about Polars)?
08:05 Polars crash course
08:10 -- DataFrame
09:02 -- Series
09:38 -- Expressions
10:28 Expressions: a light introduction / selection
11:18 Functions: a detour
13:10 scikit-learn uses Polars in some of their documentation
13:53 Expressions: multiple inputs (and outputs)
16:35 Expressions: summary
17:15 What about group-by aggregations?
18:20 pl.col('weight').sum() (data type is preserved)
18:47 Expressions in group-by
20:05 pandas syntax comparison
21:05 Expressions: can we use them in pandas?
23:20 Should pandas adopt expressions? Let us know.
23:30 Narwhals
25:05 Conclusion
26:49 Q: Do Polars expressions support Python type annotations?
27:35 Q: If I start a new project, should I use Polars and not pandas?
28:58 Q: What size datasets can I use for Polars? Streaming in Polars
30:03 Q: Is Narwhals a way to provide an array API equivalent for dataframes?
31:20 Q: How is it able to achieve multi-threading?
33:00 Q: Could we use Polars as input to scikit-learn?
33:15 Q: When you make a copy of a dataframe, is it still a shallow copy?
33:48 Q: Any disadvantage to using Polars?
34:25 Q: Should I learn Polars instead of pandas?
```

https://github.com/data-umbrella/event-transcripts/issues/92