init: m5 forecasting FE benchmark #136
base: main
Conversation
def q2_polars(df):
    return df.with_columns(
Can we use the `select` + `explode` mapping here?
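For reference, a minimal sketch of what that could look like, assuming the query computes a per-item rolling feature; the column names (`id`, `sales`), the 7-day window, and the function name are placeholders rather than the actual benchmark query:

```python
import polars as pl

def q2_polars_explode(df: pl.DataFrame) -> pl.DataFrame:
    # select + over(..., mapping_strategy="explode"): window results are written
    # out group by group instead of being mapped back to the original row order,
    # so the output comes back sorted by the partition key.
    return df.select(
        pl.col("id").over("id", mapping_strategy="explode"),
        pl.col("sales")
        .rolling_mean(window_size=7)
        .over("id", mapping_strategy="explode")
        .alias("sales_rolling_mean_7"),
    )
```

This skips the extra map-back step of the default `group_to_rows` strategy, at the cost of returning rows in group order rather than the original row order.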
Participants typically used pandas (Polars was only just getting started at the time), so here we benchmark how long it would have
taken to do the same feature engineering with Polars (and, coming soon, DuckDB).
We believe this to be a useful task to benchmark, because:
I think we can remove L9-L12.
I think this can serve as a basis for more time-series-related benchmarks on this dataset. I don't think we have to strictly limit ourselves to what was used in the Kaggle competition.
Just got back to this. Running locally, I'm seeing very good results for Polars.
Some results: https://www.kaggle.com/code/marcogorelli/m5-forecasting-feature-engineering-benchmark