Use duckdb loading throughout pyprophet #131

Open
wants to merge 5 commits into
base: master

jcharkow (Contributor) commented Dec 4, 2024

With export-parquet, we gained duckdb as an additional dependency, which allows for fast SQL queries, especially those involving many joins.

This PR rolls out duckdb SQL queries across pyprophet to make data loading more efficient.
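
As a rough illustration of the idea (not the code in this PR), the sketch below runs the same join against a SQLite file once through the standard `sqlite3` module and once through duckdb. The schema (`RUN`, `FEATURE` and their columns) is a simplified stand-in for an .osw file, and the helper names `build_demo_osw`, `query_with_sqlite3`, and `query_with_duckdb` are hypothetical. It assumes duckdb's `sqlite` extension can be installed in the environment.

```python
import sqlite3

# Illustrative join; the table and column names are a simplified, assumed schema.
JOIN_QUERY = """
SELECT f.ID AS feature_id, r.FILENAME AS filename
FROM FEATURE AS f
JOIN RUN AS r ON r.ID = f.RUN_ID
ORDER BY f.ID
"""


def build_demo_osw(path):
    """Create a tiny SQLite file that stands in for an .osw file."""
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE RUN (ID INTEGER PRIMARY KEY, FILENAME TEXT)")
        con.execute("CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER)")
        con.executemany(
            "INSERT INTO RUN VALUES (?, ?)", [(1, "run_a.mzML"), (2, "run_b.mzML")]
        )
        con.executemany(
            "INSERT INTO FEATURE VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)]
        )
        con.commit()
    finally:
        con.close()


def query_with_sqlite3(path, query):
    """Baseline: run the query with the standard-library sqlite3 module."""
    con = sqlite3.connect(path)
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()


def query_with_duckdb(path, query):
    """Run the same query through duckdb by attaching the SQLite file."""
    import duckdb  # imported lazily so the baseline works without duckdb

    con = duckdb.connect()  # in-memory duckdb database
    try:
        con.execute("INSTALL sqlite")  # may need network access on first use
        con.execute("LOAD sqlite")
        escaped = str(path).replace("'", "''")
        con.execute(f"ATTACH '{escaped}' AS osw (TYPE sqlite)")
        con.execute("USE osw")  # resolve unqualified table names in the attached file
        return con.execute(query).fetchall()
    finally:
        con.close()
```

Calling `build_demo_osw("demo.osw")` and then both query helpers with `JOIN_QUERY` should return the same rows. Any speed difference only becomes visible on large files with many joins, such as the one benchmarked below.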

Examples

Benchmarks were run on a Dell XPS running Ubuntu.

Export Command

time pyprophet export --in=39041_Hela_500ng_15SPD_DIA_Py3_1_S2-A7_1_4502.osw

Old timings:
real 0m56.284s
user 0m35.997s
sys 0m15.130s

New timings:
real 0m12.832s
user 0m40.578s
sys 0m8.378s

Score Command

  • Only 1 iteration is run, so most of the measured time is spent loading the data.

time pyprophet score --in=39041_Hela_500ng_15SPD_DIA_Py3_1_S2-A7_1_4502.osw --ss_num_iter=1

Old timings:
real 0m59.466s
user 1m30.275s
sys 0m11.004s

New timings:
real 0m30.482s
user 1m21.186s
sys 0m9.460s

Since duckdb is now a dependency and joins SQL tables more efficiently, use it more widely throughout pyprophet
Since SQL queries do not guarantee order, change from duckdb to regular SQL in this method, as it is used in testing
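
The ordering issue behind the second commit can be illustrated with a small sketch. Note that the commit itself switches the method back to regular SQL. The alternative shown here (an explicit `ORDER BY`, or an order-insensitive comparison in the test) is not what this PR does, and the `scores` table and `fetch_scores` helper are hypothetical.

```python
import sqlite3


def fetch_scores(con, ordered):
    """Fetch rows, optionally with a deterministic order."""
    query = "SELECT feature_id, score FROM scores"
    if ordered:
        # Without ORDER BY, SQL makes no promise about row order.
        query += " ORDER BY feature_id"
    return con.execute(query).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (feature_id INTEGER, score REAL)")
con.executemany(
    "INSERT INTO scores VALUES (?, ?)", [(3, 0.2), (1, 0.9), (2, 0.5)]
)

ordered_rows = fetch_scores(con, ordered=True)
unordered_rows = fetch_scores(con, ordered=False)

# A test that must tolerate arbitrary row order can sort both sides first.
assert sorted(unordered_rows) == ordered_rows
con.close()
```

Either approach makes a test independent of the order in which a particular engine happens to return rows.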

jcharkow (Contributor, Author) commented Dec 4, 2024

I am not sure why the tests are not being run.

jcharkow (Contributor, Author) commented Dec 4, 2024

OK, I think the tests are passing; they are just not appearing in this PR for some reason.
