feat: rewrite joins + refactoring + adding missing join-file #24

Merged: 2 commits, Feb 24, 2025

SemyonSinchenko
Collaborator

Close #21

It looks like it works (see the attached screenshot).

Changes:

  • I refactored the Python part to simplify the code and reduce duplication;
  • After analyzing the join logic, I realized that the simplest approach is to pass pre-generated keys to the Rust part;
  • Join generation is now slightly slower because the key manipulation is done in NumPy on the Python side.
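
As a rough illustration of the idea (not the code from this PR), pre-generating join keys in NumPy might look like the sketch below. The function name `pregenerate_join_keys`, the `match_fraction` parameter, and the key layout are assumptions made for this example; the actual implementation may differ.

```python
import numpy as np


def pregenerate_join_keys(n_rows: int, match_fraction: float = 0.9, seed: int = 42):
    """Hypothetical helper: build key arrays for the two sides of a join."""
    rng = np.random.default_rng(seed)

    # Left-side keys: a shuffled permutation of 1..n_rows, so every key is unique.
    left_keys = rng.permutation(np.arange(1, n_rows + 1, dtype=np.int64))

    # Right-side keys: reuse a fraction of the left keys (these rows will match).
    n_match = round(n_rows * match_fraction)
    matching = rng.choice(left_keys, size=n_match, replace=False)

    # Fill the remainder with keys outside the left range (these rows will not match).
    non_matching = np.arange(
        n_rows + 1, n_rows + 1 + (n_rows - n_match), dtype=np.int64
    )

    right_keys = rng.permutation(np.concatenate([matching, non_matching]))
    return left_keys, right_keys


left, right = pregenerate_join_keys(1_000, match_fraction=0.9)
```

In a setup like this, the resulting arrays would then be handed to the native (Rust) generator, which only needs to attach the remaining columns to each key.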

I tried only the SMALL and MEDIUM generations; I don't think my laptop can handle generating BIG.

I'm trying to figure out how to make the key manipulation faster on the Python side.
@MrPowers Could you please try it? I'm not very familiar with these benchmark queries; I tried only one. Anyway, you are the only person who can approve it ;)

cc @zhuqi-lucas

@SemyonSinchenko added the enhancement label on Feb 12, 2025
@SemyonSinchenko self-assigned this on Feb 12, 2025
@zhuqi-lucas

Thank you @SemyonSinchenko !

@SemyonSinchenko
Collaborator Author

I cannot reach @MrPowers. Theoretically, I can bypass branch protection and merge. @zhuqi-lucas how urgent is this for you?

@zhuqi-lucas

Thank you @SemyonSinchenko for your work. I also think we can merge it, and I can try it in Apache DataFusion as a follow-up. If we run into new errors, we can push another fix. What do you think?

@SemyonSinchenko merged commit 9961be0 into main on Feb 24, 2025
39 checks passed
@SemyonSinchenko
Collaborator Author

@zhuqi-lucas Merged and released to PyPI (0.0.3). Feel free to ping me if you run into any issues!

Successfully merging this pull request may close these issues.

Join datasets have values that seem off