[DRAFT] [DEMO] [SPARK] Use explicit Rowset time formatters for improving RowSet generation #5815

Conversation

bowenliang123
Contributor

@bowenliang123 bowenliang123 commented Dec 4, 2023

🔍 Description

Issue References 🔗

Subtask of #5808.

Describe Your Solution 🔧

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklists

📝 Author Self Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • This patch was not authored or co-authored using Generative Tooling

📝 Committer Pre-Merge Checklist

  • Pull request title is okay.
  • No license issues.
  • Milestone correctly set?
  • Test coverage is ok
  • Assignees are selected.
  • Minimum number of approvals
  • No changes are requested

Be nice. Be informative.

@bowenliang123
Contributor Author

bowenliang123 commented Dec 4, 2023

It seems that the performance of the (with-skip jump) primitive data types (shown in the red block) is lower, while that of the fall-through data types (shown in the green block: Decimals, Date, etc.) is improved.

cc @wForget @yaooqinn @pan3793

WDYT?

[image: screenshot showing the red and green blocks referenced above]

@wForget
Member

wForget commented Dec 4, 2023

It seems that the performance of the (with-skip jump) primitive data types (shown in the red block) is lower, while that of the fall-through data types (shown in the green block: Decimals, Date, etc.) is improved.

I don't see any changes related to primitive data types. Can you increase the number of rows and try a few more times?

@codecov-commenter

codecov-commenter commented Dec 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (25eb53d) 61.36% compared to head (2a84cbb) 61.41%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5815      +/-   ##
============================================
+ Coverage     61.36%   61.41%   +0.04%     
  Complexity       23       23              
============================================
  Files           608      608              
  Lines         35941    35944       +3     
  Branches       4940     4940              
============================================
+ Hits          22056    22075      +19     
+ Misses        11494    11482      -12     
+ Partials       2391     2387       -4     


@pan3793
Member

pan3793 commented Dec 5, 2023

@bowenliang123 I guess the overhead may come from always constructing the formatters via HiveResult.getTimeFormatters.
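A minimal sketch of the concern raised here, assuming a formatter that is either built for every value or built once and passed in explicitly. It uses only `java.time`; `FormatterReuseSketch`, `newDateFormatter`, `formatPerValue`, and `formatWithShared` are hypothetical names for illustration and are not the PR's code or Spark's `HiveResult` API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class FormatterReuseSketch {
    // Hypothetical stand-in for obtaining a formatter; not Spark's HiveResult API.
    static DateTimeFormatter newDateFormatter() {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd");
    }

    // Builds a new formatter for every value: the construction cost is paid per cell.
    static List<String> formatPerValue(List<LocalDate> dates) {
        List<String> out = new ArrayList<>(dates.size());
        for (LocalDate date : dates) {
            out.add(newDateFormatter().format(date));
        }
        return out;
    }

    // Takes an explicit formatter built once by the caller and reuses it for all values.
    static List<String> formatWithShared(List<LocalDate> dates, DateTimeFormatter formatter) {
        List<String> out = new ArrayList<>(dates.size());
        for (LocalDate date : dates) {
            out.add(formatter.format(date));
        }
        return out;
    }
}
```

Both methods produce the same strings; the only difference is how often the formatter is constructed. Whether that difference is measurable in practice is exactly what the thread is debating.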

@bowenliang123
Contributor Author

@bowenliang123 I guess the overhead may come from always constructing the formatters via HiveResult.getTimeFormatters.

Yes, in most cases, as the evidence shows. But the overhead for primitive types should not come from the data types, since the time formatters are reused throughout the whole stage of TRowSet generation.

We still need more evidence from repeated tests. Please consider merging #5809, which introduces a benchmark with warm-ups and repeated rounds, as a reference for further investigation.
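For readers unfamiliar with the idea, a benchmark with warm-ups and repeated rounds could look roughly like the sketch below. This is not the benchmark from #5809; `WarmupBenchmarkSketch`, `measure`, and `median` are hypothetical names, and a real comparison would more likely rely on a dedicated harness.

```java
import java.util.Arrays;

final class WarmupBenchmarkSketch {
    // Runs the task warmupRounds times without timing it (letting the JIT settle),
    // then returns the elapsed nanoseconds of each of the measuredRounds runs.
    static long[] measure(Runnable task, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            task.run();
        }
        long[] elapsedNanos = new long[measuredRounds];
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedNanos[i] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    // Returns the middle element of a sorted copy (the upper middle for even sizes),
    // which is less sensitive to a single slow round than the mean.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }
}
```

Comparing medians over several rounds, rather than a single run, makes it easier to tell a real regression from noise.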

@wForget
Member

wForget commented Dec 5, 2023

We still need more evidence from repeated tests. Please consider merging #5809, which introduces a benchmark with warm-ups and repeated rounds, as a reference for further investigation.

I think this is an obvious improvement. The benchmark unit test is controversial and not required, so we can remove it first and continue reviewing the PR.

@bowenliang123
Contributor Author

Closing this PR, as there is not enough consensus on the purposes, the design, the changes, and the approaches.
