Remove redundant statistics from FileScanConfig #14955

Draft: wants to merge 1 commit into main

Standing-Man (Contributor)

Which issue does this PR close?

Rationale for this change

Both FileScanConfig and DataSource hold the same statistics, so the two copies can drift out of sync and cause bugs.

What changes are included in this PR?

Remove the redundant statistics from FileScanConfig so that a single copy of the statistics is held on the DataSource.
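To illustrate the rationale, here is a minimal Rust sketch built on hypothetical, simplified types (`Statistics`, `DuplicatedStatistics`, `SimplifiedScanConfig`, `SimplifiedDataSource`). These are not DataFusion's actual `Statistics`, `FileScanConfig`, or `DataSource` APIs. The sketch contrasts two copies of the statistics, where updating one leaves the other stale, with a single copy owned by the data source.

```rust
/// Hypothetical, simplified statistics type for illustration only
/// (not DataFusion's `Statistics`). The field names mirror the two
/// values mentioned in this PR's discussion.
#[derive(Debug, Clone, PartialEq)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
}

/// Before (hypothetical): the config and the source each keep a copy.
pub struct DuplicatedStatistics {
    pub config_statistics: Statistics,
    pub source_statistics: Statistics,
}

impl DuplicatedStatistics {
    /// Updating only one copy silently leaves the other one stale.
    pub fn update_source_only(&mut self, statistics: Statistics) {
        self.source_statistics = statistics;
    }
}

/// After (hypothetical): the scan config carries no statistics at all.
#[derive(Debug, Clone)]
pub struct SimplifiedScanConfig {
    pub file_paths: Vec<String>,
}

/// After (hypothetical): the data source is the single owner of the statistics.
#[derive(Debug, Clone)]
pub struct SimplifiedDataSource {
    pub config: SimplifiedScanConfig,
    statistics: Statistics,
}

impl SimplifiedDataSource {
    pub fn new(config: SimplifiedScanConfig, statistics: Statistics) -> Self {
        Self { config, statistics }
    }

    /// All readers go through this accessor, so there is exactly one value to read.
    pub fn statistics(&self) -> &Statistics {
        &self.statistics
    }

    /// Replacing the statistics updates the only copy, so nothing can drift.
    pub fn with_statistics(mut self, statistics: Statistics) -> Self {
        self.statistics = statistics;
        self
    }
}
```

For example, after `update_source_only`, `config_statistics` still reports the old `num_rows` while `source_statistics` reports the new value. With a single owner, `statistics()` always returns the latest value.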

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) and proto (Related to proto crate) labels Mar 1, 2025
Standing-Man marked this pull request as draft March 1, 2025 13:01
Standing-Man force-pushed the redundant-statistics branch from f3eb12a to c819b16 March 2, 2025 01:11
github-actions bot removed the core (Core DataFusion crate) label Mar 2, 2025
alamb (Contributor) commented Mar 3, 2025

Looks like there are some CI issues to address

Note that @blaginin fixed some issues recently, so if you merge up from main it might be better now

Standing-Man (Contributor Author) commented Mar 3, 2025

Hi @alamb, I will try merging up from main. Thanks!

Standing-Man force-pushed the redundant-statistics branch from c819b16 to bdfb9ac March 3, 2025 12:58
github-actions bot added the datasource (Changes to the datasource crate) label Mar 3, 2025
Standing-Man (Contributor Author)

Hi @alamb and @blaginin, I found that four tests fail on the num_rows and total_byte_size statistics. I'm not sure how to proceed with fixing this, and I noticed that it is related to #14936.

blaginin (Contributor) commented Mar 3, 2025

I feel it may be easier if we fix #14936 first. I was planning to do it this week, but feel free to take it over (just take the issue in that case).

Labels: datasource (Changes to the datasource crate), proto (Related to proto crate)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Remove redundant statistics from FileScanConfig
3 participants