storage: Always initialize ingestion statistics #30975

Draft
wants to merge 2 commits into
base: main

@jkosh44 jkosh44 commented Jan 8, 2025

Previously, collection statistics were only initialized for ingestion
dataflow outputs. When the `force_source_table_syntax` flag is enabled,
the ingestion collection is excluded from the ingestion dataflow
outputs. As a result, statistics are never created for the ingestion
collection. This causes later parts of the code to panic because it is
assumed that all ingestion collections have statistics initialized.

This commit fixes the issue by ensuring that statistics are always
initialized for the ingestion collection, even if it is not included
in the dataflow outputs.
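
The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the actual storage code: `CollectionId`, `IngestionStats`, and `initialize_statistics` are hypothetical names chosen for the example, and the real statistics types are assumed to be richer than this.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a collection identifier.
type CollectionId = u64;

/// Hypothetical per-collection statistics record (assumed shape).
#[derive(Debug, Default, Clone, PartialEq)]
struct IngestionStats {
    messages_received: u64,
    bytes_received: u64,
}

/// Initializes statistics for every dataflow output and, unconditionally,
/// for the ingestion collection itself, whether or not it is an output.
fn initialize_statistics(
    stats: &mut BTreeMap<CollectionId, IngestionStats>,
    ingestion_id: CollectionId,
    dataflow_outputs: &[CollectionId],
) {
    let ids = dataflow_outputs
        .iter()
        .copied()
        .chain(std::iter::once(ingestion_id));
    for id in ids {
        // `or_default` leaves existing entries untouched, so the ingestion
        // collection is not reset when it also appears among the outputs.
        stats.entry(id).or_default();
    }
}
```

With this shape, a later lookup of the ingestion collection's statistics succeeds regardless of whether the flag has removed it from the outputs.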

Works towards resolving #MaterializeInc/database-issues/issues/8620

Motivation

This PR adds a known-desirable feature.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Previously, async storage workers would calculate the as_of of a
dataflow from the since of the dataflow's outputs' remap shards. If
there was more than one distinct remap shard among the outputs, then
the storage worker would panic. It's expected that the only collection
that will ever have a remap shard is the ingestion collection itself.

Furthermore, we are planning to remove the ingestion collection from
the outputs of the dataflow (in fact there's already a feature flag
that does this). If the ingestion collection is removed from the
outputs, then no output will have a remap shard, and the as_of will
always be empty.

This commit simplifies the existing as_of calculation and fixes the
as_of calculation when the ingestion collection is removed from the
outputs. It does this by calculating the as_of directly from the
ingestion's remap shard. Additionally, it asserts that if any output
has a remap shard, that shard must be equal to the ingestion's remap
shard.
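
A rough sketch of the simplified calculation, under stated assumptions: `ShardId`, `Frontier`, `Output`, and `compute_as_of` are hypothetical names for this example, the frontier is modeled as a plain `Vec<u64>` rather than a real antichain type, and the since of each shard is assumed to be available in a map.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a persist shard identifier.
type ShardId = u32;

/// Simplified stand-in for a frontier; an empty vector means "empty frontier".
type Frontier = Vec<u64>;

/// Hypothetical dataflow output; only some outputs carry a remap shard.
struct Output {
    remap_shard: Option<ShardId>,
}

/// Derives the as_of directly from the ingestion's remap shard instead of
/// collecting remap shards from the outputs.
fn compute_as_of(
    ingestion_remap_shard: ShardId,
    outputs: &[Output],
    sinces: &BTreeMap<ShardId, Frontier>,
) -> Frontier {
    // Any output that has a remap shard must share the ingestion's.
    for output in outputs {
        if let Some(shard) = output.remap_shard {
            assert_eq!(
                shard, ingestion_remap_shard,
                "output remap shard differs from the ingestion's remap shard"
            );
        }
    }
    // The result no longer depends on which collections are outputs.
    // An unknown shard yields an empty frontier in this sketch.
    sinces
        .get(&ingestion_remap_shard)
        .cloned()
        .unwrap_or_default()
}
```

Because the as_of comes from the ingestion's own remap shard, it stays non-empty even when the outputs contain no collection with a remap shard.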

Works towards resolving #MaterializeInc/database-issues/issues/8620
@jkosh44 jkosh44 force-pushed the force-primary-export-stats branch from b84e290 to 9ade587 January 9, 2025 13:48
@jkosh44 jkosh44 marked this pull request as ready for review January 10, 2025 13:31
@jkosh44 jkosh44 requested a review from a team as a code owner January 10, 2025 13:31
@jkosh44 jkosh44 marked this pull request as draft January 10, 2025 13:31