-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Export primary ingestion collection #30991
storage: Export primary ingestion collection #30991
Conversation
Previously, when the `force_source_table_syntax` flag was enabled, the primary source ingestion collection was not included in the source exports. This would cause the primary source ingestion collection's upper and since to be stuck at 0, and it would break some existing code. This commit always includes the primary ingestion collection in the source exports. However, when the `force_source_table_syntax` flag is enabled, then the source export details are set to `SourceExportDetails::None`. The result is that all source types with the flag enabled behave similarly to how multi-output sources behave with the flag disabled in regard to the primary ingestion collection. Specifically, their upper's and since's move forward in time and querying them returns an empty result. A downside of this commit is that a source ingestion is always scheduled with the flag enabled, even if there are no table exports. Previously, they would only be scheduled if there were table exports. Works towards resolving #MaterializeInc/database-issues/issues/8620
@@ -1549,6 +1550,19 @@ where | |||
if let DataSource::Ingestion(ingestion) = &mut description.data_source { | |||
if let Some(export) = ingestion.desc.primary_source_export() { | |||
ingestion.source_exports.insert(id, export); | |||
} else { | |||
let export = SourceExport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the right logic but I wasn't expecting to happen here. Can you move this in the planner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried my best to move this to the planner. Let me know if that's what you had in mind. It involves not removing the primary export though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It involves not removing the primary export though.
I wasn't really sure how else to do it in the planner.
…mary-export-details-none
@def- what I'd really like for this PR is to take all of the tests that involve the |
Looks like key-value load gen sources specifically are broken with the
|
The problem originates form here: materialize/src/storage/src/source/generator.rs Lines 149 to 162 in 1fd704b
We only populate the output map with source exports that have a non Then in the key-value generator code, we assume that the output map is always non-empty, which is not true. So we panic. materialize/src/storage/src/source/generator/key_value.rs Lines 94 to 98 in 1fd704b
Hopefully we can just return early if the output map is empty. |
That does not work. For whatever reason that causes the upper to advance to the empty set.
If we then try and create a table on the source, it gets stuck at a since and upper of 0.
|
I tried a couple of quick and dirty fixes for the key-value load generator and nothing worked. I think we should address the issues in a follow-up PR and not this one. I opened the following issue for it: https://github.com/MaterializeInc/database-issues/issues/8904 |
…mary-export-details-none
Of course I immediately thought of something that worked: #31134. I still think it's best saved for it's own PR, so this PR should be ready for review and merge. |
…mary-export-details-none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to merge this as is! The main goal is to unblock further work on the migrations/testing required around force-enabling table-from-source syntax, and we can iterate on the specifics of main source exports after that.
# | ||
|
||
$ postgres-execute connection=postgres://mz_system:materialize@${testdrive.materialize-internal-sql-addr} | ||
ALTER SYSTEM SET force_source_table_syntax = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really is just a plain copy ... 😅
Previously, when the
force_source_table_syntax
flag was enabled, the primary source ingestion collection was not included in the source exports. This would cause the primary source ingestion collection's upper and since to be stuck at 0, and it would break some existing code.This commit always includes the primary ingestion collection in the source exports. However, when the
force_source_table_syntax
flag is enabled, then the source export details are set toSourceExportDetails::None
. The result is that all source types with the flag enabled behave similarly to how multi-output sources behave with the flag disabled in regard to the primary ingestion collection. Specifically, their upper's and since's move forward in time and querying them returns an empty result.A downside of this commit is that a source ingestion is always scheduled with the flag enabled, even if there are no table exports. Previously, they would only be scheduled if there were table exports.
Works towards resolving #MaterializeInc/database-issues/issues/8620
Motivation
This PR adds a known-desirable feature.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.