-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Change RemoveSnapshots to remove unused schemas #12089
base: main
Are you sure you want to change the base?
Conversation
cc @RussellSpitzer @rdblue @danielcweeks Would you mind taking a look? |
cc @advancedxy |
0884ada
to
e846792
Compare
.map(ManifestFile::partitionSpecId) | ||
.forEach(reachableSpecs::add)); | ||
snapshotId -> { | ||
base.snapshot(snapshotId).allManifests(ops.io()).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you might want to extract base.snapshot(snapshotId)
into a separate variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return this; | ||
} | ||
|
||
Builder removeSchemas(Iterable<Integer> schemaIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add a test where schemaIds
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
schemas.stream() | ||
.filter(s -> !schemaIdsToRemove.contains(s.schemaId())) | ||
.collect(Collectors.toList()); | ||
changes.add(new MetadataUpdate.RemoveSchemas(schemaIdsToRemove)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only create a change when schemaIdsToRemove
was non-empty. This would most likely also apply for removePartitionSpecs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Done
Will create a separate PR for the same for RemovePartitionSpecs once we sort this one out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prepared another PR for the same RemovePartitionSpecs improvement. Will publish it once this is merged (that also depends on the TestTables change in his PR and not sure how to publish stacked changes as separate PRs TBH)
assertThat(table.schemas().size()).isEqualTo(3); | ||
|
||
Set<String> deletedFiles = Sets.newHashSet(); | ||
// Expire all snapshots except the current one. Also expire all schemas except the current one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Expire all snapshots except the current one. Also expire all schemas except the current one. | |
// Expire all snapshots/schemas except the current one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
e846792
to
da34b00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look, @nastra !
.map(ManifestFile::partitionSpecId) | ||
.forEach(reachableSpecs::add)); | ||
snapshotId -> { | ||
base.snapshot(snapshotId).allManifests(ops.io()).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return this; | ||
} | ||
|
||
Builder removeSchemas(Iterable<Integer> schemaIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
schemas.stream() | ||
.filter(s -> !schemaIdsToRemove.contains(s.schemaId())) | ||
.collect(Collectors.toList()); | ||
changes.add(new MetadataUpdate.RemoveSchemas(schemaIdsToRemove)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Done
Will create a separate PR for the same for RemovePartitionSpecs once we sort this one out.
assertThat(table.schemas().size()).isEqualTo(3); | ||
|
||
Set<String> deletedFiles = Sets.newHashSet(); | ||
// Expire all snapshots except the current one. Also expire all schemas except the current one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -72,6 +72,26 @@ public static TestTable create( | |||
return new TestTable(ops, name); | |||
} | |||
|
|||
public static TestTable create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to change the other create
method to call this method and pass ops
so that we don't duplicate the same implementations:
public static TestTable create(
File temp,
String name,
Schema schema,
PartitionSpec spec,
SortOrder sortOrder,
int formatVersion) {
TestTableOperations ops = new TestTableOperations(name, temp);
return create(temp, name, schema, spec, sortOrder, formatVersion, ops);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Done.
Anyway, I was thinking of introducing a Builder for these test tables. I see that we have a number of variations now (with-without MetricsReporter, TableOperations and now in the partition stats PR we'll have one for props too) and might be cleaner to have a builder instead of these different create() methods. WDYT @nastra ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine if we start exploring and seeing how a Builder turns out with TestTable
but I wouldn't do it as part of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not within this PR :)
Similarly to removing partition specs, this PR improves RemoveSnapshots to also remove unused schemas.
da34b00
to
60744fa
Compare
I'll wait a bit with merging to give others some time to review this as well |
Similarly to removing partition specs, this PR improves RemoveSnapshots to also remove unused schemas.