ALTER TABLE MOVE to external table #595

Open
arthurpassos opened this issue Jan 20, 2025 · 8 comments

@arthurpassos
Collaborator

For MergeTree tables we need an efficient way to move parts to Parquet. One part should map to one Parquet file, and MergeTree blocks should map to row groups. Sorting should be preserved.

@arthurpassos
Collaborator Author

copy will suffice for now

@arthurpassos
Collaborator Author

Remember this caveat: "It doesn't filter out rows that are deleted with lightweight deletes."

@arthurpassos
Collaborator Author

Should the user be able to specify the output file name? What about format settings?

@arthurpassos
Collaborator Author

I missed the "move to external table" part. Format and storage location should be inferred from the table.

Something like:

alter table xyz export partition 'abc' to table

@arthurpassos
Collaborator Author

Well, since the goal is to move to a different table, this is slightly more complex. Some constraints, like schema and partition expression, must match, similar to the attach partition functionality.
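
A minimal sketch of what such a compatibility check could look like. It is only an illustration: `TableDef` and `export_problems` are hypothetical names rather than ClickHouse types, and comparing the partition expression as plain text is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableDef:
    """Hypothetical, simplified table description (not a ClickHouse type)."""
    columns: tuple[tuple[str, str], ...]  # (name, type) pairs, in order
    partition_expr: str                   # partition expression as text


def export_problems(source: TableDef, dest: TableDef) -> list[str]:
    """Return the reasons an export must be rejected; empty means compatible."""
    problems = []
    if source.columns != dest.columns:
        problems.append("schema mismatch")
    if source.partition_expr != dest.partition_expr:
        problems.append("partition expression mismatch")
    return problems


# Example tables (made up for illustration).
src = TableDef((("id", "UInt64"), ("day", "Date")), "toYYYYMM(day)")
ok_dst = TableDef((("id", "UInt64"), ("day", "Date")), "toYYYYMM(day)")
bad_dst = TableDef((("id", "UInt64"), ("day", "Date")), "day")

print(export_problems(src, ok_dst))   # compatible destination
print(export_problems(src, bad_dst))  # partition expression differs
```

Returning a list of problems instead of a boolean lets the error message name every mismatch at once.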

@arthurpassos
Collaborator Author

I ignored the constraints for now, and I am able to move a ClickHouse partition from a MergeTree table to an S3 table engine. There is a caveat: the partition must contain only one part. The reason is that multiple parts require multiple files on S3, so the S3 engine should be created with something that uniquely identifies files, such as uuid or _part macros. However, if an S3 table engine is created with any of those, it goes into read-only mode.
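
To illustrate the naming problem, here is a rough sketch of why a fixed path only works for a single part. The `{_part}` placeholder syntax, the helper names, and the part names are assumptions made for this example; they are not how the S3 engine actually expands macros.

```python
def object_key(pattern: str, part_name: str) -> str:
    """Expand a hypothetical {_part} placeholder in a destination path pattern."""
    return pattern.replace("{_part}", part_name)


def keys_for_parts(pattern: str, part_names: list[str]) -> list[str]:
    """Return one object key per part, failing if two parts would share a file."""
    keys = [object_key(pattern, name) for name in part_names]
    if len(set(keys)) != len(keys):
        raise ValueError("pattern does not uniquely identify files per part")
    return keys


# Made-up part names for a partition with two parts.
parts = ["202501_1_1_0", "202501_2_2_0"]

# A pattern with a per-part placeholder yields one distinct file per part.
unique_keys = keys_for_parts("exports/{_part}.parquet", parts)
print(unique_keys)

# A fixed path is fine for one part, but collides as soon as there are two.
print(keys_for_parts("exports/data.parquet", parts[:1]))
```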

@arthurpassos
Collaborator Author

I need to pay attention to this: MergeTree blocks should map to row groups. The current implementation just calls the StorageObjectStorage::write API for each data part, but that doesn't guarantee the mapping from blocks to row groups.
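
A toy model of the concern, assuming a writer that buffers rows and flushes a row group every fixed number of rows. Whether StorageObjectStorage::write behaves this way is not confirmed here; the function names and block sizes are made up for illustration.

```python
def groups_per_block(block_sizes: list[int]) -> list[int]:
    """Desired behavior: flush after every block, so one block = one row group."""
    return list(block_sizes)


def groups_fixed_size(block_sizes: list[int], row_group_size: int) -> list[int]:
    """Model of a writer that flushes every row_group_size rows, ignoring blocks."""
    groups, buffered = [], 0
    for size in block_sizes:
        buffered += size
        while buffered >= row_group_size:
            groups.append(row_group_size)
            buffered -= row_group_size
    if buffered:
        groups.append(buffered)  # final partial row group
    return groups


# Row counts of three hypothetical MergeTree blocks in one part.
blocks = [8192, 8192, 3000]

print(groups_per_block(blocks))          # row groups follow block boundaries
print(groups_fixed_size(blocks, 10000))  # row groups cut across block boundaries
```

Both strategies write the same rows in the same order, so sorting is preserved either way; only the row group boundaries differ.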

arthurpassos self-assigned this Jan 24, 2025
@arthurpassos
Collaborator Author

Investigate why inserts are not supported and how we can fix it.
