
Automatically set .batchSize based on the primary key #128

Open
msmygit opened this issue Apr 27, 2023 · 1 comment

Comments

@msmygit
Collaborator

msmygit commented Apr 27, 2023

Automatically set .batchSize to 1 if a table's primary key consists only of the partition key, i.e. the table has no clustering columns. Example tables:

Example 1:

CREATE TABLE IF NOT EXISTS ks1.tbl1 (
  pk1 int,
  pk2 bigint,
  c1 text,
  c2 uuid,
  PRIMARY KEY ((pk1,pk2))
);

Example 2:

CREATE TABLE IF NOT EXISTS ks1.tbl1 (
  c1 text PRIMARY KEY,
  c2 uuid
);

Otherwise, use the default value.
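As a rough illustration of the proposed rule, here is a minimal Java sketch. It assumes the partition-key and clustering columns are already available as plain lists of column names (in practice they would come from the table's schema metadata); the class and method names are hypothetical and not part of the project. The reasoning behind the rule: when a table has no clustering columns, every row is its own partition, so any batch of more than one row would span several partitions.

```java
import java.util.List;

// Hypothetical helper: names are illustrative, not the project's actual API.
final class BatchSizeRule {
    private BatchSizeRule() {}

    /**
     * Returns 1 when the primary key is exactly the partition key (no clustering
     * columns), because each row is then its own partition; otherwise returns the
     * configured batch size unchanged.
     */
    static int effectiveBatchSize(List<String> partitionKey,
                                  List<String> clusteringColumns,
                                  int configuredBatchSize) {
        if (partitionKey.isEmpty()) {
            throw new IllegalArgumentException("a table must have at least one partition key column");
        }
        if (configuredBatchSize < 1) {
            throw new IllegalArgumentException("batch size must be >= 1");
        }
        // No clustering columns => one row per partition => batch size 1.
        return clusteringColumns.isEmpty() ? 1 : configuredBatchSize;
    }
}
```

For Example 1 (partition key pk1, pk2) and Example 2 (partition key c1) the clustering list is empty, so both resolve to 1; a table declared with PRIMARY KEY ((pk1), ck1) would keep the configured value.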

msmygit added this to the Version 4.0 milestone Apr 27, 2023
@mieslep
Collaborator

mieslep commented Jun 8, 2023

@msmygit, can I suggest we go one step further? We know the partition key, and we're selecting by token range, so we should get the origin records in partition-key order. Maybe we make this a "max batch size", allow it to be a really big number, and then automatically send the batch whenever the partition key changes?

That is, the batch size would be the smaller of the number of records in the partition and the .batchSize configuration setting. This way, we would avoid multi-partition batches.
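A minimal Java sketch of the suggested "max batch size" behaviour, assuming rows arrive with each partition's rows contiguous (as a token-range scan returns them) and that the caller supplies a function extracting each row's partition key. The class and method names are hypothetical; a real implementation would send each batch to the target cluster instead of collecting lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch: not the project's actual implementation.
final class PartitionAwareBatcher {
    private PartitionAwareBatcher() {}

    /**
     * Splits rows (assumed grouped by partition key) into batches that never span
     * two partitions and never exceed maxBatchSize rows.
     */
    static <R, K> List<List<R>> batch(List<R> rows, Function<R, K> partitionKeyOf, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<R>> batches = new ArrayList<>();
        List<R> current = new ArrayList<>();
        K currentKey = null;
        for (R row : rows) {
            K key = partitionKeyOf.apply(row);
            boolean keyChanged = !current.isEmpty() && !Objects.equals(key, currentKey);
            // Flush when the partition changes or the batch is full.
            if (keyChanged || current.size() == maxBatchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
            current.add(row);
            currentKey = key;
        }
        if (!current.isEmpty()) {
            batches.add(current); // flush the trailing batch
        }
        return batches;
    }
}
```

For rows a1, a2, a3, b1 keyed by their first character, a maxBatchSize of 2 yields [a1, a2], [a3], [b1], while a very large maxBatchSize yields [a1, a2, a3], [b1]: at most one batch per partition, capped by the setting.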

Development

No branches or pull requests

2 participants