STAR-1872: Parallelize UCS compactions per output shard #1342

blambov · 2024-10-09T15:08:19Z

This splits compactions that are to produce more than one
output sstable into tasks that can execute in parallel.
Such tasks share a transaction and have combined progress
and observer. Because we cannot mark parts of an sstable
as unneeded, the transaction is only applied when all
tasks have succeeded. This also means that early open
is not supported for such tasks.
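
For illustration only, the all-or-nothing behaviour described above can be sketched as a shared commit that fires once every part has reported in. This is a minimal sketch under assumptions, not the code in this patch: the class `SharedCommitSketch` and the `applyAll`/`rollbackAll` callbacks are hypothetical, and only the names `partsToCommitOrAbort` and `partCommittedOrAborted` are borrowed from the diff under review.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: N parallel parts share one commit that is applied
// only when every part has succeeded. Not the Cassandra implementation.
class SharedCommitSketch
{
    private final AtomicInteger partsToCommitOrAbort;
    private final AtomicBoolean anyAborted = new AtomicBoolean(false);
    private final Runnable applyAll;    // assumed: makes all outputs visible at once
    private final Runnable rollbackAll; // assumed: discards all outputs

    SharedCommitSketch(int parts, Runnable applyAll, Runnable rollbackAll)
    {
        if (parts <= 0)
            throw new IllegalArgumentException("parts must be positive");
        this.partsToCommitOrAbort = new AtomicInteger(parts);
        this.applyAll = applyAll;
        this.rollbackAll = rollbackAll;
    }

    void partCommitted()
    {
        partCommittedOrAborted();
    }

    void partAborted()
    {
        // Record the failure before counting down, so the last part sees it.
        anyAborted.set(true);
        partCommittedOrAborted();
    }

    private void partCommittedOrAborted()
    {
        // Only the call that brings the counter to zero decides the outcome.
        if (partsToCommitOrAbort.decrementAndGet() != 0)
            return;
        if (anyAborted.get())
            rollbackAll.run();
        else
            applyAll.run();
    }
}
```

In this sketch nothing is applied while any part is still running, which mirrors why early open cannot be offered: no output becomes visible until the last part reports.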

At this time the new parallelization mechanism is not taken
into account by the thread allocation scheme, and thus
some levels may take more resources than they should.
Because of this limitation (which should be fixed in the
near future), the new behaviour is off by default.

Also:

- Adds a flag to combine non-overlapping sets in major
  compactions to reshard data, as major compactions can
  now be executed as a parallelized operation.
- Changes SSTable expiration to be done in a separate
  getNextBackgroundCompactions round to improve the
  efficiency of expiration (a separate task can run quickly
  and remove the relevant sstables without waiting for
  a compaction to end).
- Applies small-partition-count correction in
  ShardManager.calculateCombinedDensity.

eolivelli · 2024-10-16T06:44:21Z

src/java/org/apache/cassandra/db/lifecycle/CompositeLifecycleTransaction.java

+        partCommittedOrAborted();
+    }
+
+    private void partCommittedOrAborted()


What about passing the partial transaction here as a parameter?
Instead of keeping only a count, partsToCommitOrAbort, we could hold a reference to all the child transactions. That would let us ensure that we don't count the same transaction twice and, when we commit, that all of the children are in the expected state.

blambov · 2024-10-16

You can't count the same transaction twice because of the protections in PartialLifecycleTransaction.

There could be some value in knowing which part did not complete, but because we don't have timeouts on these things (and actually can't, as compactions can last days) there's no obvious place to surface that information.
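
To make the trade-off in this thread concrete, here is a minimal sketch of the alternative the reviewer describes: tracking each child by identity instead of keeping a bare counter. Everything in it is hypothetical (the class `TrackedPartsSketch`, the string part ids, the `State` enum); it does not reflect how CompositeLifecycleTransaction or PartialLifecycleTransaction are actually written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-child tracking; not the code in this patch.
class TrackedPartsSketch
{
    enum State { PENDING, COMMITTED, ABORTED }

    private final Map<String, State> parts = new LinkedHashMap<>();

    TrackedPartsSketch(List<String> partIds)
    {
        for (String id : partIds)
            parts.put(id, State.PENDING);
    }

    /** Records a part's final state; returns true if no part is pending afterwards. */
    synchronized boolean complete(String partId, boolean committed)
    {
        State current = parts.get(partId);
        if (current == null)
            throw new IllegalArgumentException("Unknown part: " + partId);
        if (current != State.PENDING)
            throw new IllegalStateException(partId + " already " + current);
        parts.put(partId, committed ? State.COMMITTED : State.ABORTED);
        return pending().isEmpty();
    }

    /** Parts that have not yet reported, in registration order. */
    synchronized List<String> pending()
    {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, State> e : parts.entrySet())
            if (e.getValue() == State.PENDING)
                result.add(e.getKey());
        return result;
    }

    synchronized boolean allCommitted()
    {
        for (State s : parts.values())
            if (s != State.COMMITTED)
                return false;
        return true;
    }
}
```

The sketch rejects a second completion of the same part and can name the parts still pending. As the reply notes, the first guarantee is already provided elsewhere, and the second is only useful if there is somewhere to report it.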

eolivelli · 2024-10-16T06:46:51Z

src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java

+        }
+        else
+        {
+            final CompressionMetadata compressionMetadata = getCompressionMetadata();


Are we exercising this branch in the unit tests?

blambov · 2024-10-16

There's a list of TODOs in UnifiedCompactionStrategy.createAndAddTasks where this is next in line.

eolivelli · 2024-10-16T06:49:24Z

src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.java

+            return tasks;
+    }
+
+    private <T> List<T> splitSSTablesInShards(Collection<SSTableReader> sstables,


What about making this method static and writing specific unit tests to cover all of the cases?
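
As a rough illustration of the suggestion, a static helper that depends only on its arguments can be tested without constructing a strategy instance. The sketch below is an assumption-laden stand-in, not the real splitSSTablesInShards (whose full signature is not shown in the diff excerpt): `ShardSplitSketch`, `Span`, and the `long` boundaries are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a static shard-splitting helper.
class ShardSplitSketch
{
    /** Half-open token span [start, end) covered by an imaginary sstable. */
    static final class Span
    {
        final String name;
        final long start;
        final long end;

        Span(String name, long start, long end)
        {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * For each shard, lists the names of the spans that intersect it.
     * boundaries must be sorted ascending; shard i covers
     * [boundaries[i-1], boundaries[i]), with open-ended first and last shards.
     */
    static List<List<String>> splitInShards(List<Span> spans, long[] boundaries)
    {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i <= boundaries.length; i++)
        {
            long lo = i == 0 ? Long.MIN_VALUE : boundaries[i - 1];
            long hi = i == boundaries.length ? Long.MAX_VALUE : boundaries[i];
            List<String> members = new ArrayList<>();
            for (Span s : spans)
                if (s.start < hi && s.end > lo) // non-empty intersection
                    members.add(s.name);
            shards.add(members);
        }
        return shards;
    }
}
```

With a helper of this shape, cases such as a span that straddles several boundaries, an empty shard, or no boundaries at all can each get a small, direct test.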

blambov · 2024-10-16T12:35:29Z

The PR is not yet ready for review.


blambov · 2024-10-29T12:38:34Z

The patch is now ready for review.


sonarcloud · 2024-11-07T15:49:50Z

Quality Gate passed

Issues
1 New issue
2 Accepted issues

Measures
0 Security Hotspots
83.8% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

cassci-bot · 2024-11-07T15:53:13Z

❌ Build ds-cassandra-pr-gate/PR-1342 rejected by Butler

8 new test failure(s) in 16 builds
See build details here

Found 8 new test failures

Test	Explanation	Branch history	Upstream history
...,147,483,647 Modifier 1 Levels 3 Compactors 30]	regression	🔴🔵
...positePartitionKeyDataModel{primaryKey=p1, p2}]	regression	🔴🔴🔴🔴🔴🔴🔵	🔵🔵🔵🔵🔵🔵🔵
...positePartitionKeyDataModel{primaryKey=p1, p2}]	failing	🔴🔴🔴🔴🔴	🔵🔵🔵🔵🔵🔵🔵
...positePartitionKeyDataModel{primaryKey=p1, p2}]	regression	🔴🔵🔵🔴🔵🔴🔴	🔵🔵🔵🔵🔵🔵🔵
...positePartitionKeyDataModel{primaryKey=p1, p2}]	failing	🔴🔴🔴🔴🔴🔴🔴	🔵🔵🔵🔵🔵🔵🔵
...i.s.c.VectorSiftSmallTest.testMultiSegmentBuild	failing	🔴🔴🔴🔴🔴🔴🔴	🔵🔵🔵🔵🔵🔵🔵
...t,wide=false,scenario=POST_BUILD_QUERY]	regression	🔴🔵🔵🔵	🔵🔵🔵🔵🔵🔵🔵
...i.s.d.v.VectorCompressionTest.testOpenAiV3Small	flaky	🔵🔴🔵🔵	🔵🔵🔵🔵🔵🔵🔵

Found 100 known test failures

blambov force-pushed the STAR-1872 branch from 1da1be1 to c92f199 Compare October 14, 2024 08:29

eolivelli reviewed Oct 16, 2024


blambov force-pushed the STAR-1872 branch 2 times, most recently from 9512132 to 6cc862f Compare October 29, 2024 12:31

blambov force-pushed the STAR-1872 branch from 6cc862f to b6295c0 Compare October 29, 2024 12:38

blambov added 11 commits October 30, 2024 15:16

Coverage improvement and setting test-nocursor to run parallelize_output_shards

3353269

Take parallelism into account for UCS.getSelection

6a850fe

Fix and test parallelize and reshard UCS options

ef1b009

Test fixes

3136567

Split UnifiedCompactionStrategyGetSelectionTest to fix timeout

8dfb4f7

CNDB-11499: Fix incorrect thread names in CompactionControllerTest

6a90b29

sonarcloud

e071215

Implement parallelisation limit for getMaximalTasks

f3786c0

Change resharding options to be passed to getMaximalTasks

e7d7241

Rework reshard to split on common boundaries

94afba6

Drop the reshard option and always reshard

69a37d0
