[SPARK-44021][SQL][FOLLOW-UP] Fix log messages when the number of partition exceeds maxPartitionNum

### What changes were proposed in this pull request?

This PR makes the warning message print the configured value (for example `30000`) rather than the `Option` wrapper (for example `Some(30000)`).

Log messages before this change:
```
./spark.20230707-19.log:23/07/07 19:15:58,858 WARN [HiveServer2-Background-Pool: Thread-121658] datasources.FilePartition:69 : The number of partitions is 298777, which exceeds the maximum number configured: Some(30000). Spark rescales it to 30134 by ignoring the configuration of spark.sql.files.maxPartitionBytes.
./spark.20230707-20.log:23/07/07 20:15:40,911 WARN [HiveServer2-Background-Pool: Thread-130816] datasources.FilePartition:69 : The number of partitions is 175266, which exceeds the maximum number configured: Some(30000). Spark rescales it to 33158 by ignoring the configuration of spark.sql.files.maxPartitionBytes.
./spark.20230707-20.log:23/07/07 20:16:56,748 WARN [HiveServer2-Background-Pool: Thread-131083] datasources.FilePartition:69 : The number of partitions is 298777, which exceeds the maximum number configured: Some(30000). Spark rescales it to 30134 by ignoring the configuration of spark.sql.files.maxPartitionBytes.
```
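
For context, here is a minimal, self-contained Scala sketch (not Spark code; the variable name and value are borrowed from the log above purely for illustration) showing why interpolating the `Option` directly renders `Some(30000)` while extracting its value prints the bare number:

```scala
// Minimal illustration of the logging change (standalone, not Spark code).
object OptionLoggingDemo {
  def main(args: Array[String]): Unit = {
    val maxPartitionNum: Option[Int] = Some(30000)

    // Before: the Option wrapper leaks into the message.
    println(s"the maximum number configured: $maxPartitionNum")       // ... Some(30000)

    // After: the value is extracted first. This is safe in FilePartition
    // because the warning is only logged when the Option is defined.
    println(s"the maximum number configured: ${maxPartitionNum.get}") // ... 30000
  }
}
```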

### Why are the changes needed?

Improve readability.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes apache#41976 from wangyum/maxPartitionNum.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
wangyum committed Jul 15, 2023
1 parent e000cb8 commit 8a2e2f0
Showing 1 changed file with 4 additions and 4 deletions.
```diff
@@ -89,16 +89,16 @@ object FilePartition extends Logging {
       partitionedFiles: Seq[PartitionedFile],
       maxSplitBytes: Long): Seq[FilePartition] = {
     val openCostBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
-    val maxPartitionNum = sparkSession.sessionState.conf.filesMaxPartitionNum
+    val maxPartNum = sparkSession.sessionState.conf.filesMaxPartitionNum
     val partitions = getFilePartitions(partitionedFiles, maxSplitBytes, openCostBytes)
-    if (maxPartitionNum.exists(partitions.size > _)) {
+    if (maxPartNum.exists(partitions.size > _)) {
       val totalSizeInBytes =
         partitionedFiles.map(_.length + openCostBytes).map(BigDecimal(_)).sum[BigDecimal]
       val desiredSplitBytes =
-        (totalSizeInBytes / BigDecimal(maxPartitionNum.get)).setScale(0, RoundingMode.UP).longValue
+        (totalSizeInBytes / BigDecimal(maxPartNum.get)).setScale(0, RoundingMode.UP).longValue
       val desiredPartitions = getFilePartitions(partitionedFiles, desiredSplitBytes, openCostBytes)
       logWarning(s"The number of partitions is ${partitions.size}, which exceeds the maximum " +
-        s"number configured: $maxPartitionNum. Spark rescales it to ${desiredPartitions.size} " +
+        s"number configured: ${maxPartNum.get}. Spark rescales it to ${desiredPartitions.size} " +
         s"by ignoring the configuration of ${SQLConf.FILES_MAX_PARTITION_BYTES.key}.")
       desiredPartitions
     } else {
```
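
To make the rescaling arithmetic in the hunk above concrete, here is a standalone Scala sketch of how `desiredSplitBytes` is derived once the partition count exceeds the configured cap. The file sizes and the `maxPartNum` value below are made-up examples, not values from the patch; the 4 MiB open cost mirrors the `spark.sql.files.openCostInBytes` default.

```scala
import scala.math.BigDecimal.RoundingMode

// Standalone sketch of the rescaling arithmetic (not the Spark implementation;
// fileLengths and maxPartNum are hypothetical example values).
object RescaleSketch {
  def main(args: Array[String]): Unit = {
    val fileLengths: Seq[Long] = Seq.fill(100000)(64L * 1024 * 1024) // 100k files of 64 MiB each
    val openCostBytes = 4L * 1024 * 1024                             // per-file open cost (4 MiB)
    val maxPartNum = 30000                                           // configured partition cap

    // Total cost is the real bytes plus the per-file open cost, as in FilePartition.
    val totalSizeInBytes =
      fileLengths.map(_ + openCostBytes).map(BigDecimal(_)).sum

    // Split size that should yield roughly maxPartNum partitions, overriding
    // spark.sql.files.maxPartitionBytes.
    val desiredSplitBytes =
      (totalSizeInBytes / BigDecimal(maxPartNum)).setScale(0, RoundingMode.UP).longValue

    println(s"desiredSplitBytes = $desiredSplitBytes bytes")
  }
}
```

As the logs above show, the rescaled count only approximates the cap (e.g. 30134 vs. 30000), since the greedy packing in `getFilePartitions` works at file-split granularity; that is why the warning reports both numbers.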
