Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix string comparison for memory overhead in pinned pool size recommendation in AutoTuner #1508

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from

Conversation

parthosa
Copy link
Collaborator

Fixes #1398.

There were two bugs related to setting pinned pool size:

  1. Incorrect cluster memory calculation with dynamic allocation enabled - This was resolved in #1121 by fixing cluster shape and memory calculation.

  2. Memory value string (e.g., 4G) was incorrectly compared to the label spark.executor.memoryOverhead.

This PR specifically fixes the second issue by ensuring the memory comparison is performed correctly. The corrected condition is as follows:

If using k8s and pinned pool size is set, add a comment if memory overhead is missing

Key changes:

Handling Spark Master Types:

  • Added a sealed trait SparkMaster with case objects Local, Yarn, Kubernetes, and Standalone to represent different Spark master types.
  • Introduced a lazy val sparkMaster in the AutoTuner class to determine the Spark master type based on the spark.master property.

Memory Overhead Recommendation Logic:

  • Updated the memoryOverheadLabel method to use the new sparkMaster field instead of directly accessing the spark.master property. [1] [2]
  • Modified the addRecommendationForMemoryOverhead method to check if the Spark master is not Standalone before adding memory overhead recommendations.
  • Removed the enableMemoryOverheadRecommendation method from the AutoTunerConfigsProvider trait as it is no longer needed.

Test Suite Updates:

  • Replaced direct assertions with the compareOutput method in the ProfilingAutoTunerSuite to improve test output comparison. [1] [2]
  • Removed redundant comments about spark.executor.memoryOverhead from multiple test cases in the ProfilingAutoTunerSuite. [1] [2] [3] [4] [5] [6] [7] [8] [9]

Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
@parthosa parthosa added bug Something isn't working core_tools Scope the core module (scala) labels Jan 22, 2025
@parthosa parthosa self-assigned this Jan 22, 2025
Signed-off-by: Partho Sarthi <[email protected]>
appendRecommendationForMemoryMB(memOverheadLookup, recomValue)
getPropertyValue("spark.rapids.memory.pinnedPool.size").foreach { lookup =>
if (lookup != "spark.executor.memoryOverhead") {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison was incorrect because lookup is a memory string value, whereas spark.executor.memoryOverhead is a label

@parthosa parthosa marked this pull request as ready for review January 22, 2025 22:44
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @parthosa! some minor nits and questions

case object Standalone extends SparkMaster

object SparkMaster {
def apply(master: Option[String]): Option[SparkMaster] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent here is inconsistent 4 vs 2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
platform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we probably do not need a newline here -
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, platform)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
platform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
platform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// This UT sets a custom spark-property "spark.master" pointing to a spark
// k8s value. The Autotuner should detect that the spark-master is k8s and
// should not comment on the missing memoryOverhead value since pinned pool is not set.
test(s"missing memoryOverhead comment is not included for k8s without pinned pool") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we include spark-version in the test comment to differentiate between the 2 test runs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included spark version in the test name since spark version is dynamic and set during runtime.

// value. The Autotuner should detect that the spark-master is yarn and
// should not comment on the missing memoryOverhead value even though pinned
// pool is set.
test("missing memoryOverhead comment is not included for yarn") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we include spark-version in the test comment to differentiate between the 2 test runs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included spark version in the test name since spark version is dynamic and set during runtime.

// if using k8s and pinned pool size is set, add a comment if memory overhead is missing
if (sparkMaster.contains(Kubernetes) &&
getPropertyValue(pinnedPoolSizeLookup).isDefined &&
getPropertyValue(memOverheadLookup).isEmpty) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent indent here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Partho Sarthi <[email protected]>
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa for this fix! LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Pinned pool size setting may be missing from the autotuner recommendation
2 participants