
Fix string comparison for memory overhead in pinned pool size recommendation in AutoTuner #1508

Open

parthosa wants to merge 5 commits into NVIDIA:dev from parthosa:spark-rapidst-tools-1398

Collaborator

parthosa commented Jan 22, 2025

Fixes #1398.

There were two bugs related to setting the pinned pool size:

1. Incorrect cluster memory calculation with dynamic allocation enabled. This was resolved in #1121 by fixing the cluster shape and memory calculation.
2. A memory value string (e.g., 4G) was incorrectly compared to the property name spark.executor.memoryOverhead.

This PR fixes the second issue by checking whether the memory overhead property is set, instead of comparing a memory value against a property name. The corrected condition is:

If the Spark master is Kubernetes (k8s) and the pinned pool size is set, add a comment when spark.executor.memoryOverhead is missing.
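A minimal sketch of the corrected condition, assuming the Spark configuration is available as a plain `Map[String, String]` and that a Kubernetes master is recognized by the standard `k8s://` master URL prefix. The object and method names are hypothetical; the AutoTuner itself uses its own `sparkMaster`, `getPropertyValue`, `pinnedPoolSizeLookup` and `memOverheadLookup` members.

```scala
object MissingOverheadCheck {
  // Property keys taken from the PR description and diff.
  val PinnedPoolSizeKey = "spark.rapids.memory.pinnedPool.size"
  val MemOverheadKey = "spark.executor.memoryOverhead"

  // Hypothetical helper: true when a "missing memoryOverhead" comment should be added.
  // Assumption: a k8s master is detected by the "k8s://" URL prefix.
  def shouldCommentMissingOverhead(props: Map[String, String]): Boolean = {
    val isK8s = props.get("spark.master").exists(_.startsWith("k8s://"))
    // Check key presence rather than comparing a value such as "4g" to a key name.
    isK8s && props.contains(PinnedPoolSizeKey) && !props.contains(MemOverheadKey)
  }
}
```

With this shape, a k8s application that sets the pinned pool size but not the overhead gets the comment, while YARN applications and k8s applications without a pinned pool do not, which matches the two new test scenarios in the suite.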

Key changes:

Handling Spark Master Types:

Added a sealed trait SparkMaster with case objects Local, Yarn, Kubernetes, and Standalone to represent different Spark master types.
Introduced a lazy val sparkMaster in the AutoTuner class to determine the Spark master type based on the spark.master property.
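A hedged reconstruction of the master-type handling. Only the type names and the `SparkMaster.apply(master: Option[String]): Option[SparkMaster]` signature come from the PR diff; the prefix matching below is an assumption based on Spark's documented master URL formats (`local`, `yarn`, `k8s://`, `spark://`), and `AutoTunerSketch` is a hypothetical stand-in for the AutoTuner class.

```scala
sealed trait SparkMaster
case object Local extends SparkMaster
case object Yarn extends SparkMaster
case object Kubernetes extends SparkMaster
case object Standalone extends SparkMaster

object SparkMaster {
  // Maps a spark.master value (if present) to a master type.
  // Assumption: unrecognized or missing values yield None.
  def apply(master: Option[String]): Option[SparkMaster] =
    master.map(_.trim.toLowerCase) match {
      case Some(m) if m.startsWith("local")    => Some(Local)
      case Some(m) if m.startsWith("yarn")     => Some(Yarn)
      case Some(m) if m.startsWith("k8s://")   => Some(Kubernetes)
      case Some(m) if m.startsWith("spark://") => Some(Standalone)
      case _                                   => None
    }
}

// Hypothetical host class showing the lazy sparkMaster field.
class AutoTunerSketch(props: Map[String, String]) {
  // Resolved once, on first access, from the spark.master property.
  lazy val sparkMaster: Option[SparkMaster] = SparkMaster(props.get("spark.master"))
}
```

Using a sealed trait lets the compiler flag non-exhaustive matches on the master type, and callers can write `sparkMaster.contains(Kubernetes)` instead of re-parsing the raw property string.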

Memory Overhead Recommendation Logic:

Updated the memoryOverheadLabel method to use the new sparkMaster field instead of directly accessing the spark.master property. [1] [2]
Modified the addRecommendationForMemoryOverhead method to check if the Spark master is not Standalone before adding memory overhead recommendations.
Removed the enableMemoryOverheadRecommendation method from the AutoTunerConfigsProvider trait as it is no longer needed.
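A minimal sketch of the Standalone check in the overhead recommendation, under stated assumptions: the recommendation is recorded in a hypothetical mutable map, the key is fixed to `spark.executor.memoryOverhead` for illustration (in the PR diff the key comes from `memOverheadLookup`), the value is formatted as `<MB>m`, and an unknown master (`None`) is treated as eligible. None of these details are confirmed by the PR.

```scala
import scala.collection.mutable

sealed trait SparkMaster
case object Local extends SparkMaster
case object Yarn extends SparkMaster
case object Kubernetes extends SparkMaster
case object Standalone extends SparkMaster

object MemoryOverheadSketch {
  // Hypothetical stand-in for addRecommendationForMemoryOverhead:
  // records a recommendation unless the master is Standalone.
  def addRecommendationForMemoryOverhead(
      sparkMaster: Option[SparkMaster],
      recomValueMB: Long,
      recommendations: mutable.Map[String, String]): Unit = {
    if (!sparkMaster.contains(Standalone)) {
      // Fixed key and "m" suffix are illustrative assumptions.
      recommendations("spark.executor.memoryOverhead") = s"${recomValueMB}m"
    }
  }
}
```

Folding the master check into the recommendation method is what makes a separate `enableMemoryOverheadRecommendation` hook on the configs provider unnecessary.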

Test Suite Updates:

Replaced direct assertions with the compareOutput method in the ProfilingAutoTunerSuite to improve test output comparison. [1] [2]
Removed redundant comments about spark.executor.memoryOverhead from multiple test cases in the ProfilingAutoTunerSuite. [1] [2] [3] [4] [5] [6] [7] [8] [9]

parthosa added 3 commits

January 22, 2025 09:27


Add unit tests (eb4a1e8)

Signed-off-by: Partho Sarthi <[email protected]>


Fix checking of memory overhead label

Signed-off-by: Partho Sarthi <[email protected]>


Update unit tests (2da10c5)

Signed-off-by: Partho Sarthi <[email protected]>

parthosa added bug core_tools labels

parthosa self-assigned this


Fix unit tests (bddc968)

Signed-off-by: Partho Sarthi <[email protected]>

parthosa commented


core/src/main/scala/com/nvidia/spark/rapids/tool/tuning/AutoTuner.scala

                     appendRecommendationForMemoryMB(memOverheadLookup, recomValue)
-                    getPropertyValue("spark.rapids.memory.pinnedPool.size").foreach { lookup =>
-                      if (lookup != "spark.executor.memoryOverhead") {

Collaborator Author

parthosa Jan 22, 2025

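This comparison was incorrect because lookup is a memory value string (e.g., 4g), whereas spark.executor.memoryOverhead is a property name, so the two could never be equal and the check passed whenever the pinned pool size was set.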
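To make the bug concrete, here is a minimal sketch contrasting the removed comparison with a presence check. It assumes the configuration is a plain `Map[String, String]`; the object and method names are hypothetical, and the buggy variant uses `exists` to mirror the removed `foreach { lookup => if (lookup != ...) }` shape.

```scala
object OverheadLabelBug {
  val PinnedPoolSizeKey = "spark.rapids.memory.pinnedPool.size"
  val MemOverheadKey = "spark.executor.memoryOverhead"

  // Buggy shape: `lookup` is the pinned pool *value* (e.g. "4g"), which never
  // equals the property *name*, so this reduces to "pinned pool size is set".
  def buggy(props: Map[String, String]): Boolean =
    props.get(PinnedPoolSizeKey).exists(lookup => lookup != MemOverheadKey)

  // Fixed shape: test whether the memoryOverhead property itself is present.
  def fixed(props: Map[String, String]): Boolean =
    props.contains(PinnedPoolSizeKey) && !props.contains(MemOverheadKey)
}
```

The difference shows up when memoryOverhead is already configured: the buggy check still fires, while the presence check correctly stays silent.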

parthosa marked this pull request as ready for review

January 22, 2025 22:44

parthosa requested review from cindyyuanjiang, amahussein and tgravescs

January 22, 2025 22:44

cindyyuanjiang reviewed


Collaborator

cindyyuanjiang left a comment

thanks @parthosa! some minor nits and questions

core/src/main/scala/com/nvidia/spark/rapids/tool/tuning/AutoTuner.scala Outdated

+              case object Standalone extends SparkMaster
+              object SparkMaster {
+                  def apply(master: Option[String]): Option[SparkMaster] = {

Collaborator

cindyyuanjiang Jan 23, 2025

nit: indentation here is inconsistent (4 vs 2 spaces)

Collaborator Author

parthosa Jan 24, 2025

Fixed

core/src/test/scala/com/nvidia/spark/rapids/tool/tuning/ProfilingAutoTunerSuite.scala Outdated

+                    val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
+                    val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
+                      .buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
+                        platform)

Collaborator

cindyyuanjiang Jan 23, 2025

nit: we probably do not need a newline here -
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, platform)

Collaborator Author

parthosa Jan 24, 2025

Fixed

core/src/test/scala/com/nvidia/spark/rapids/tool/tuning/ProfilingAutoTunerSuite.scala Outdated

+                  val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
+                  val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
+                    .buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
+                      platform)

Collaborator

cindyyuanjiang Jan 23, 2025

nit: same as above

Collaborator Author

parthosa Jan 24, 2025

Fixed

core/src/test/scala/com/nvidia/spark/rapids/tool/tuning/ProfilingAutoTunerSuite.scala Outdated

+                  val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt)
+                  val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
+                    .buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider,
+                      platform)

Collaborator

cindyyuanjiang Jan 23, 2025

nit: same as above

Collaborator Author

parthosa Jan 24, 2025

Fixed

core/src/test/scala/com/nvidia/spark/rapids/tool/tuning/ProfilingAutoTunerSuite.scala Outdated

+                // This UT sets a custom spark-property "spark.master" pointing to a spark
+                // k8s value. The Autotuner should detect that the spark-master is k8s and
+                // should not comment on the missing memoryOverhead value since pinned pool is not set.
+                test(s"missing memoryOverhead comment is not included for k8s without pinned pool") {

Collaborator

cindyyuanjiang Jan 23, 2025

can we include spark-version in the test comment to differentiate between the 2 test runs?

Collaborator Author

parthosa Jan 24, 2025

Included the Spark version in the test name, since the version is set dynamically at runtime.

core/src/test/scala/com/nvidia/spark/rapids/tool/tuning/ProfilingAutoTunerSuite.scala Outdated

+                // value. The Autotuner should detect that the spark-master is yarn and
+                // should not comment on the missing memoryOverhead value even though pinned
+                // pool is set.
+                test("missing memoryOverhead comment is not included for yarn") {

Collaborator

cindyyuanjiang Jan 23, 2025

can we include spark-version in the test comment to differentiate between the 2 test runs?

Collaborator Author

parthosa Jan 24, 2025

Included the Spark version in the test name, since the version is set dynamically at runtime.

core/src/main/scala/com/nvidia/spark/rapids/tool/tuning/AutoTuner.scala Outdated

+                    // if using k8s and pinned pool size is set, add a comment if memory overhead is missing
+                    if (sparkMaster.contains(Kubernetes) &&
+                        getPropertyValue(pinnedPoolSizeLookup).isDefined &&
+                          getPropertyValue(memOverheadLookup).isEmpty) {

Collaborator

cindyyuanjiang Jan 23, 2025

nit: inconsistent indent here

Collaborator Author

parthosa Jan 24, 2025

Fixed


Address review comments (efba7ff)

Signed-off-by: Partho Sarthi <[email protected]>

parthosa requested a review from cindyyuanjiang

January 24, 2025 16:31

cindyyuanjiang approved these changes


Collaborator

cindyyuanjiang left a comment

Thanks @parthosa for this fix! LGTM.
