-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix string comparison for memory overhead in pinned pool size recommendation in AutoTuner #1508
base: dev
Are you sure you want to change the base?
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
appendRecommendationForMemoryMB(memOverheadLookup, recomValue) | ||
getPropertyValue("spark.rapids.memory.pinnedPool.size").foreach { lookup => | ||
if (lookup != "spark.executor.memoryOverhead") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comparison was incorrect because lookup
is a memory string value, whereas spark.executor.memoryOverhead
is a label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @parthosa! some minor nits and questions
case object Standalone extends SparkMaster | ||
|
||
object SparkMaster { | ||
def apply(master: Option[String]): Option[SparkMaster] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent here is inconsistent 4 vs 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt) | ||
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider | ||
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, | ||
platform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we probably do not need a newline here -
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt) | ||
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider | ||
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, | ||
platform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
val platform = PlatformFactory.createInstance(PlatformNames.ONPREM, clusterPropsOpt) | ||
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider | ||
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, | ||
platform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// This UT sets a custom spark-property "spark.master" pointing to a spark | ||
// k8s value. The Autotuner should detect that the spark-master is k8s and | ||
// should not comment on the missing memoryOverhead value since pinned pool is not set. | ||
test(s"missing memoryOverhead comment is not included for k8s without pinned pool") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we include spark-version in the test comment to differentiate between the 2 test runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included spark version in the test name since spark version is dynamic and set during runtime.
// value. The Autotuner should detect that the spark-master is yarn and | ||
// should not comment on the missing memoryOverhead value even though pinned | ||
// pool is set. | ||
test("missing memoryOverhead comment is not included for yarn") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we include spark-version in the test comment to differentiate between the 2 test runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included spark version in the test name since spark version is dynamic and set during runtime.
// if using k8s and pinned pool size is set, add a comment if memory overhead is missing | ||
if (sparkMaster.contains(Kubernetes) && | ||
getPropertyValue(pinnedPoolSizeLookup).isDefined && | ||
getPropertyValue(memOverheadLookup).isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent indent here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: Partho Sarthi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthosa for this fix! LGTM.
Fixes #1398.
There were two bugs related to setting pinned pool size:
Incorrect cluster memory calculation with dynamic allocation enabled - This was resolved in #1121 by fixing cluster shape and memory calculation.
Memory value string (e.g.,
4G
) was incorrectly compared to the labelspark.executor.memoryOverhead
.This PR specifically fixes the second issue by ensuring the memory comparison is performed correctly. The corrected condition is as follows:
Key changes:
Handling Spark Master Types:
SparkMaster
with case objectsLocal
,Yarn
,Kubernetes
, andStandalone
to represent different Spark master types.sparkMaster
in theAutoTuner
class to determine the Spark master type based on thespark.master
property.Memory Overhead Recommendation Logic:
memoryOverheadLabel
method to use the newsparkMaster
field instead of directly accessing thespark.master
property. [1] [2]addRecommendationForMemoryOverhead
method to check if the Spark master is notStandalone
before adding memory overhead recommendations.enableMemoryOverheadRecommendation
method from theAutoTunerConfigsProvider
trait as it is no longer needed.Test Suite Updates:
compareOutput
method in theProfilingAutoTunerSuite
to improve test output comparison. [1] [2]spark.executor.memoryOverhead
from multiple test cases in theProfilingAutoTunerSuite
. [1] [2] [3] [4] [5] [6] [7] [8] [9]