PMEM-SPILL of OAP-1.1 has performance regression comparing with OAP-1.1.1 #43

haojinIntel · 2021-06-02T02:41:40Z

We use the same configuration to run the K-means and SVM algorithms. The cluster contains 3 workers, each with 1TB of PMEM. Performance regresses by 12.9% when running SVM at the 1.2TB scale and by 28.6% when running K-means at 500GB.
The Spark configuration used when running SVM is shown below:

spark.memory.pmem.extension.enabled true
hibench.streambench.spark.checkpointPath /var/tmp
spark.storage.unrollMemoryThreshold 1048576
hibench.streambench.spark.receiverNumber 4
spark.yarn.historyServer.address vsr219:18080
spark.memory.pmem.initial.size 450GB
hibench.yarn.executor.cores 45
spark.executor.memory 90g
hibench.streambench.spark.useDirectMode true
spark.eventLog.dir hdfs://vsr219:9000/spark-history-server
spark.driver.memory 10g
spark.eventLog.enabled true
spark.memory.spill.pmem.enabled false
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.kryo.unsafe true
hibench.yarn.executor.num 6
spark.history.fs.logDirectory hdfs://vsr219:9000/spark-history-server
spark.files /opt/Beaver/OAP/oap_jar/pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar,/opt/Beaver/OAP/oap_jar/pmem-common-1.1.1-with-spark-3.1.1.jar
spark.executor.extraClassPath ./pmem-rdd-cache-1.1.1-with-spark-3.1.1.jar:./pmem-common-1.1.1-with-spark-3.1.1.jar
spark.history.fs.cleaner.enabled true
spark.default.parallelism ${hibench.default.map.parallelism}
spark.serializer.bufferedInputStreamSize 4096
hibench.streambench.spark.storageLevel 2
hibench.streambench.spark.batchInterval 100
hibench.spark.master yarn
spark.sql.shuffle.partitions 200
spark.history.ui.port 18080
hibench.spark.home /opt/Beaver/spark
spark.sql.warehouse.dir hdfs://vsr219:9000/spark-warehouse
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
hibench.streambench.spark.enableWAL false
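
For anyone comparing this dump against another run, here is a minimal sketch of parsing the `key value` lines above and summarizing the executor layout. The helper names `parse_properties` and `summarize` are hypothetical (they are not part of HiBench or OAP), and the sample only copies a few lines from the configuration above.

```python
def parse_properties(text):
    """Parse whitespace-separated 'key value' lines into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


def summarize(props):
    """Collect the settings most relevant to executor sizing (hypothetical helper)."""
    executors = int(props["hibench.yarn.executor.num"])
    cores = int(props["hibench.yarn.executor.cores"])
    paths = props.get("spark.memory.pmem.initial.path", "")
    return {
        "executors": executors,
        "cores_per_executor": cores,
        "total_cores": executors * cores,
        "executor_memory": props.get("spark.executor.memory"),
        "pmem_paths": [p for p in paths.split(",") if p],
    }


# A few lines copied from the configuration in this issue.
sample = """\
hibench.yarn.executor.num 6
hibench.yarn.executor.cores 45
spark.executor.memory 90g
spark.memory.pmem.initial.path /mnt/pmem0,/mnt/pmem1
"""

summary = summarize(parse_properties(sample))
print(summary)
```

With the values above this reports 6 executors of 45 cores each, which is useful context for the scheduling comment later in the thread.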

haojinIntel · 2021-06-02T02:43:01Z

@zhixingheyi-tian @yma11 @winningsix @yeyuqiang Please help to track the performance issue.

winningsix · 2021-06-02T02:53:47Z

@haojinIntel Thanks for opening this ticket. @yma11, is this anything related to your code refactor? I haven't come up with any ideas about why this happens.

yeyuqiang · 2021-06-11T02:37:38Z

Need to run with multiple executors to avoid slow task scheduling in Spark 3.1.1.
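
The comment does not say which executor count was used. Assuming it means spreading the same total cores over more, smaller executors, a rough sketch for listing candidate layouts might look like the following. The function `executor_layouts` and the 15-core cap are hypothetical; only the 6 x 45 cores and the 3 workers come from this issue.

```python
def executor_layouts(total_cores, workers, max_cores_per_executor):
    """List (executors, cores_per_executor) pairs that keep total_cores
    unchanged and divide the executors evenly across the workers."""
    layouts = []
    for cores in range(1, max_cores_per_executor + 1):
        if total_cores % cores:
            continue  # this core count does not divide the total evenly
        executors = total_cores // cores
        if executors % workers == 0:
            layouts.append((executors, cores))
    return layouts


# 6 executors x 45 cores on 3 workers, as in the configuration from this issue.
# The cap of 15 cores per executor is an arbitrary assumption for illustration.
layouts = executor_layouts(total_cores=6 * 45, workers=3, max_cores_per_executor=15)
for executors, cores in layouts:
    print(f"hibench.yarn.executor.num {executors}  hibench.yarn.executor.cores {cores}")
```

Any layout chosen this way would also need `spark.executor.memory` and the per-executor PMEM size rescaled to fit each worker, which this sketch does not attempt.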

haojinIntel changed the title ~~PMEM-SPILL of OAP-1.1 has performance regression comparing with OAP-1.1~~ PMEM-SPILL of OAP-1.1 has performance regression comparing with OAP-1.1.1 Jun 2, 2021

github-actions bot mentioned this issue Jun 4, 2021

[PMEM-SPILL-43] fix missing update for BlockManager in spark 3.1.1 #46

Merged

