
[VL] Track Jvm based libhdfs performance gap issue #7746

Closed
JkSelf opened this issue Oct 31, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@JkSelf
Contributor

JkSelf commented Oct 31, 2024

Description

We now support JVM-based libhdfs in both the Velox and Gluten communities. However, its performance in Jenkins tests has not met our expectations.

  • Static linking with libhdfs3.so: 340%

  • Dynamic linking with JVM: 214.75% (requires further investigation)

  • Dynamic linking with libhdfs3: 323.85% (only replacing libhdfs.so with libhdfs3.so)

  • Expected performance for static linking in PR#7697: 340%

@JkSelf JkSelf added the enhancement New feature or request label Oct 31, 2024
@FelixYBW FelixYBW changed the title Track Jvm based libhdfs performance gap issue [VL] Track Jvm based libhdfs performance gap issue Nov 1, 2024
@FelixYBW
Contributor

FelixYBW commented Nov 1, 2024

FYI, static means linking at compile time; dynamic here means loading the .so library at runtime.

The ratios above are the performance boost relative to vanilla Spark.
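
For illustration, a minimal sketch of the runtime-loading ("dynamic") approach using POSIX dlopen/dlsym; hdfsConnect stands in for whichever libhdfs symbols the connector actually resolves, with the usual libhdfs signature assumed:

```cpp
#include <dlfcn.h>
#include <cstdint>
#include <cstdio>

// Minimal stand-in for the libhdfs C API resolved at runtime.
typedef void* hdfsFS;
typedef hdfsFS (*hdfsConnect_t)(const char* nameNode, uint16_t port);

int main() {
  // "Load at runtime": open the library when the process runs,
  // instead of linking against it at compile time.
  void* handle = dlopen("libhdfs3.so", RTLD_NOW | RTLD_GLOBAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // Resolve the symbol by name; with compile-time linking the
  // linker would have bound this reference for us.
  auto connect =
      reinterpret_cast<hdfsConnect_t>(dlsym(handle, "hdfsConnect"));
  if (connect == nullptr) {
    std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
    dlclose(handle);
    return 1;
  }

  std::fprintf(stdout, "hdfsConnect resolved at %p\n",
               reinterpret_cast<void*>(connect));
  dlclose(handle);
  return 0;
}
```

(Build with `-ldl`. The same mechanism works for either libhdfs3.so or the JVM-backed libhdfs.so, which is why the two are swappable at runtime.)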

@zhanglistar
Contributor

I didn't expect the performance difference to be so large.

@FelixYBW
Contributor

FelixYBW commented Nov 5, 2024

It looks like the indirect calls into the JVM are not efficient.

@zhanglistar
Contributor

I thought IO dominated the cost; an indirect call should cost roughly as much as a virtual table dispatch.

@FelixYBW
Contributor

FelixYBW commented Nov 6, 2024

All the perf tests above ran on a single node with enough SSDs. I'm not sure whether short-circuit reads were enabled in the JVM test, but even if they weren't, performance shouldn't drop that much.
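
For reference, HDFS short-circuit local reads are controlled in hdfs-site.xml; a minimal snippet, with a deployment-specific socket path:

```xml
<!-- hdfs-site.xml: enable short-circuit local reads so a client
     co-located with the DataNode reads block files directly instead
     of streaming them through the DataNode. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```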

@FelixYBW
Contributor

FelixYBW commented Nov 7, 2024

To make this clearer: @JkSelf, can you also test the missing config?

| perf gain over vanilla | link at compile time | load at runtime |
| --- | --- | --- |
| libhdfs3 | 3.4 | 3.23 |
| jvm libhdfs | | 2.14 |

@FelixYBW
Contributor

FelixYBW commented Nov 8, 2024

@JkSelf Can you update the table above with the current data?

@zhanglistar
Contributor

So are you guys switching to static linking?

@FelixYBW
Contributor

FelixYBW commented Nov 12, 2024

> So are you guys switching to static linking?

No, the latest perf test shows libhdfs3 has the same perf either way. If JVM libhdfs also performs the same, we can just use the runtime-loading approach.

| perf gain over vanilla | link at compile time | load at runtime |
| --- | --- | --- |
| libhdfs3 | 3.28 | 3.28 |
| jvm libhdfs | 3.01 | 3.01 |

@JkSelf
Contributor Author

JkSelf commented Nov 18, 2024

@FelixYBW @zhanglistar
After merging the JVM libhdfs patch PR#6172 on 10/25, Jenkins performance degraded significantly compared to the test results from 10/24. Over the past few days, we identified three performance degradation issues:

  1. IO Thread Configuration: The IO thread setting differed between the 10/24 test (disabled) and the 10/25 test (enabled), causing an increase in peak memory usage. This was fixed by disabling the IO threads.
  2. Hadoop Configuration Passing for viewfs Support: To support viewfs, we passed the Hadoop configuration while converting the SplitInfo for each partition, resulting in approximately a 5% performance degradation. In PR#7892 we added a configuration option to enable viewfs support and now convert the viewfs path to the hdfs path after obtaining all partitions' SplitInfo.
  3. Thread-Local Variable: To address Depeek's comment about a forward reference, we had moved the thread-local variable from a class member into the method. The repeated initialization of the thread-local variable inside the method caused about a 5% performance degradation. We reverted it to a class member in PR#7966; a sketch follows this list.
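
A minimal sketch of the thread-local issue in item 3, with invented names (the real Velox code differs): a function-local thread_local pays a TLS lookup plus a lazy-initialization guard on every call, while a member resolved at construction makes the hot path a plain field load.

```cpp
#include <cstdint>

// Hypothetical stand-in for an expensive per-thread setup step,
// e.g. attaching the current thread to the JVM via JNI.
static void* attachCurrentThread() {
  static thread_local std::uintptr_t token = 0;
  return &token;
}

// Shape that caused the regression: the hot path goes through a
// function-local thread_local, so every call pays the TLS access
// and its initialization guard.
void* envViaFunctionLocal() {
  thread_local void* env = attachCurrentThread();
  return env;
}

// Shape after the revert in PR#7966: the handle is a member,
// resolved once when the reader is created, so the hot path is
// a plain field load.
class Reader {
 public:
  Reader() : env_(attachCurrentThread()) {}
  void* env() const { return env_; }

 private:
  void* env_;
};

int main() {
  Reader r;
  return (envViaFunctionLocal() == r.env()) ? 0 : 1;
}
```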

After addressing the above three issues, we re-tested the performance of dynamic and static linking of libhdfs and libhdfs3. The conclusions are as follows:

  1. The results for static linking and dynamic linking are almost the same.
  2. The performance gain of libhdfs3 is 3.28x (971s vs 3187s).
  3. The performance gain of libhdfs is 3.01x (1058s vs 3187s).
  4. libhdfs is about 1.09x slower than libhdfs3 (1058s vs 971s), which aligns with our previous tests here.

@FelixYBW
Contributor

Thank you, Ke. Can you add a README describing how to configure libhdfs3 and Hadoop libhdfs? Is changing executorEnv.LD_LIBRARY_PATH enough?

Going forward, we will only support runtime loading of the libhdfs library.
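
A sketch of what that README entry might cover, assuming the library is found via the executors' dynamic loader search path (the paths below are hypothetical):

```
# spark-defaults.conf (hypothetical paths; adjust to the deployment)
# Point the executors' loader at the directory holding libhdfs.so
# (JVM libhdfs) or libhdfs3.so:
spark.executorEnv.LD_LIBRARY_PATH  /opt/hadoop/lib/native

# For JVM libhdfs, libjvm.so must also be resolvable and the Hadoop
# jars must be on the classpath, e.g. CLASSPATH=$(hadoop classpath --glob)
```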
