[DO-NOT-MERGE] Introduce new build tool sbt #1627

Closed
wants to merge 1 commit into from


cfmcgrady
Contributor

What changes were proposed in this pull request?

Introduce sbt as a new build tool.

NOTICES AND LIMITATIONS:

  1. Making a distribution via sbt is not supported.
  2. Only the Spark 3 shaded client is supported and thoroughly tested so far.
  3. Jars packaged via sbt are not recommended for production environments.

Why are the changes needed?

In our experience, sbt builds considerably faster than Maven, and incremental rebuilds are especially quick, as the timings below show.

Packaging the project:

> ./build/sbt
...
sbt:celeborn-parent> clean
[success] Total time: 0 s, completed 2023-6-27 11:10:29
sbt:celeborn-parent> package
...
[success] Total time: 41 s, completed 2023-6-27 11:11:12

Packaging and shading the Spark 3 client:

> ./build/sbt -Pspark-3.3
...
sbt:celeborn-parent> clean
[success] Total time: 1 s, completed 2023-6-27 11:13:53
sbt:celeborn-parent> project spark-3-shaded
[info] set current project to celeborn-client-spark-3-shaded (in build file:/Users/fchen/Project/bigdata/celeborn/)
sbt:celeborn-client-spark-3-shaded> assembly
...
[info] Built: /Users/fchen/Project/bigdata/celeborn/client-spark/spark-3-shaded/target/scala-2.12/celeborn-client-spark-3-shaded-assembly-0.4.0-SNAPSHOT.jar
[info] Jar hash: 323de938b2359d0b6650a60bf414bbb66cbd002d
[success] Total time: 59 s, completed 2023-6-27 11:15:05

Shading the Spark 3 client again (assembly reuses the up-to-date jar):

sbt:celeborn-client-spark-3-shaded> assembly
[info] Assembly jar up to date: /Users/fchen/Project/bigdata/celeborn/client-spark/spark-3-shaded/target/scala-2.12/celeborn-client-spark-3-shaded-assembly-0.4.0-SNAPSHOT.jar
[success] Total time: 3 s, completed 2023-6-27 11:17:40

Life becomes much easier with sbt :)

Does this PR introduce any user-facing change?

Yes, it introduces sbt as a new build tool.

How was this patch tested?

Manually tested

@codecov

codecov bot commented Jun 27, 2023

Codecov Report

Merging #1627 (b5eece3) into main (1b3ec61) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1627      +/-   ##
==========================================
- Coverage   45.92%   45.87%   -0.04%     
==========================================
  Files         159      159              
  Lines        9935     9924      -11     
  Branches      970      970              
==========================================
- Hits         4562     4552      -10     
+ Misses       5075     5070       -5     
- Partials      298      302       +4     

see 6 files with indirect coverage changes


@cfmcgrady
Contributor Author

The diff between the file listings of the Spark 3 shaded client jar built by Maven and by sbt:

diff --git a/tmp/maven-files.out b/tmp/sbt-files.out
index d56ee78..e61150c 100644
--- a/tmp/maven-files.out
+++ b/tmp/sbt-files.out
@@ -1,8 +1,6 @@
-./META-INF/DEPENDENCIES
-./META-INF/LICENSE
+./META-INF/INDEX.LIST
 ./META-INF/LICENSE.txt
 ./META-INF/MANIFEST.MF
-./META-INF/NOTICE
 ./META-INF/NOTICE.txt
 ./META-INF/io.netty.versions.properties
 ./META-INF/maven/com.google.guava/guava/pom.properties
@@ -69,16 +67,6 @@
 ./META-INF/maven/io.netty/netty-transport-udt/pom.xml
 ./META-INF/maven/io.netty/netty-transport/pom.properties
 ./META-INF/maven/io.netty/netty-transport/pom.xml
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-3-shaded_2.12/pom.properties
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-3-shaded_2.12/pom.xml
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-3_2.12/pom.properties
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-3_2.12/pom.xml
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-common_2.12/pom.properties
-./META-INF/maven/org.apache.celeborn/celeborn-client-spark-common_2.12/pom.xml
-./META-INF/maven/org.apache.celeborn/celeborn-client_2.12/pom.properties
-./META-INF/maven/org.apache.celeborn/celeborn-client_2.12/pom.xml
-./META-INF/maven/org.apache.celeborn/celeborn-common_2.12/pom.properties
-./META-INF/maven/org.apache.celeborn/celeborn-common_2.12/pom.xml
 ./META-INF/maven/org.apache.commons/commons-lang3/pom.properties
 ./META-INF/maven/org.apache.commons/commons-lang3/pom.xml
 ./META-INF/maven/org.jctools/jctools-core/pom.properties
@@ -86,7 +74,7 @@
 ./META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so
 ./META-INF/native/liborg_apache_celeborn_shaded_netty_transport_native_epoll_x86_64.so
 ./META-INF/services/reactor.blockhound.integration.BlockHoundIntegration
-./celeborn-client-spark-3-shaded_2.12-0.4.0-SNAPSHOT.jar
+./celeborn-client-spark-3-shaded-assembly-0.4.0-SNAPSHOT.jar
 ./org/apache/celeborn/client/ApplicationHeartbeater$$anon$1.class
 ./org/apache/celeborn/client/ApplicationHeartbeater.class
 ./org/apache/celeborn/client/ApplyNewLocationCallContext$.class
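For anyone who wants to reproduce this comparison: the listings look like sorted `find` output over each extracted jar, and the ordering (uppercase before lowercase) suggests a byte-wise C-locale sort. A minimal sketch, assuming the `unzip` CLI is installed (`jar xf` would also work); `list_tree` and `snapshot_jar` are hypothetical helper names, not part of Celeborn:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the contents of two jars.

# Print every file under a directory, relative to it ("./META-INF/..."),
# sorted byte-wise so the order matches the listings above.
list_tree() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Extract a jar into a fresh temp directory and write its sorted listing to $2.
# ASSUMPTION: the unzip CLI is available on PATH.
snapshot_jar() {
  local jar="$1" out="$2" dir
  dir="$(mktemp -d)"
  unzip -q "$jar" -d "$dir"
  list_tree "$dir" > "$out"
  rm -rf "$dir"
}
```

Running `snapshot_jar` on the Maven-built and the sbt-built jar, then `git diff --no-index` on the two output files, produces a diff in the same format as above. The original listings also contain the jar file itself, which suggests the jar was copied into the extraction directory; this sketch leaves it out.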

@pan3793
Member

pan3793 commented Jun 27, 2023

SBT is much faster than Maven, and its interactive CLI offers a better experience for running tests.

Personally, I lean toward switching the build tool from Maven to SBT, as Spark does in SPARK-44173, but this needs community consensus :)

@FMX
Contributor

FMX commented Jun 27, 2023

+1.

Glad to see this patch; Maven builds are slow.

@waitinfuture
Contributor

Thanks for this PR 😄

Contributor

@AngersZhuuuu AngersZhuuuu left a comment


Could you also document this new build method?

@pan3793
Member

pan3793 commented Jun 27, 2023

There is an additional benefit if we decide to migrate to SBT.

scalatest/scalatest-maven-plugin#99

@pan3793
Member

pan3793 commented Jun 27, 2023

Collected concerns from the PPMC:

  • Someone has had bad experiences with sbt networking; we should provide clear docs on setting up a mirror to speed up the bootstrap and dependency downloads.
  • As Celeborn supports different Spark/Flink versions, sbt should also provide functionality equivalent to Maven profiles, with IntelliJ IDEA support.

@waitinfuture
Copy link
Contributor

Thanks @pan3793. Another concern from @RexXiong is the learning curve of sbt.

@pan3793
Member

pan3793 commented Jun 27, 2023

Another concern from @RexXiong is the learning curve of sbt.

I definitely understood that concern, until I found https://spark.apache.org/developer-tools.html and tried it while developing Spark.

@cfmcgrady
Contributor Author

Someone has had bad experiences with sbt networking; we should provide clear docs on setting up a mirror to speed up the bootstrap and dependency downloads.

  1. For the bootstrap

The bootstrap script supports the DEFAULT_ARTIFACT_REPOSITORY environment variable for specifying an artifact repository, which speeds up downloading the sbt launcher JAR. Delete the cached launcher first so that it is fetched again from the mirror:

> rm build/sbt-launch-1.9.0.jar
> export DEFAULT_ARTIFACT_REPOSITORY=https://mirrors.huaweicloud.com/repository/maven/ && ./build/sbt

  2. For dependencies

The Delta project provides a good reference. By default, sbt reads repository URLs from ~/.sbt/repositories, for example:

> cat ~/.sbt/repositories
[repositories]
  local
  local-preloaded-ivy: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/}, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext]
  local-preloaded: file:///${sbt.preloaded-${sbt.global.base-${user.home}/.sbt}/preloaded/}
  huawei-central: https://mirrors.huaweicloud.com/repository/maven/
  aliyun-maven: https://maven.aliyun.com/nexus/content/groups/public
  gcs-maven-central-mirror: https://maven-central.storage-download.googleapis.com/repos/central/data/
  typesafe-ivy-releases: https://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sbt-ivy-snapshots: https://repo.scala-sbt.org/scalasbt/ivy-snapshots/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  sbt-plugin-releases: https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
  bintray-spark-packages: https://dl.bintray.com/spark-packages/maven/
  typesafe-releases: https://repo.typesafe.com/typesafe/releases/

With this file in place, dependencies are downloaded from the mirror:

sbt:celeborn-parent> project spark-3-shaded
[info] set current project to celeborn-client-spark-3-shaded (in build file:/Users/fchen/Project/bigdata/celeborn/)
sbt:celeborn-client-spark-3-shaded> assembly
[warn] spark-core_2.12-3.3.1.jar no longer exists at /Users/fchen/Library/Caches/Coursier/v1/https/mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-core_2.12/3.3.1/spark-core_2.12-3.3.1.jar
[info] Updating
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-core_2.12/3.3.1/spark-core_2.12-3.3.1.pom
  100.0% [##########] 36.9 KiB (70.4 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-parent_2.12/3.3.1/spark-parent_2.12-3.3.1.pom
  100.0% [##########] 137.4 KiB (832.5 KiB / s)
[info] Resolved  dependencies
[info] Updating
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-sql_2.12/3.3.1/spark-sql_2.12-3.3.1.pom
  100.0% [##########] 18.0 KiB (252.9 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-launcher_2.12/3.3.1/spark-launcher_2.12-3.3.1.pom
  100.0% [##########] 8.5 KiB (128.1 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-network-shuffle_2.12/3.3.1/spark-network-shuffle_2.12-3.3.1.pom
  100.0% [##########] 8.5 KiB (87.6 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-tags_2.12/3.3.1/spark-tags_2.12-3.3.1.pom
  100.0% [##########] 5.7 KiB (34.1 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-catalyst_2.12/3.3.1/spark-catalyst_2.12-3.3.1.pom
  100.0% [##########] 11.3 KiB (59.9 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-kvstore_2.12/3.3.1/spark-kvstore_2.12-3.3.1.pom
  100.0% [##########] 8.3 KiB (44.1 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-sketch_2.12/3.3.1/spark-sketch_2.12-3.3.1.pom
  100.0% [##########] 6.2 KiB (29.8 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-unsafe_2.12/3.3.1/spark-unsafe_2.12-3.3.1.pom
  100.0% [##########] 8.1 KiB (38.5 KiB / s)
https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-network-common_2.12/3.3.1/spark-network-common_2.12-3.3.1.pom
  100.0% [##########] 12.4 KiB (163.1 KiB / s)
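Both mirror settings can be scripted. The sketch below is hypothetical: neither function exists in Celeborn's build/ scripts, the Maven Central fallback in `sbt_launch_url` is an assumption rather than what the bootstrap script actually uses, and the launcher path simply follows the standard Maven layout of the `org.scala-sbt:sbt-launch` artifact. The entry name `mirror` is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither function exists in Celeborn's build/ scripts.

# Print the sbt launcher URL under a Maven-layout repository.
# ASSUMPTION: the Maven Central fallback is illustrative only.
sbt_launch_url() {
  local version="$1"
  local repo="${DEFAULT_ARTIFACT_REPOSITORY:-https://repo1.maven.org/maven2/}"
  repo="${repo%/}/"   # accept the repository URL with or without a trailing slash
  printf '%sorg/scala-sbt/sbt-launch/%s/sbt-launch-%s.jar\n' "$repo" "$version" "$version"
}

# Overwrite an sbt repositories file (default: ~/.sbt/repositories) with the
# local caches, one mirror, and Maven Central as a fallback.
write_sbt_repositories() {
  local mirror_url="$1"
  local target="${2:-$HOME/.sbt/repositories}"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<EOF
[repositories]
  local
  mirror: ${mirror_url}
  maven-central
EOF
}
```

For example, `write_sbt_repositories https://mirrors.huaweicloud.com/repository/maven/` overwrites ~/.sbt/repositories with a short file that tries the local caches, then the mirror, then Maven Central. If a build declares its own resolvers, sbt's `-Dsbt.override.build.repos=true` option makes it resolve only from this file.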

@cfmcgrady
Contributor Author

As Celeborn supports different Spark/Flink versions, sbt should also provide functionality equivalent to Maven profiles, with IntelliJ IDEA support.

Passing Maven profiles to sbt is currently supported. However, integration with IntelliJ IDEA has not yet been tested.

@pan3793
Member

pan3793 commented Jun 28, 2023

As Celeborn supports different Spark/Flink versions, sbt should also provide functionality equivalent to Maven profiles, with IntelliJ IDEA support.

Passing Maven profiles to sbt is currently supported. However, integration with IntelliJ IDEA has not yet been tested.

This can be achieved as shown in the following screenshot:

[screenshot not preserved]

A further thought: we need not be limited to the Maven style. The original goal is to switch easily between groups of properties, so maybe we can:

  1. let sbt parse and apply sbt -P<profile1>,<profile2> -P<profile3> ...
  2. define profile-specific properties in
    sbt-<profile1>.properties
    sbt-<profile2>.properties
    sbt-<profile3>.properties
  3. define sbt-active-properties.txt (ignored by git)
  4. resolve profiles with the following priority:
    4.1. sbt -P
    4.2. sbt-active-properties.txt

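To make the proposed priority rules concrete, here is a hypothetical shell sketch. The function names, the one-profile-per-line format of sbt-active-properties.txt, and the choice that -P flags replace (rather than merge with) the file's profiles are all assumptions; a real implementation would more likely live in the sbt build definition itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the profile-resolution rules proposed above.
# ASSUMPTIONS: sbt-active-properties.txt lists one profile per line, and any
# -P flag on the command line replaces the file's profiles entirely.

# Usage: resolve_profiles <active-file> [sbt args...]
# Prints the active profiles, one per line.
resolve_profiles() {
  local active_file="$1"; shift
  local -a from_flags=() parts=()
  local arg p
  for arg in "$@"; do
    case "$arg" in
      -P?*)
        # Split "-Pa,b" into "a" and "b", skipping empty entries.
        IFS=',' read -r -a parts <<< "${arg#-P}"
        for p in "${parts[@]}"; do
          if [ -n "$p" ]; then from_flags+=("$p"); fi
        done
        ;;
    esac
  done
  if [ "${#from_flags[@]}" -gt 0 ]; then
    # Priority 1: profiles given via sbt -P.
    printf '%s\n' "${from_flags[@]}"
  elif [ -f "$active_file" ]; then
    # Priority 2: the git-ignored file, minus blank lines and # comments.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$active_file"
  fi
}

# Map each profile read from stdin to its properties file name.
profile_property_files() {
  local p
  while IFS= read -r p; do
    printf 'sbt-%s.properties\n' "$p"
  done
}
```

For example, `resolve_profiles sbt-active-properties.txt -Pspark-3.3 | profile_property_files` prints `sbt-spark-3.3.properties` regardless of what the file contains, while running it without any -P flag falls back to the profiles listed in the file.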
@pan3793
Member

pan3793 commented Jul 17, 2023

@cfmcgrady Do you have time to start a pure SBT build system?

@pan3793
Member

pan3793 commented Jul 17, 2023

BTW, I think we can introduce a dependency audit mechanism like Spark's; we may need to merge master-jars and worker-jars before doing that.

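Spark's audit (dev/test-dependencies.sh) compares the resolved dependency list against manifests committed under dev/deps. Below is a much simplified sketch of that idea, not Spark's script: it compares the jar file names found in one or more directories against a committed manifest. The function names, the `REPLACE_MANIFEST` variable, and the one-jar-per-line manifest format are hypothetical. Passing both master-jars and worker-jars shows why merging them first helps: their union becomes a single manifest.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified dependency audit in the spirit of Spark's
# dev/test-dependencies.sh; not an existing Celeborn or Spark script.

# List the distinct jar file names found directly in the given directories.
list_dependency_jars() {
  local dir
  for dir in "$@"; do
    find "$dir" -maxdepth 1 -type f -name '*.jar' -exec basename {} \;
  done | sort -u
}

# Usage: audit_dependencies <manifest> <jar-dir>...
# With REPLACE_MANIFEST=true, rewrite the manifest; otherwise print a unified
# diff and return non-zero if the current jars differ from the manifest.
audit_dependencies() {
  local manifest="$1"; shift
  if [ "${REPLACE_MANIFEST:-false}" = "true" ]; then
    list_dependency_jars "$@" > "$manifest"
    return 0
  fi
  diff -u "$manifest" <(list_dependency_jars "$@")
}
```

Run it once with `REPLACE_MANIFEST=true` to record the manifest, then in CI without it; a non-zero status plus a diff flags a dependency change that has not been reviewed.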
@cfmcgrady
Contributor Author

@cfmcgrady Do you have time to start a pure SBT build system?

  1. I have recently been evaluating the feasibility of running CI via sbt based on this PR, and have successfully validated it.
  2. Since developers in the community lack experience with sbt builds, I am contemplating the possibility and complexity of transitioning from a pom-based project to a pure sbt project. Do you have any suggestions? cc @pan3793 @waitinfuture

@pan3793
Member

pan3793 commented Jul 18, 2023

I think Spark's developer-tools page is a good example to follow; we can provide a similar page at https://celeborn.apache.org/community/contributor_guide/build_and_test/

@pan3793
Member

pan3793 commented Jul 18, 2023

I am contemplating the possibility and complexity of transitioning from a pom-based project to a pure sbt project.

In my experience with Spark, dual build systems always introduce inconsistencies: even after continuous effort by many excellent engineers, Spark's output artifacts and resolved dependency lists still differ somewhat between sbt and Maven. Thus I would argue that a pure sbt build makes life easier.

pan3793 pushed a commit that referenced this pull request Jul 28, 2023
### What changes were proposed in this pull request?

This PR introduces an SBT build system that operates independently of the current Maven build system. Unlike #1627, this implementation does not depend on `pom.xml`.

It supports packaging and testing the server-related modules and the Spark-related modules with SBT.

Flink-related build/test support, sbt build documentation, continuous integration, and plugins will be submitted in separate PRs.

### Why are the changes needed?

Improve project build speed.

Packaging the project:

```shell
$ ./build/sbt
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:36:12
sbt:celeborn> package
[success] Total time: 28 s, completed 2023-7-25 16:36:46
```

Packaging and shading the Spark 3.3 client:

```shell
$ ./build/sbt -Pspark-3.3
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:39:11
sbt:celeborn> project celeborn-client-spark-3-shaded
sbt:celeborn-client-spark-3-shaded> assembly
[success] Total time: 37 s, completed 2023-7-25 16:40:03
```

Packaging and shading the Spark 2.4 client:

```shell
$ ./build/sbt -Pspark-2.4
sbt:celeborn> clean
[success] Total time: 1 s, completed 2023-7-25 16:41:06
sbt:celeborn> project celeborn-client-spark-2-shaded
sbt:celeborn-client-spark-2-shaded> assembly
[success] Total time: 36 s, completed 2023-7-25 16:41:53
```
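The interactive sessions above can also be run as a single batch invocation, since sbt executes commands passed as arguments and accepts `project/task` syntax. Below is a small hypothetical helper that prints that command for a given Spark profile version. It assumes, based only on the examples above, that the shaded project for Spark X.Y is named `celeborn-client-spark-X-shaded` and that `build/sbt` accepts `-P` profiles alongside batch commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Celeborn: print (not run) a one-shot,
# non-interactive equivalent of the "clean, switch project, assembly" sessions.
shaded_client_command() {
  local spark_version="$1"           # e.g. 3.3 or 2.4
  local major="${spark_version%%.*}" # "3.3" -> "3"
  printf './build/sbt -Pspark-%s clean celeborn-client-spark-%s-shaded/assembly\n' \
    "$spark_version" "$major"
}
```

For example, `shaded_client_command 2.4` prints `./build/sbt -Pspark-2.4 clean celeborn-client-spark-2-shaded/assembly`, which corresponds to the Spark 2.4 session above.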

Running server-related tests:

```shell
$ ./build/sbt clean test
[success] Total time: 350 s (05:50), completed 2023-7-25 16:48:58
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Tested locally.

Closes #1757 from cfmcgrady/pure-sbt.

Authored-by: Fu Chen <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
@cfmcgrady cfmcgrady closed this Jul 31, 2023
5 participants